class MLP [source]
MLP(layer_sizes:Union[List[T], Tuple], activations:Optional[Callable]='tanh', out_act:Optional[bool]=None, out_squeeze:Optional[bool]=False) :: Module
A class for building a simple MLP network.
Args:
- layer_sizes (list or tuple): Layer sizes for the network.
- activations (Function): Activation function for the MLP's hidden layers.
- out_act (Function): Output activation function, if any.
- out_squeeze (bool): Whether to squeeze the output of the network.
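A minimal usage sketch (the import path is an assumption about this package's layout):

```python
import torch
from neural_nets import MLP  # import path assumed; adjust to this package's layout

# Two hidden layers of 32 units mapping 4 input features to 2 outputs.
net = MLP(layer_sizes=[4, 32, 32, 2])
out = net(torch.randn(8, 4))  # -> torch.Size([8, 2])
```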
class CNN [source]
CNN(input_channels:int, input_height:int, output_size:int, kernel_size:Optional[int]=3, stride:Optional[int]=1, channels:list=[64, 64], linear_layer_sizes:list=[512], activation:Callable='relu', output_activation:Callable=None, dropout_layers:list=None, dropout_p:float=None, out_squeeze:bool=False) :: Module
Create a PyTorch CNN module.
Args:
- input_channels (int): Number of channels in the input.
- input_height (int): Size of one side of the input (currently assumes square input).
- output_size (int): Size of the network output.
- kernel_size (int): Convolutional kernel size.
- stride (int): Convolutional kernel stride.
- channels (list or tuple): Channel sizes for each convolutional layer.
- linear_layer_sizes (list or tuple): Sizes of linear layers (if any) to add after the convolutional layers.
- activation (Callable): Activation function.
- output_activation (Callable): Activation to apply to the output layer, if any.
- dropout_layers (list or tuple): Layers to apply dropout to, if any.
- dropout_p (float): Dropout probability to use.
- out_squeeze (bool): Whether to squeeze the output.
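A minimal usage sketch (import path assumed, remaining arguments left at their defaults):

```python
import torch
from neural_nets import CNN  # import path assumed

# 3-channel, 84x84 square inputs mapped to 6 outputs.
cnn = CNN(input_channels=3, input_height=84, output_size=6)
out = cnn(torch.randn(16, 3, 84, 84))  # -> torch.Size([16, 6])
```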
Actor.forward [source]
Actor.forward(x, a=None)
Forward pass for a policy.
Args:
- x (torch.Tensor): Input state from the environment.
- a (torch.Tensor): Action that was taken.
Returns:
- policy (PyTorch distribution): The policy distribution.
- logp_a (torch.Tensor): Log-probability of input action under the policy distribution.
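A sketch of the pattern forward presumably implements by composing the two subclass hooks documented below (illustrative, not the exact source):

```python
def forward(self, x, a=None):
    # Build the action distribution over the input state...
    policy = self.action_distribution(x)
    # ...and, if an action was provided, score it under that distribution.
    logp_a = self.logprob_from_distribution(policy, a) if a is not None else None
    return policy, logp_a
```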
class CategoricalPolicy [source]
CategoricalPolicy(state_features:int, action_dim:int, hidden_sizes:Union[List[T], Tuple], activation:Callable, out_activation:Callable) :: Actor
A class for a Categorical Policy network. Used in discrete action space environments. The policy is an MLP.
Args:
- state_features (int): Dimensionality of the state space.
- action_dim (int): Dimensionality of the action space.
- hidden_sizes (list or tuple): Hidden layer sizes.
- activation (Function): Activation function for the network.
- out_activation (Function): Output activation function for the network.
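A minimal usage sketch (import path assumed; the two-tuple return follows Actor.forward above):

```python
import torch
from neural_nets import CategoricalPolicy  # import path assumed

# 4-dimensional states, 2 discrete actions, one 32-unit hidden layer.
pi = CategoricalPolicy(4, 2, (32,), torch.tanh, None)
dist, _ = pi(torch.randn(8, 4))  # forward with no action -> (policy, None)
actions = dist.sample()          # 8 sampled discrete actions
```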
CategoricalPolicy.logprob_from_distribution [source]
CategoricalPolicy.logprob_from_distribution(policy:Distribution, actions:Tensor)
Calculate the log-probability of an action under a policy.
Args:
- policy (torch.distributions.Distribution): The policy distribution over input state.
- actions (torch.Tensor): Actions to compute the log-probability of.
Returns:
- log_probs (torch.Tensor): Log-probabilities of actions under the policy distribution.
CategoricalPolicy.action_distribution [source]
CategoricalPolicy.action_distribution(x:Tensor)
Defines the action distribution conditioned on the input state.
Args:
- x (torch.Tensor): Input state.
Returns:
- Categorical distribution: Policy over the action space.
class GaussianPolicy [source]
GaussianPolicy(state_features:int, action_dim:int, hidden_sizes:Union[List[T], Tuple], activation:Callable, out_activation:Callable) :: Actor
A class for a Gaussian Policy network. Used in continuous action space environments. The policy is an MLP.
Args:
- state_features (int): Dimensionality of the state space.
- action_dim (int): Dimensionality of the action space.
- hidden_sizes (list or tuple): Hidden layer sizes.
- activation (Function): Activation function for the network.
- out_activation (Function): Output activation function for the network.
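A minimal usage sketch (import path assumed):

```python
import torch
from neural_nets import GaussianPolicy  # import path assumed

# 8-dimensional states, 2-dimensional continuous actions.
pi = GaussianPolicy(8, 2, (64, 64), torch.tanh, None)
dist, _ = pi(torch.randn(5, 8))  # Normal distribution over actions
actions = dist.sample()          # -> torch.Size([5, 2])
```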
GaussianPolicy.action_distribution [source]
GaussianPolicy.action_distribution(states)
Defines the action distribution conditioned on the input state.
Args:
- states (torch.Tensor): Input state(s).
Returns:
- Normal distribution: Policy over the action space.
GaussianPolicy.logprob_from_distribution [source]
GaussianPolicy.logprob_from_distribution(policy, actions)
Calculate the log-probability of an action under a policy.
Args:
- policy (torch.distributions.Distribution): The policy distribution over input state.
- actions (torch.Tensor): Actions to compute the log-probability of.
Returns:
- log_probs (torch.Tensor): Log-probabilities of actions under the policy distribution.
class ActorCritic [source]
ActorCritic(state_features:int, action_space:int, hidden_sizes:Union[Tuple, List[T], NoneType]=(32, 32), activation:Optional[Callable]='tanh', out_activation:Optional[Callable]=None, policy:Optional[Module]=None) :: Module
An Actor Critic class for Policy Gradient algorithms.
Has built-in capability to work with continuous (gym.spaces.Box) and discrete (gym.spaces.Discrete) action spaces.
The policy and value function are both MLPs.
If working with a different action space, the user can pass in a custom policy class for that action space as an argument.
Args:
- state_features (int): Dimensionality of the state space.
- action_space (gym.spaces.Space): Action space of the environment.
- hidden_sizes (list or tuple): Hidden layer sizes.
- activation (Function): Activation function for the network.
- out_activation (Function): Output activation function for the network.
- policy (nn.Module): Custom policy class for an environment where the action space is not gym.spaces.Box or gym.spaces.Discrete
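A minimal construction sketch (import path assumed; a gym.spaces.Discrete action space selects the built-in categorical policy, per the description above):

```python
import gym
from neural_nets import ActorCritic  # import path assumed

env = gym.make("CartPole-v1")
# Discrete action space, so a CategoricalPolicy is built internally.
ac = ActorCritic(env.observation_space.shape[0], env.action_space)
```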
ActorCritic.step [source]
ActorCritic.step(x:Tensor)
Get action, action log probability, and value estimate for an input state.
Args:
- x (torch.Tensor): input state.
Returns:
- action (torch.Tensor): Action chosen by the policy.
- logp_action (torch.Tensor): Log probability of that action chosen by the policy.
- value (torch.Tensor): Value estimate of the current state.
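Continuing the sketch above, a short rollout loop using step (classic gym API assumed):

```python
import torch

obs = torch.as_tensor(env.reset(), dtype=torch.float32)
for _ in range(10):
    action, logp_action, value = ac.step(obs)
    next_obs, reward, done, _ = env.step(int(action))  # discrete env; use action.numpy() for Box spaces
    obs = torch.as_tensor(next_obs, dtype=torch.float32)
    if done:
        obs = torch.as_tensor(env.reset(), dtype=torch.float32)
```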
ActorCritic.act [source]
ActorCritic.act(x:Tensor)
Similar to step, but returns only the action.
Args:
- x (torch.Tensor): input state
Returns:
- action (torch.Tensor): Action chosen by the policy.
class MLPQActor [source]
MLPQActor(state_features:int, action_dim:int, hidden_sizes:Union[list, tuple], activation:Callable, action_limit:Union[float, int]) :: Module
An actor for Q policy gradient algorithms.
The policy is an MLP.
The output from the policy network is scaled to action space limits on the forward pass.
Args:
- state_features (int): Dimensionality of the state space.
- action_dim (int): Dimensionality of the action space.
- hidden_sizes (list or tuple): Hidden layer sizes.
- activation (Function): Activation function for the network.
- action_limit (float or int): Limits of the action space.
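A minimal usage sketch (import path assumed; limits chosen to match, e.g., Pendulum's [-2, 2] actions):

```python
import torch
from neural_nets import MLPQActor  # import path assumed

# 3-dimensional states, 1-dimensional actions bounded in [-2, 2].
pi = MLPQActor(3, 1, (256, 256), torch.relu, 2.0)
actions = pi(torch.randn(10, 3))  # already scaled into [-2, 2]
```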
MLPQActor.forward [source]
MLPQActor.forward(x:Tensor)
Return output from the policy network, scaled to the limits of the environment's action space.
Args:
- x (torch.Tensor): States from environment.
Returns:
- scaled_action (torch.Tensor): Action scaled to action space limits.
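The scaling is presumably the usual bounded-actor squash: tanh the network output into (-1, 1), then multiply by the limit. A sketch, with self.net and self.action_limit as assumed attribute names:

```python
import torch

def forward(self, x):
    # Squash the raw network output into (-1, 1), then rescale to the bounds.
    # self.net and self.action_limit are assumed attribute names.
    return self.action_limit * torch.tanh(self.net(x))
```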
class MLPQFunction [source]
MLPQFunction(state_features:int, action_dim:int, hidden_sizes:Union[tuple, list], activation:Callable) :: Module
A Q function network for Q policy gradient methods.
The Q function is an MLP. It always takes in a (state, action) pair and returns a Q-value estimate for that pair.
Args:
- state_features (int): Dimensionality of the state space.
- action_dim (int): Dimensionality of the action space.
- hidden_sizes (list or tuple): Hidden layer sizes.
- activation (Function): Activation function for the network.
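A minimal usage sketch (import path assumed):

```python
import torch
from neural_nets import MLPQFunction  # import path assumed

qf = MLPQFunction(3, 1, (256, 256), torch.relu)
q = qf(torch.randn(10, 3), torch.randn(10, 1))  # one Q-value estimate per pair
```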
MLPQFunction.forward [source]
MLPQFunction.forward(x:Tensor, a:Tensor)
Return the Q-value estimate for the state-action pair (x, a).
Args:
- x (torch.Tensor): Environment state.
- a (torch.Tensor): Action taken by the policy.
Returns:
- q (torch.Tensor): Q-value estimate for state action pair.
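Together with MLPQActor, this supports the usual DDPG-style bootstrap target. A sketch reusing pi and qf from the examples above as stand-ins for target networks; the batch tensors are hypothetical and shapes assume the Q-value output is squeezed:

```python
import torch

# Hypothetical transition batch; pi and qf reuse the sketches above as
# stand-ins for target networks. Shapes assume the Q output is squeezed.
obs, act = torch.randn(10, 3), torch.randn(10, 1)
next_obs = torch.randn(10, 3)
reward, done = torch.randn(10), torch.zeros(10)
gamma = 0.99

with torch.no_grad():
    # Bellman bootstrap: r + gamma * (1 - done) * Q(s', pi(s'))
    target = reward + gamma * (1 - done) * qf(next_obs, pi(next_obs))
q_loss = ((qf(obs, act) - target) ** 2).mean()
```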