Implementations of RL policies, value functions, and actor-critic networks.

class MLP[source]

MLP(layer_sizes:Union[List[T], Tuple], activations:Optional[Callable]='tanh', out_act:Optional[bool]=None, out_squeeze:Optional[bool]=False) :: Module

A class for building a simple MLP network.

Args:

  • layer_sizes (list or tuple): Layer sizes for the network.
  • activations (Function): Activation function for the MLP network.
  • out_act (Function): Output activation function.
  • out_squeeze (bool): Whether to squeeze the output of the network.
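
As a rough illustration, a plain-PyTorch sketch of what a call like MLP([4, 32, 32, 2]) presumably constructs (hidden layers joined by the chosen activation, no activation on the output unless out_act is set); this is illustrative, not the library's internals:

```python
import torch
import torch.nn as nn

# Stand-in for MLP([4, 32, 32, 2], activations=torch.tanh):
# Linear layers with tanh between them and a bare Linear output.
layer_sizes = [4, 32, 32, 2]
layers = []
for in_size, out_size in zip(layer_sizes[:-1], layer_sizes[1:]):
    layers.append(nn.Linear(in_size, out_size))
    layers.append(nn.Tanh())
layers = layers[:-1]  # no activation after the output layer
net = nn.Sequential(*layers)

out = net(torch.randn(8, 4))  # batch of 8 four-dim states -> shape (8, 2)
```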

class CNN[source]

CNN(input_channels:int, input_height:int, output_size:int, kernel_size:Optional[int]=3, stride:Optional[int]=1, channels:list=[64, 64], linear_layer_sizes:list=[512], activation:Callable='relu', output_activation:Callable=None, dropout_layers:list=None, dropout_p:float=None, out_squeeze:bool=False) :: Module

Create a PyTorch CNN module.

Args:

  • input_channels (int): Number of channels in the input.
  • input_height (int): Size of one side of the input (currently assumes square input).
  • output_size (int): Size of the network output.
  • kernel_size (int): Convolutional kernel size.
  • stride (int): Convolutional kernel stride.
  • channels (list or tuple): Channel sizes for each convolutional layer.
  • linear_layer_sizes (list or tuple): Sizes of linear layers (if any) to add after the convolutional layers.
  • activation (callable): Activation function.
  • output_activation (callable): Activation to apply to the output layer, if any.
  • dropout_layers (list or tuple): Layers to apply dropout to, if any.
  • dropout_p (float): Dropout probability.
  • out_squeeze (bool): Whether to squeeze the output.
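
For orientation, a rough plain-PyTorch analogue of CNN(input_channels=3, input_height=84, output_size=6, channels=[64, 64], linear_layer_sizes=[512]); the exact layer ordering and size inference inside the class are assumptions here:

```python
import torch
import torch.nn as nn

# Two 3x3 stride-1 convolutions (channels=[64, 64]) with ReLU, a flatten,
# then a 512-unit linear layer and an output layer of size 6.
convs = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
)
with torch.no_grad():
    flat_size = convs(torch.zeros(1, 3, 84, 84)).shape[1]  # infer flattened size

head = nn.Sequential(nn.Linear(flat_size, 512), nn.ReLU(), nn.Linear(512, 6))
output = head(convs(torch.randn(2, 3, 84, 84)))  # shape (2, 6)
```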

class Actor[source]

Actor() :: Module

Barebones class structure for an Actor.

Actor.forward[source]

Actor.forward(x, a=None)

Forward pass for a policy.

Args:

  • x (torch.Tensor): Input state from the environment.
  • a (torch.Tensor): Action that was taken.

Returns:

  • policy (PyTorch distribution): The policy distribution.
  • logp_a (torch.Tensor): Log-probability of input action under the policy distribution.

class CategoricalPolicy[source]

CategoricalPolicy(state_features:int, action_dim:int, hidden_sizes:Union[List[T], Tuple], activation:Callable, out_activation:Callable) :: Actor

A class for a Categorical Policy network. Used in discrete action space environments.

The policy is an MLP.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • out_activation (Function): Output activation function for the network.

CategoricalPolicy.logprob_from_distribution[source]

CategoricalPolicy.logprob_from_distribution(policy:Distribution, actions:Tensor)

Calculate the log-probability of an action under a policy.

Args:

  • policy (torch.distributions.Distribution): The policy distribution for the input state.
  • actions (torch.Tensor): Actions to compute log-probabilities for.

Returns:

  • log_probs (torch.Tensor): Log-probabilities of actions under the policy distribution.

CategoricalPolicy.action_distribution[source]

CategoricalPolicy.action_distribution(x:Tensor)

Defines the action distribution conditioned on the input state.

Args:

  • x (torch.Tensor): Input state.

Returns:

  • Categorical distribution: Policy over the action space.
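
The two methods above map onto torch.distributions.Categorical as sketched below; the linear logits head stands in for the policy MLP and is only an illustration, not the class internals:

```python
import torch
from torch.distributions import Categorical

logits_net = torch.nn.Linear(4, 3)   # stand-in for the policy MLP
states = torch.randn(8, 4)           # batch of states

policy = Categorical(logits=logits_net(states))  # ~ action_distribution(x)
actions = policy.sample()                        # one action per state, shape (8,)
log_probs = policy.log_prob(actions)             # ~ logprob_from_distribution(policy, actions)
```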

class GaussianPolicy[source]

GaussianPolicy(state_features:int, action_dim:int, hidden_sizes:Union[List[T], Tuple], activation:Callable, out_activation:Callable) :: Actor

A class for a Gaussian Policy network. Used in continuous action space environments. The policy is an MLP.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • out_activation (Function): Output activation function for the network.

GaussianPolicy.action_distribution[source]

GaussianPolicy.action_distribution(states)

Defines the action distribution conditioned on the input state.

Args:

  • states (torch.Tensor): Input states.

Returns:

  • Normal distribution: Policy over the action space.

GaussianPolicy.logprob_from_distribution[source]

GaussianPolicy.logprob_from_distribution(policy, actions)

Calculate the log-probability of an action under a policy.

Args:

  • policy (torch.distributions.Distribution): The policy distribution for the input state.
  • actions (torch.Tensor): Actions to compute log-probabilities for.

Returns:

  • log_probs (torch.Tensor): Log-probabilities of actions under the policy distribution.
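
The analogous sketch for the Gaussian case uses torch.distributions.Normal; the mean network, the learned log-std parameter, and the summing of per-dimension log-probabilities are common conventions and are assumed here rather than read from the implementation:

```python
import torch
from torch.distributions import Normal

mu_net = torch.nn.Linear(4, 2)                      # stand-in mean network
log_std = torch.nn.Parameter(-0.5 * torch.ones(2))  # learned log standard deviation
states = torch.randn(8, 4)

policy = Normal(mu_net(states), torch.exp(log_std))  # ~ action_distribution(states)
actions = policy.sample()                            # shape (8, 2)
log_probs = policy.log_prob(actions).sum(dim=-1)     # per-action log-prob, shape (8,)
```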

class ActorCritic[source]

ActorCritic(state_features:int, action_space:int, hidden_sizes:Union[Tuple, List[T], NoneType]=(32, 32), activation:Optional[Callable]='tanh', out_activation:Optional[Callable]=None, policy:Optional[Module]=None) :: Module

An Actor Critic class for Policy Gradient algorithms.

Has built-in capability to work with continuous (gym.spaces.Box) and discrete (gym.spaces.Discrete) action spaces. The policy and value function are both MLPs.

If working with a different action space, the user can pass in a custom policy class for that action space as an argument.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_space (gym.spaces.Space): Action space of the environment.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • out_activation (Function): Output activation function for the network.
  • policy (nn.Module): Custom policy class for an environment where the action space is not gym.spaces.Box or gym.spaces.Discrete.

ActorCritic.step[source]

ActorCritic.step(x:Tensor)

Get action, action log probability, and value estimate for an input state.

Args:

  • x (torch.Tensor): Input state.

Returns:

  • action (torch.Tensor): Action chosen by the policy.
  • logp_action (torch.Tensor): Log-probability of the action chosen by the policy.
  • value (torch.Tensor): Value estimate of the current state.
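
A sketch of what step presumably computes for a Discrete action space, with small linear stand-ins for the policy and value networks; the no-grad sampling shown here is an assumption about the internals:

```python
import torch
from torch.distributions import Categorical

policy_net = torch.nn.Linear(4, 3)  # stand-in policy network
value_net = torch.nn.Linear(4, 1)   # stand-in value network

def step(x):
    with torch.no_grad():
        pi = Categorical(logits=policy_net(x))
        action = pi.sample()
        logp_action = pi.log_prob(action)
        value = value_net(x).squeeze(-1)
    return action, logp_action, value

action, logp_action, value = step(torch.randn(4))
```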

ActorCritic.act[source]

ActorCritic.act(x:Tensor)

Similar to step, but returns only the action.

Args:

  • x (torch.Tensor): Input state.

Returns:

  • action (torch.Tensor): Action chosen by the policy.

class MLPQActor[source]

MLPQActor(state_features:int, action_dim:int, hidden_sizes:Union[list, tuple], activation:Callable, action_limit:Union[float, int]) :: Module

An actor for Q policy gradient algorithms.

The policy is an MLP. The output from the policy network is scaled to action space limits on the forward pass.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • action_limit (float or int): Limits of the action space.

MLPQActor.forward[source]

MLPQActor.forward(x:Tensor)

Return the output from the policy network, scaled to the limits of the environment's action space.

Args:

  • x (torch.Tensor): States from the environment.

Returns:

  • scaled_action (torch.Tensor): Action scaled to action space limits.
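
A sketch of the scaling described above; the tanh squashing before multiplying by action_limit is a common convention for this kind of actor and is an assumption here, not a statement about the class internals:

```python
import torch
import torch.nn as nn

action_limit = 2.0                      # e.g. actions bounded in [-2, 2]
pi_net = nn.Sequential(                 # stand-in for the policy MLP
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Tanh(),        # bounded output in [-1, 1]
)

states = torch.randn(8, 3)
scaled_action = action_limit * pi_net(states)  # values in [-2, 2]
```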

class MLPQFunction[source]

MLPQFunction(state_features:int, action_dim:int, hidden_sizes:Union[tuple, list], activation:Callable) :: Module

A Q function network for Q policy gradient methods.

The Q function is an MLP. It always takes in a (state, action) pair and returns a Q-value estimate for that pair.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.

MLPQFunction.forward[source]

MLPQFunction.forward(x:Tensor, a:Tensor)

Return the Q-value estimate for the state-action pair (x, a).

Args:

  • x (torch.Tensor): Environment state.
  • a (torch.Tensor): Action taken by the policy.

Returns:

  • q (torch.Tensor): Q-value estimate for the state-action pair.
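
A sketch of the (state, action) to Q-value computation; concatenating x and a along the feature dimension before the MLP is the standard construction and is assumed here:

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(                  # stand-in for the Q-function MLP
    nn.Linear(3 + 1, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

x = torch.randn(8, 3)   # batch of states
a = torch.randn(8, 1)   # batch of actions
q = q_net(torch.cat([x, a], dim=-1)).squeeze(-1)  # Q-value per pair, shape (8,)
```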