Implementations of RL policies, value functions, and actor-critic networks.

class MLP[source]

MLP(layer_sizes:Union[List[T], Tuple], activations:Optional[Callable]='tanh', out_act:Optional[bool]=None, out_squeeze:Optional[bool]=False) :: Module

A class for building a simple MLP network.

Args:

  • layer_sizes (list or tuple): Layer sizes for the network.
  • activations (Function): Activation function for the MLP network.
  • out_act (Function): Output activation function.
  • out_squeeze (bool): Whether to squeeze the output of the network.
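
As a rough illustration, a plain-PyTorch sketch of what a call like MLP([4, 32, 32, 2]) presumably constructs (hidden layers joined by the chosen activation, no activation on the output unless out_act is set); this is illustrative, not the library's internals:

```python
import torch
import torch.nn as nn

# Stand-in for MLP([4, 32, 32, 2], activations=torch.tanh):
# Linear layers with tanh between them and a bare Linear output.
layer_sizes = [4, 32, 32, 2]
layers = []
for in_size, out_size in zip(layer_sizes[:-1], layer_sizes[1:]):
    layers.append(nn.Linear(in_size, out_size))
    layers.append(nn.Tanh())
layers = layers[:-1]  # no activation after the output layer
net = nn.Sequential(*layers)

out = net(torch.randn(8, 4))  # batch of 8 four-dim states -> shape (8, 2)
```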

class CNN[source]

CNN(input_channels:int, input_height:int, output_size:int, kernel_size:Optional[int]=3, stride:Optional[int]=1, channels:list=[64, 64], linear_layer_sizes:list=[512], activation:Callable='relu', output_activation:Callable=None, dropout_layers:list=None, dropout_p:float=None, out_squeeze:bool=False) :: Module

Create a PyTorch CNN module.

Args:

  • input_channels (int): Number of channels in the input.
  • input_height (int): Size of one side of the input (currently assumes square input).
  • output_size (int): Size of the network output.
  • kernel_size (int): Convolutional kernel size.
  • stride (int): Convolutional kernel stride.
  • channels (list or tuple): Channel sizes for each convolutional layer.
  • linear_layer_sizes (list or tuple): Sizes of linear layers (if any) to add after the convolutional layers.
  • activation (callable): Activation function.
  • output_activation (callable): Activation to apply to the output layer, if any.
  • dropout_layers (list or tuple): Layers to apply dropout to, if any.
  • dropout_p (float): Dropout probability.
  • out_squeeze (bool): Whether to squeeze the output.
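
For orientation, a rough plain-PyTorch analogue of CNN(input_channels=3, input_height=84, output_size=6, channels=[64, 64], linear_layer_sizes=[512]); the exact layer ordering and size inference inside the class are assumptions here:

```python
import torch
import torch.nn as nn

# Two 3x3 stride-1 convolutions (channels=[64, 64]) with ReLU, a flatten,
# then a 512-unit linear layer and an output layer of size 6.
convs = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
    nn.Flatten(),
)
with torch.no_grad():
    flat_size = convs(torch.zeros(1, 3, 84, 84)).shape[1]  # infer flattened size

head = nn.Sequential(nn.Linear(flat_size, 512), nn.ReLU(), nn.Linear(512, 6))
output = head(convs(torch.randn(2, 3, 84, 84)))  # shape (2, 6)
```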

class Actor[source]

Actor() :: Module

Barebones class structure for an Actor.

Actor.forward[source]

Actor.forward(x, a=None)

Forward pass for a policy.

Args:

  • x (torch.Tensor): Input state from the environment.
  • a (torch.Tensor): Action that was taken.

Returns:

  • policy (PyTorch distribution): The policy distribution.
  • logp_a (torch.Tensor): Log-probability of input action under the policy distribution.

class CategoricalPolicy[source]

CategoricalPolicy(state_features:int, action_dim:int, hidden_sizes:Union[List[T], Tuple], activation:Callable, out_activation:Callable) :: Actor

A class for a Categorical Policy network. Used in discrete action space environments.

The policy is an MLP.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • out_activation (Function): Output activation function for the network.

CategoricalPolicy.logprob_from_distribution[source]

CategoricalPolicy.logprob_from_distribution(policy:Distribution, actions:Tensor)

Calculate the log-probability of an action under a policy.

Args:

  • policy (torch.distributions.Distribution): The policy distribution for the input state.
  • actions (torch.Tensor): Actions to compute log-probabilities for.

Returns:

  • log_probs (torch.Tensor): Log-probabilities of actions under the policy distribution.

CategoricalPolicy.action_distribution[source]

CategoricalPolicy.action_distribution(x:Tensor)

Defines the action distribution conditioned on the input state.

Args:

  • x (torch.Tensor): Input state.

Returns:

  • Categorical distribution: Policy over the action space.
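
The two methods above map onto torch.distributions.Categorical as sketched below; the linear logits head stands in for the policy MLP and is only an illustration, not the class internals:

```python
import torch
from torch.distributions import Categorical

logits_net = torch.nn.Linear(4, 3)   # stand-in for the policy MLP
states = torch.randn(8, 4)           # batch of states

policy = Categorical(logits=logits_net(states))  # ~ action_distribution(x)
actions = policy.sample()                        # one action per state, shape (8,)
log_probs = policy.log_prob(actions)             # ~ logprob_from_distribution(policy, actions)
```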

class GaussianPolicy[source]

GaussianPolicy(state_features:int, action_dim:int, hidden_sizes:Union[List[T], Tuple], activation:Callable, out_activation:Callable) :: Actor

A class for a Gaussian Policy network. Used in continuous action space environments. The policy is an MLP.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • out_activation (Function): Output activation function for the network.

GaussianPolicy.action_distribution[source]

GaussianPolicy.action_distribution(states)

Defines the action distribution conditioned on the input state.

Args:

  • states (torch.Tensor): Input states.

Returns:

  • Normal distribution: Policy over the action space.

GaussianPolicy.logprob_from_distribution[source]

GaussianPolicy.logprob_from_distribution(policy, actions)

Calculate the log-probability of an action under a policy.

Args:

  • policy (torch.distributions.Distribution): The policy distribution for the input state.
  • actions (torch.Tensor): Actions to compute log-probabilities for.

Returns:

  • log_probs (torch.Tensor): Log-probabilities of actions under the policy distribution.
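
The analogous sketch for the Gaussian case uses torch.distributions.Normal; the mean network, the learned log-std parameter, and the summing of per-dimension log-probabilities are common conventions and are assumed here rather than read from the implementation:

```python
import torch
from torch.distributions import Normal

mu_net = torch.nn.Linear(4, 2)                      # stand-in mean network
log_std = torch.nn.Parameter(-0.5 * torch.ones(2))  # learned log standard deviation
states = torch.randn(8, 4)

policy = Normal(mu_net(states), torch.exp(log_std))  # ~ action_distribution(states)
actions = policy.sample()                            # shape (8, 2)
log_probs = policy.log_prob(actions).sum(dim=-1)     # per-action log-prob, shape (8,)
```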

class ActorCritic[source]

ActorCritic(state_features:int, action_space:int, hidden_sizes:Union[Tuple, List[T], NoneType]=(32, 32), activation:Optional[Callable]='tanh', out_activation:Optional[Callable]=None, policy:Optional[Module]=None) :: Module

An Actor Critic class for Policy Gradient algorithms.

Has built-in capability to work with continuous (gym.spaces.Box) and discrete (gym.spaces.Discrete) action spaces. The policy and value function are both MLPs.

If working with a different action space, the user can pass in a custom policy class for that action space as an argument.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_space (gym.spaces.Space): Action space of the environment.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • out_activation (Function): Output activation function for the network.
  • policy (nn.Module): Custom policy class for an environment where the action space is not gym.spaces.Box or gym.spaces.Discrete.

ActorCritic.step[source]

ActorCritic.step(x:Tensor)

Get action, action log probability, and value estimate for an input state.

Args:

  • x (torch.Tensor): Input state.

Returns:

  • action (torch.Tensor): Action chosen by the policy.
  • logp_action (torch.Tensor): Log-probability of the action chosen by the policy.
  • value (torch.Tensor): Value estimate of the current state.
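
A sketch of what step presumably computes for a Discrete action space, with small linear stand-ins for the policy and value networks; the no-grad sampling shown here is an assumption about the internals:

```python
import torch
from torch.distributions import Categorical

policy_net = torch.nn.Linear(4, 3)  # stand-in policy network
value_net = torch.nn.Linear(4, 1)   # stand-in value network

def step(x):
    with torch.no_grad():
        pi = Categorical(logits=policy_net(x))
        action = pi.sample()
        logp_action = pi.log_prob(action)
        value = value_net(x).squeeze(-1)
    return action, logp_action, value

action, logp_action, value = step(torch.randn(4))
```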

ActorCritic.act[source]

ActorCritic.act(x:Tensor)

Similar to step, but returns only the action.

Args:

  • x (torch.Tensor): Input state.

Returns:

  • action (torch.Tensor): Action chosen by the policy.

class MLPQActor[source]

MLPQActor(state_features:int, action_dim:int, hidden_sizes:Union[list, tuple], activation:Callable, action_limit:Union[float, int]) :: Module

An actor for Q policy gradient algorithms.

The policy is an MLP. The output from the policy network is scaled to action space limits on the forward pass.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.
  • action_limit (float or int): Limits of the action space.

MLPQActor.forward[source]

MLPQActor.forward(x:Tensor)

Return the output from the policy network, scaled to the limits of the environment's action space.

Args:

  • x (torch.Tensor): States from the environment.

Returns:

  • scaled_action (torch.Tensor): Action scaled to action space limits.
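
A sketch of the scaling described above; the tanh squashing before multiplying by action_limit is a common convention for this kind of actor and is an assumption here, not a statement about the class internals:

```python
import torch
import torch.nn as nn

action_limit = 2.0                      # e.g. actions bounded in [-2, 2]
pi_net = nn.Sequential(                 # stand-in for the policy MLP
    nn.Linear(3, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Tanh(),        # bounded output in [-1, 1]
)

states = torch.randn(8, 3)
scaled_action = action_limit * pi_net(states)  # values in [-2, 2]
```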

class MLPQFunction[source]

MLPQFunction(state_features:int, action_dim:int, hidden_sizes:Union[tuple, list], activation:Callable) :: Module

A Q function network for Q policy gradient methods.

The Q function is an MLP. It always takes in a (state, action) pair and returns a Q-value estimate for that pair.

Args:

  • state_features (int): Dimensionality of the state space.
  • action_dim (int): Dimensionality of the action space.
  • hidden_sizes (list or tuple): Hidden layer sizes.
  • activation (Function): Activation function for the network.

MLPQFunction.forward[source]

MLPQFunction.forward(x:Tensor, a:Tensor)

Return the Q-value estimate for the state-action pair (x, a).

Args:

  • x (torch.Tensor): Environment state.
  • a (torch.Tensor): Action taken by the policy.

Returns:

  • q (torch.Tensor): Q-value estimate for the state-action pair.
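
A sketch of the (state, action) to Q-value computation; concatenating x and a along the feature dimension before the MLP is the standard construction and is assumed here:

```python
import torch
import torch.nn as nn

q_net = nn.Sequential(                  # stand-in for the Q-function MLP
    nn.Linear(3 + 1, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

x = torch.randn(8, 3)   # batch of states
a = torch.randn(8, 1)   # batch of actions
q = q_net(torch.cat([x, a], dim=-1)).squeeze(-1)  # Q-value per pair, shape (8,)
```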