alf.networks#

alf.networks.action_encoder#

A simple parameterless action encoder.

class SimpleActionEncoder(action_spec)[source]#

Bases: alf.networks.network.Network

A simple encoder for action.

It encodes discrete actions as one-hot vectors and uses continuous actions as they are. The output is the concatenation of all of them after flattening.

Parameters: action_spec (nested BoundedTensorSpec) – spec for actions

forward(inputs, state=())[source]#

Generate encoded actions.

Parameters: inputs (nested Tensor) – action tensors.
Returns: nested Tensor with the same structure as inputs.
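The encoding described above can be illustrated with a minimal plain-Python sketch. This is not ALF's implementation: `encode_actions` and the `(kind, size)` tuples are hypothetical stand-ins for a flattened `BoundedTensorSpec`, and the sketch handles one unbatched action at a time.

```python
def one_hot(index, num_classes):
    """Return a one-hot list of length num_classes with 1.0 at index."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} out of range for {num_classes} classes")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def encode_actions(actions, specs):
    """Encode a flat list of actions into a single feature vector.

    specs is a hypothetical stand-in for a flattened action spec:
    ("discrete", num_classes) or ("continuous", dim) for each action.
    """
    encoded = []
    for action, (kind, size) in zip(actions, specs):
        if kind == "discrete":
            # Discrete actions become one-hot vectors.
            encoded.extend(one_hot(action, size))
        else:
            # Continuous actions are passed through unchanged.
            values = [float(v) for v in action]
            if len(values) != size:
                raise ValueError("continuous action has the wrong dimension")
            encoded.extend(values)
    return encoded


specs = [("discrete", 3), ("continuous", 2)]
features = encode_actions([2, (0.5, -1.0)], specs)
print(features)
```

The discrete action `2` out of three classes contributes three entries and the two-dimensional continuous action contributes two, so the encoded vector has five entries.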

training: bool#

alf.networks.actor_distribution_networks#

ActorDistributionNetwork and ActorDistributionRNNNetwork.

class ActorDistributionNetwork(input_tensor_spec, action_spec, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_fc_bn=False, discrete_projection_net_ctor=<class 'alf.networks.projection_networks.CategoricalProjectionNetwork'>, continuous_projection_net_ctor=<class 'alf.networks.projection_networks.NormalProjectionNetwork'>, name='ActorDistributionNetwork')[source]#

Bases: alf.networks.actor_distribution_networks.ActorDistributionNetworkBase

Network which outputs temporally uncorrelated action distributions.

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input
action_spec (TensorSpec) – the action spec
input_preprocessors (nested InputPreprocessor) – a nest of InputPreprocessor, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes.
activation (nn.functional) – activation used for hidden layers.
kernel_initializer (Callable) – initializer for all the layers excluding the projection net. If none is provided a default xavier_uniform will be used.
use_fc_bn (bool) – whether to use Batch Normalization for the internal FC layers (i.e. FC layers except the last one).
discrete_projection_net_ctor (ProjectionNetwork) – constructor that generates a discrete projection network that outputs discrete actions.
continuous_projection_net_ctor (ProjectionNetwork) – constructor that generates a continuous projection network that outputs continuous actions.
name (str) –
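As a reading aid for the conv_layer_params format, the following sketch tracks the feature-map size through a stack of `(filters, kernel_size, strides, padding)` tuples. It assumes standard convolution arithmetic with integer square kernels, dilation 1, and a padding of 0 when the fourth element is omitted; `conv_output_hw` is a hypothetical helper, not part of ALF.

```python
def conv_output_hw(input_hw, conv_layer_params):
    """Return (channels, height, width) after applying a conv stack.

    Each entry is (filters, kernel_size, strides) or
    (filters, kernel_size, strides, padding). Defaulting padding to 0
    is an assumption of this sketch.
    """
    h, w = input_hw
    channels = None
    for params in conv_layer_params:
        filters, kernel_size, strides = params[:3]
        padding = params[3] if len(params) > 3 else 0
        # Standard output-size formula for a convolution with dilation 1.
        h = (h + 2 * padding - kernel_size) // strides + 1
        w = (w + 2 * padding - kernel_size) // strides + 1
        channels = filters
    return channels, h, w


conv_layer_params = ((32, 8, 4), (64, 4, 2), (64, 3, 1, 1))
shape = conv_output_hw((84, 84), conv_layer_params)
print(shape)
```

For an 84x84 input, the three layers above give spatial sizes 20, 9, and 9, so the last feature map has 64 channels of size 9x9.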

training: bool#

class ActorDistributionNetworkBase(input_tensor_spec, action_spec, encoding_network_ctor, discrete_projection_net_ctor, continuous_projection_net_ctor, name='ActorDistributionNetworkBase', **encoder_kwargs)[source]#

Bases: alf.networks.network.Network

A base class for ActorDistributionNetwork and ActorDistributionRNNNetwork.

Can also be used to create customized actor networks by providing different encoding network creators.

Parameters

input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec of the input.
action_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec of the action.
encoding_network_ctor (Callable) – the creator of the encoding network that does the heavy lifting of the actor.
discrete_projection_net_ctor (ProjectionNetwork) – constructor that generates a discrete projection network that outputs discrete actions.
continuous_projection_net_ctor (ProjectionNetwork) – constructor that generates a continuous projection network that outputs continuous actions.
name (str) – name of the network
encoder_kwargs – the extra keyword arguments to the encoding network

forward(observation, state=())[source]#

Computes an action distribution given an observation.

Parameters

observation (torch.Tensor) – consistent with input_tensor_spec
state – empty, for API consistency with ActorDistributionRNNNetwork

Returns

act_dist (torch.distributions): action distribution
state: empty

Return type

tuple

make_parallel(n)[source]#: Create a ParallelActorDistributionNetwork using n replicas of self. The initialized network parameters will be different.

property state_spec#: Return the state spec of the actor network. It is simply the state spec of the encoding network.

training: bool#

class ActorDistributionRNNNetwork(input_tensor_spec, action_spec, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, lstm_hidden_size=100, actor_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, discrete_projection_net_ctor=<class 'alf.networks.projection_networks.CategoricalProjectionNetwork'>, continuous_projection_net_ctor=<class 'alf.networks.projection_networks.NormalProjectionNetwork'>, name='ActorRNNDistributionNetwork')[source]#

Bases: alf.networks.actor_distribution_networks.ActorDistributionNetworkBase

Network which outputs temporally correlated action distributions.

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input
action_spec (TensorSpec) – the action spec
input_preprocessors (nested InputPreprocessor) – a nest of InputPreprocessor, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers for encoding the observation.
lstm_hidden_size (int or tuple[int]) – the hidden size(s) of the LSTM cell(s). Each size corresponds to a cell. If there are multiple sizes, then lstm cells are stacked.
actor_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers that are applied after the lstm cell’s output.
activation (nn.functional) – activation used for hidden layers.
kernel_initializer (Callable) – initializer for all the layers excluding the projection net. If none is provided a default xavier_uniform will be used.
discrete_projection_net_ctor (ProjectionNetwork) – constructor that generates a discrete projection network that outputs discrete actions.
continuous_projection_net_ctor (ProjectionNetwork) – constructor that generates a continuous projection network that outputs continuous actions.
name (str) –

training: bool#

class LatentActorDistributionNetwork(input_tensor_spec, action_spec, prior_actor_distribution_network_ctor=<class 'alf.networks.actor_distribution_networks.UnitNormalActorDistributionNetwork'>, normalizing_flow_network_ctor=<class 'alf.networks.normalizing_flow_networks.RealNVPNetwork'>, conditional_flow=True, scale_distribution=False, dist_squashing_transform=StableTanh(), name='LatentActorDistributionNetwork')[source]#

Bases: alf.networks.network.Network

Generating an actor distribution by transforming a prior action distribution (e.g., standard Normal noise \(\mathcal{N}(0,1)\)) with a normalizing flow network. The resulting distribution might have an arbitrary shape.

Warning

As with some invertible transforms such as StableTanh, the inverse computation of a normalizing flow transform might cause numerical issues. For policy gradient methods like AC and PPO, transform caches are usually invalidated because actions are detached for the PG loss. So LatentActorDistributionNetwork is best suited to non-PG algorithms like DDPG and SAC. See alf/docs/notes/compute_probs_of_transformed_dist.rst for details.

Parameters

input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec of the input
action_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the action spec
prior_actor_distribution_network_ctor (Callable) – a constructor that creates any actor distribution network. The only requirement is that this class returns an action distribution (could be transformed) for forward().
normalizing_flow_network_ctor (Callable) – a constructor that creates a normalizing flow network which is used to transform the prior action distribution.
conditional_flow (bool) – whether to make the normalizing flow network use inputs to condition its transformations. Only valid for normalizing flow nets that support this option.
scale_distribution (bool) – Whether or not to scale the output distribution to ensure that the output action fits within the action_spec.
dist_squashing_transform (Transform) – A distribution Transform which transforms values into \((-1, 1)\). Defaults to dist_utils.StableTanh()
name (str) – name of the network
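To illustrate what transforming a prior distribution involves, here is a minimal one-dimensional sketch of the change-of-variables rule for a tanh squashing transform applied to a standard Normal sample, followed by an affine rescaling to action bounds. It is a naive illustration only: the function names are hypothetical, no normalizing flow is involved, and unlike StableTanh it takes no care of numerical stability near \(\pm 1\).

```python
import math


def standard_normal_log_prob(x):
    """Log density of N(0, 1) at x."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def squashed_sample_log_prob(x):
    """Map a prior sample x to y = tanh(x) and return (y, log p(y)).

    Change of variables: log p(y) = log p(x) - log|dy/dx|,
    where dy/dx = 1 - tanh(x)^2.
    """
    y = math.tanh(x)
    log_det = math.log(1.0 - y * y)
    return y, standard_normal_log_prob(x) - log_det


def scale_to_bounds(y, low, high):
    """Affinely map y in (-1, 1) to (low, high).

    The matching log-density correction would be -log((high - low) / 2);
    it is not applied here.
    """
    return low + 0.5 * (y + 1.0) * (high - low)


y, log_p = squashed_sample_log_prob(0.3)
action = scale_to_bounds(y, low=-2.0, high=2.0)
print(y, log_p, action)
```

The log-density uses the prior sample `x` directly, which is why a cached `x` is convenient: recovering it from `y` requires the inverse transform, the step the warning above describes as numerically fragile.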

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#

class ParallelActorDistributionNetwork(actor_network, n, name='ParallelActorDistributionNetwork')[source]#

Bases: alf.networks.network.Network

Perform n actor distribution computations in parallel.

It creates a parallelized version of actor_network.

Parameters

actor_network (ActorDistributionNetwork) – non-parallelized actor network
n (int) – make n replicas from actor_network with different initialization.
name (str) –

forward(observation, state=())[source]#

Computes action distribution given a batch of observations.

Parameters

observation (tuple) – A tuple of Tensors consistent with input_tensor_spec.
state (tuple) – Empty, for API consistency with ActorDistributionRNNNetwork.

property state_spec#: Return the state spec of the actor network. It is simply the state spec of the encoding network.

training: bool#

class UnitNormalActorDistributionNetwork(input_tensor_spec, action_spec, name='UnitNormalActorDistributionNetwork')[source]#

Bases: alf.networks.network.Network

Outputs a constant unit normal regardless of the inputs.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

alf.networks.actor_networks#

ActorNetworks

class ActorNetwork(input_tensor_spec, action_spec, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, squashing_func=<built-in method tanh of type object>, kernel_initializer=None, name='ActorNetwork')[source]#

Bases: alf.networks.actor_networks.ActorNetworkBase

Creates an instance of ActorNetwork, which maps the inputs to actions (single or nested) through a sequence of deterministic layers.

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input.
action_spec (BoundedTensorSpec) – the tensor spec of the action.
input_preprocessors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
squashing_func (Callable) – the activation function used to squash the output to the range \((-1, 1)\). Defaults to tanh.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a variance_scaling_initializer with uniform distribution will be used.
name (str) – name of the network

training: bool#

class ActorNetworkBase(input_tensor_spec, action_spec, encoding_network_ctor=<class 'alf.networks.encoding_networks.EncodingNetwork'>, squashing_func=<built-in method tanh of type object>, name='ActorNetworkBase', **encoder_kwargs)[source]#

Bases: alf.networks.network.Network

A base class for ActorNetwork and ActorRNNNetwork.

Can also be used to create customized actor networks by providing different encoding network creators.

Parameters

input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec of the input.
action_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec of the action.
encoding_network_ctor (Callable) – the creator of the encoding network that does the heavy lifting of the actor.
squashing_func – the activation function used to squash the output to the range \((-1, 1)\). Defaults to tanh.
name – name of the network
encoder_kwargs – the extra keyword arguments to the encoding network

forward(observation, state=())[source]#

Computes action given an observation.

Parameters

observation – A tensor consistent with input_tensor_spec
state – empty, for API consistency with ActorRNNNetwork

Returns

action (torch.Tensor): a tensor consistent with action_spec
state: empty

Return type

tuple

property state_spec#: Return the state spec of the actor network. It is simply the state spec of the encoding network.

training: bool#

class ActorRNNNetwork(input_tensor_spec, action_spec, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, lstm_hidden_size=100, actor_fc_layer_params=None, activation=<built-in method relu_ of type object>, squashing_func=<built-in method tanh of type object>, kernel_initializer=None, name='ActorRNNNetwork')[source]#

Bases: alf.networks.actor_networks.ActorNetworkBase

Creates an instance of ActorRNNNetwork, which maps the inputs (observation and states) to actions (single or nested) through a sequence of deterministic layers.

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input.
action_spec (BoundedTensorSpec) – the tensor spec of the action.
input_preprocessors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes.
lstm_hidden_size (int or tuple[int]) – the hidden size(s) of the LSTM cell(s). Each size corresponds to a cell. If there are multiple sizes, then lstm cells are stacked.
actor_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers that are applied after the lstm cell’s output.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
squashing_func (Callable) – the activation function used to squash the output to the range \((-1, 1)\). Defaults to tanh.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a variance_scaling_initializer with uniform distribution will be used.
name (str) – name of the network

training: bool#

alf.networks.containers#

Various Network containers.

Branch(*modules, input_tensor_spec=None, name='Branch', **named_modules)[source]#

Apply multiple networks on the same input.

Example:

net = Branch((module1, module2))
y, new_state = net(x, state)

is equivalent to the following:

y0, new_state0 = module1(x, state[0])
y1, new_state1 = module2(x, state[1])
y = (y0, y1)
new_state = (new_state0, new_state1)

Parameters

modules (nested nn.Module | Callable) – a nest of torch.nn.Module, alf.nn.Network or Callable. Note that Branch(module_a, module_b) is equivalent to Branch((module_a, module_b))
named_modules (nn.Module | Callable) – a simpler way of specifying a dict of modules. Branch(a=module_a, b=module_b) is equivalent to Branch(dict(a=module_a, b=module_b))
input_tensor_spec (nested TensorSpec) – must be provided if it cannot be inferred from any one of modules
name (str) –
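The positional and named forms can be compared with a small stateless sketch written in plain Python. It only mirrors the data flow described above: `branch` is a hypothetical helper, it ignores network state and spec inference, and it supports only one of the two forms per call as a simplification.

```python
def branch(*modules, **named_modules):
    """Build a callable that applies every module to the same input.

    Positional modules yield a tuple of outputs; named modules yield a
    dict of outputs. Mixing the two is rejected only to keep this sketch
    simple.
    """
    if modules and named_modules:
        raise ValueError("this sketch supports one form at a time")
    if named_modules:
        return lambda x: {key: m(x) for key, m in named_modules.items()}
    return lambda x: tuple(m(x) for m in modules)


def double(x):
    return 2 * x


def square(x):
    return x * x


net = branch(double, square)
named_net = branch(a=double, b=square)
print(net(3))
print(named_net(3))
```

Both calls feed the same value `3` to every module; only the container of the results differs.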

class Echo(block, input_tensor_spec=None)[source]#

Bases: alf.networks.network.Network

Echo network.

Echo network uses part of the output of block at the current step as part of the input of block for the next step. In particular, if the input of block is a dictionary, it should contain two keys ‘input’ and ‘echo’, and ‘echo’ will be taken from the output of the previous step. If the input of block is a tuple, the second input will be taken from the output of the previous step. If the output is a dictionary, it should contain two keys ‘output’ and ‘echo’, and ‘echo’ will be used as the input for the next step. If the output is a tuple, the second output will be used as the input for the next step.

Note that block itself can be a recurrent network with state.

Examples:

echo = Echo(block)
output, state = echo(real_input, state)

is equivalent to the following if the input and output of block are dicts:

block_state, echo_input = state
block_output, block_state = block(dict(input=real_input, echo=echo_input), block_state)
output = block_output['output']
echo_output = block_output['echo']
state = (block_state, echo_output)

and is equivalent to the following if the input and output of block are tuples:

block_state, echo_input = state
block_output, block_state = block((real_input, echo_input), block_state)
output, echo_output = block_output
state = (block_state, echo_output)

Parameters

block (Network) – the module for performing the actual computation
input_tensor_spec (nested TensorSpec) – If provided, it must match the block.input_tensor_spec[0] or block.input_tensor_spec['input']
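The tuple case can be traced step by step with a plain-Python sketch. It assumes a stateless block, so the state carried between steps is just the echo value; `make_echo` and `running_sum_block` are hypothetical names used for illustration only.

```python
def make_echo(block):
    """Wrap block((real_input, echo_input)) -> (output, echo_output).

    Returns a step function (real_input, echo_state) -> (output, new_state)
    that feeds the previous echo back in. The block's own recurrent state
    is omitted in this sketch.
    """
    def step(real_input, echo_state):
        output, echo_output = block((real_input, echo_state))
        return output, echo_output

    return step


def running_sum_block(inputs):
    """Toy block: adds the echoed value to the new input."""
    real_input, echo_input = inputs
    total = real_input + echo_input
    return total, total  # (output, echo)


echo = make_echo(running_sum_block)
state = 0  # initial echo value
outputs = []
for x in [1, 2, 3]:
    y, state = echo(x, state)
    outputs.append(y)
print(outputs)
```

Because each step receives the previous step's echo, the toy block accumulates a running sum of its inputs.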

forward(input, state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

make_parallel(n)[source]#

Create a parallelized version of this network.

Parameters: n (int) – the number of copies
Returns: the parallelized version of this network

training: bool#

class Parallel(modules, input_tensor_spec=None, name='Parallel')[source]#

Bases: alf.networks.network.Network

Apply each Network in the nest of Network to the corresponding input.

Example:

net = Parallel((module1, module2))
y, new_state = net(x, state)

is equivalent to the following:

y0, new_state0 = module1(x[0], state[0])
y1, new_state1 = module2(x[1], state[1])
y = (y0, y1)
new_state = (new_state0, new_state1)

Parameters

modules (nested nn.Module) – a nest of torch.nn.Module or alf.nn.Network.
input_tensor_spec (nested TensorSpec) – must be provided if it cannot be inferred from modules.
name (str) –
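In contrast to Branch, each module here receives its own element of the input. A stateless plain-Python sketch over a flat tuple shows the pairing; `parallel` is a hypothetical helper and does not handle general nests or network state.

```python
def parallel(modules):
    """Build a callable that applies modules[i] to inputs[i]."""
    def net(inputs):
        if len(inputs) != len(modules):
            raise ValueError("need exactly one input per module")
        return tuple(m(x) for m, x in zip(modules, inputs))

    return net


net = parallel((str.upper, abs))
result = net(("ok", -4))
print(result)
```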

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

make_parallel(n)[source]#

Create a parallelized version of this network.

Parameters: n (int) – the number of copies
Returns: the parallelized version of this network

property networks#

training: bool#

Sequential(*modules, output='', input_tensor_spec=None, name='Sequential', **named_modules)[source]#

Network composed of a sequence of torch.nn.Module or alf.nn.Network.

All the modules provided through modules and named_modules are calculated sequentially in the same order as they appear in the call to Sequential. Typically, each module takes the result of the previous module as its input (or the input to the Sequential if it is the first module), and the result of the last module is the output of the Sequential. More flexible wiring is also allowed, as shown in example 2.

Example 1:

net = Sequential(module1, module2)
y, new_state = net(x, state)

is equivalent to the following:

z, new_state1 = module1(x, state[0])
y, new_state2 = module2(z, state[1])
new_state = (new_state1, new_state2)

Example 2:

net = Sequential(
    module1, a=module2, b=(('input', 'a'), module3), output=('a', 'b'))
output, new_state = net(input, state)

is equivalent to the following:

_, new_state1 = module1(input, state[0])
a, new_state2 = module2(_, state[1])
b, new_state3 = module3((input, a), state[2])
new_state = (new_state1, new_state2, new_state3)
output = (a, b)

Parameters

modules (Callable | (nested str, Callable)) – The Callable can be a torch.nn.Module, alf.nn.Network or plain Callable. Optionally, their inputs can be specified by the first element of the tuple. If input is not provided, it is assumed to be the result of the previous module (or input to this Sequential for the first module). If input is provided, it should be a nested str. It will be used to retrieve results from the dictionary of the current named_results. For modules specified by modules, because no named_modules has been invoked, named_results is {'input': input}.
named_modules (Callable | (nested str, Callable)) – The Callable can be a torch.nn.Module, alf.nn.Network or plain Callable. Optionally, their inputs can be specified by the first element of the tuple. If input is not provided, it is assumed to be the result of the previous module (or input to this Sequential for the first module). If input is provided, it should be a nested str. It will be used to retrieve results from the dictionary of the current named_results. named_results is updated once the result of a named module is calculated.
output (nested str) – if not provided, the result from the last module will be used as output. Otherwise, it will be used to retrieve results from named_results after the results of all modules have been calculated.
input_tensor_spec (TensorSpec) – the tensor spec of the input. It must be specified if it cannot be inferred from modules[0].
name (str) –
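The role of named_results in Example 2 can be made concrete with a stateless plain-Python sketch. It is an illustration of the data flow only, under the assumption that a nested str is either a str or a tuple of nested strs; `sequential` and `lookup` are hypothetical helpers, not ALF functions.

```python
def lookup(nest, named_results):
    """Resolve a nested str (a str or a tuple of nested strs) to results."""
    if isinstance(nest, str):
        return named_results[nest]
    return tuple(lookup(item, named_results) for item in nest)


def sequential(*modules, output="", **named_modules):
    """Stateless sketch of the named-results data flow."""
    steps = [(None, m) for m in modules] + list(named_modules.items())

    def net(x):
        named_results = {"input": x}
        previous = x
        for name, module in steps:
            if isinstance(module, tuple):
                # (nested str, callable): inputs come from named_results.
                source, fn = module
                previous = fn(lookup(source, named_results))
            else:
                # Plain callable: input is the previous module's result.
                previous = module(previous)
            if name is not None:
                named_results[name] = previous
        return lookup(output, named_results) if output else previous

    return net


net = sequential(
    lambda x: x + 1,                                  # module1
    a=lambda z: z * 10,                               # module2
    b=(("input", "a"), lambda pair: pair[0] + pair[1]),  # module3
    output=("a", "b"))
result = net(2)
print(result)
```

With input `2`, module1 gives `3`, `a` becomes `30`, and `b` is computed from `(input, a) = (2, 30)`, so the output nest `('a', 'b')` resolves to `(30, 32)`.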

alf.networks.critic_networks#

CriticNetworks

class CriticNetwork(input_tensor_spec, output_tensor_spec=TensorSpec(shape=(), dtype=torch.float32), observation_input_processors=None, observation_preprocessing_combiner=None, observation_conv_layer_params=None, observation_fc_layer_params=None, action_input_processors=None, action_preprocessing_combiner=None, action_fc_layer_params=None, observation_action_combiner=None, joint_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_fc_bn=False, use_naive_parallel_network=False, name='CriticNetwork')[source]#

Bases: alf.networks.encoding_networks.EncodingNetwork

Creates an instance of CriticNetwork for estimating action-value of continuous or discrete actions. The action-value is defined as the expected return starting from the given input observation and taking the given action. This module takes an observation and an action as input and outputs an action-value tensor with the shape of [batch_size].

The network takes a tuple of (observation, action) as input and computes the action-value for that pair.

Parameters

input_tensor_spec – A tuple of TensorSpecs (observation_spec, action_spec) representing the inputs.
output_tensor_spec (TensorSpec) – spec for the output
observation_input_processors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding observation input.
observation_preprocessing_combiner (NestCombiner) – preprocessing called on complex observation inputs.
observation_conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
observation_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes for observations.
action_input_processors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding action input.
action_preprocessing_combiner (NestCombiner) – preprocessing called to combine complex action inputs.
action_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes for actions.
observation_action_combiner (NestCombiner) – combiner class for fusing the observation and action. If None, NestConcat will be used.
joint_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes after merging observations and actions.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a variance_scaling_initializer with uniform distribution will be used.
use_fc_bn (bool) – whether to use Batch Normalization for the internal FC layers (i.e. FC layers except the last one).
use_naive_parallel_network (bool) – if True, will use NaiveParallelNetwork when make_parallel is called. This might be useful in cases when the NaiveParallelNetwork has an advantage in terms of speed over ParallelNetwork. You have to test to see which way is faster for your particular situation.
name (str) –

make_parallel(n)[source]#: Create a parallel critic network using n replicas of self. The initialized network parameters will be different. If use_naive_parallel_network is True, use NaiveParallelNetwork to create the parallel network.

training: bool#

class CriticRNNNetwork(input_tensor_spec, output_tensor_spec=TensorSpec(shape=(), dtype=torch.float32), observation_input_processors=None, observation_preprocessing_combiner=None, observation_conv_layer_params=None, observation_fc_layer_params=None, action_input_processors=None, action_preprocessing_combiner=None, action_fc_layer_params=None, joint_fc_layer_params=None, lstm_hidden_size=100, critic_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, name='CriticRNNNetwork')[source]#

Bases: alf.networks.encoding_networks.LSTMEncodingNetwork

Creates an instance of CriticRNNNetwork for estimating action-value of continuous or discrete actions. The action-value is defined as the expected return starting from the given inputs (observation and state) and taking the given action. It takes observation and state as input and outputs an action-value tensor with the shape of [batch_size].

Parameters

input_tensor_spec – A tuple of TensorSpecs (observation_spec, action_spec) representing the inputs.
output_tensor_spec (TensorSpec) – spec for the output
observation_input_processors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding observation input.
observation_preprocessing_combiner (NestCombiner) – preprocessing called on complex observation inputs.
observation_conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
observation_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes for observations.
action_input_processors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding action input.
action_preprocessing_combiner (NestCombiner) – preprocessing called to combine complex action inputs.
action_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes for actions.
joint_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes after merging observations and actions.
lstm_hidden_size (int or tuple[int]) – the hidden size(s) of the LSTM cell(s). Each size corresponds to a cell. If there are multiple sizes, then lstm cells are stacked.
critic_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers that are applied after the lstm cell’s output.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a variance_scaling_initializer with uniform distribution will be used.
name (str) –

make_parallel(n)[source]#: Create a parallel critic RNN network using n replicas of self. The initialized network parameters will be different. If use_naive_parallel_network is True, use NaiveParallelNetwork to create the parallel network.

training: bool#

alf.networks.dynamics_networks#

DynamicsNetwork

class DynamicsNetwork(input_tensor_spec, output_tensor_spec, joint_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, prob=False, continuous_projection_net_ctor=<class 'alf.networks.projection_networks.NormalProjectionNetwork'>, name='DynamicsNetwork')[source]#

Bases: alf.networks.network.Network

Create an instance of DynamicsNetwork.

Creates an instance of DynamicsNetwork for predicting the next observation given current observation and action.

Parameters

input_tensor_spec – A tuple of TensorSpecs (observation_spec, action_spec) representing the inputs.
joint_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes after merging observations and actions.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a variance_scaling_initializer with uniform distribution will be used.
prob (bool) – If True, use the probabilistic mode of the network; otherwise, use the deterministic mode of the network.
continuous_projection_net_ctor (ProjectionNetwork) – constructor that generates a continuous projection network that outputs a distribution.
name (str) –

forward(inputs, state=())[source]#

Computes prediction given inputs.

Parameters

inputs – A tuple of Tensors consistent with input_tensor_spec
state – empty for API consistency

Returns

out: a tensor of the size [B, n, d] if self._prob is False, and a distribution if self._prob is True.

state: empty

make_parallel(n)[source]#: Create a ParallelDynamicsNetwork using n replicas of self. The initialized network parameters will be different.

training: bool#

class ParallelDynamicsNetwork(dynamics_network, n, name='ParallelDynamicsNetwork')[source]#

Bases: alf.networks.network.Network

Create n DynamicsNetwork in parallel.

It creates a parallelized version of DynamicsNetwork.

Parameters

dynamics_network (DynamicsNetwork) – non-parallelized dynamics network
n (int) – make n replicas from dynamics_network with different initializations.
name (str) –

forward(inputs, state=())[source]#

Computes prediction given inputs.

Parameters

inputs – A tuple of Tensors consistent with input_tensor_spec
state – empty for API consistency

Returns

out: a tensor of the size [B, n, d] if self._prob is False, and a distribution if self._prob is True.

state: empty

training: bool#

alf.networks.encoding_networks#

class AutoShapeImageDeconvNetwork(input_size, transconv_layer_params, output_shape, start_decoding_channels, preprocess_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, output_activation=<built-in method tanh of type object>, name='AutoShapeImageDeconvNetwork')[source]#

Bases: alf.networks.containers._Sequential

A general template class for creating image deconv (transposed convolutional) networks with auto-shape inference (thus named AutoShapeImageDeconvNetwork).

Auto-shape inference: instead of specifying an initial start shape for image deconv, this class only needs the desired output shape for the image. It automatically calculates the shape at which to start decoding based on the specified transconv_layer_params and uses an FC layer to map the input to that start shape.
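As a rough illustration of the auto-shape inference described above, the sketch below walks the transposed-convolution layers backwards and inverts the size relation H = (H1 - 1) * strides + HF - 2P (output_padding assumed to be 0) to recover the start size from the desired output size. The helper name infer_start_size is hypothetical and not part of ALF; the class's actual procedure may differ.

```python
def infer_start_size(output_size, transconv_layer_params):
    """Walk the layers backwards to find the input size of the first layer.

    Each layer is (num_filters, kernel_size, strides) or
    (num_filters, kernel_size, strides, padding).
    Hypothetical helper for illustration only, not an ALF API.
    """
    size = output_size
    for params in reversed(transconv_layer_params):
        _, kernel_size, strides = params[:3]
        padding = params[3] if len(params) > 3 else 0
        # Invert H = (H1 - 1) * strides + HF - 2P for H1.
        numerator = size - kernel_size + 2 * padding
        if numerator < 0 or numerator % strides != 0:
            raise ValueError(
                f"no integer input size yields output size {size}")
        size = numerator // strides + 1
    return size


# Two stride-2 layers with kernel 4 and padding 1 each double the size,
# so a 64-pixel output needs a 16-pixel start.
layers = ((32, 4, 2, 1), (3, 4, 2, 1))
start_size = infer_start_size(64, layers)
print(start_size)
```

The FC layer would then need start_decoding_channels * start_size * start_size output units so that its output can be reshaped into the start feature map.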

Parameters

input_size (int) – the size of the input latent vector
transconv_layer_params (tuple[tuple]) – a non-empty tuple of tuples (num_filters, kernel_size, strides, padding), where padding is optional.
output_shape (tuple) – the complete output size would be output_shape = (c, h, w).
start_decoding_channels (int) – the initial number of channels we’d like to have for the feature map. Note that we always first project an input latent vector into a vector of an appropriate length so that it can be reshaped into (start_decoding_channels, start_decoding_height, start_decoding_width), where start_decoding_height and start_decoding_width are automatically inferred based on the specified output_shape and transconv_layer_params.
preprocess_fc_layer_params (tuple[int]) – a tuple of fc layer units. These fc layers are used for preprocessing the latent vector before transposed convolutions.
activation (nn.functional) – activation for hidden layers
kernel_initializer (Callable) – initializer for all the layers.
output_activation (nn.functional) – activation for the output layer. Usually our image inputs are normalized to [0, 1] or [-1, 1], so this function should be torch.sigmoid or torch.tanh.
name (str) –

training: bool#

class EncodingNetwork(input_tensor_spec, output_tensor_spec=None, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_fc_bn=False, last_layer_size=None, last_activation=None, last_kernel_initializer=None, last_use_fc_bn=False, name='EncodingNetwork')[source]#

Bases: alf.networks.containers._Sequential

Feed Forward network with CNN and FC layers which allows the last layer to have different settings from the other layers.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input. If nested, then preprocessing_combiner must not be None.
output_tensor_spec (None|TensorSpec) – spec for the output. If None, the output tensor spec will be assumed as TensorSpec((output_size, )), where output_size is inferred from network output. Otherwise, the output tensor spec will be output_tensor_spec and the network output will be reshaped according to output_tensor_spec. Note that output_tensor_spec is only used for reshaping the network outputs for interpretation purpose and is not used for specifying any network layers.
input_preprocessors (nested Network|nn.Module|None) – a nest of preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure with input_tensor_spec. This arg is helpful if you want to have separate preprocessings for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing FC layer sizes.
activation (nn.functional) – activation used for all the layers but the last layer.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If None, a variance_scaling_initializer will be used.
use_fc_bn (bool) – whether to use Batch Normalization for fc layers.
last_layer_size (int) – an optional size of an additional layer appended at the very end. Note that if last_activation is specified, last_layer_size has to be specified explicitly.
last_activation (nn.functional) – activation function of the additional layer specified by last_layer_size. Note that if last_layer_size is not None, last_activation has to be specified explicitly.
last_use_fc_bn (bool) – whether to use Batch Normalization for the last fc layer.
last_kernel_initializer (Callable) – initializer for the additional layer specified by last_layer_size. If None, it will be the same as kernel_initializer. If last_layer_size is None, last_kernel_initializer will not be used.
name (str) –

make_parallel(n, allow_non_parallel_input=False)[source]#

Make a parallelized version of module.

A parallel network has n copies of network with the same structure but different independently initialized parameters. The parallel network can process a batch of the data with shape [batch_size, n, …] using n networks with same structure.

TODO: remove allow_non_parallel_input, so that a parallel network no longer accepts non-parallel input. This will make the logic more transparent.

Parameters

n (int) – the number of copies
allow_non_parallel_input (bool) – if True, the returned network will also accept non-parallel input with shape [batch_size, …]. In this case, the network will check whether the input is parallel input. If not, the input will be automatically replicated n times at the beginning.

Returns

the parallelized network.
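The behaviour described above can be sketched without any ALF or PyTorch dependency. The toy classes below (TinyNet and TinyParallelNet are hypothetical names, with nested lists standing in for tensors) keep n independently initialized copies of a trivial "network", apply copy i to slice i of an input of shape [batch_size, n, d], and replicate a non-parallel [batch_size, d] input n times when allow_non_parallel_input is True. This illustrates the contract only, not ALF's implementation.

```python
import random


class TinyNet:
    """Stand-in for one network copy: y = w * x + b, applied elementwise."""

    def __init__(self, rng):
        self.w = rng.uniform(-1.0, 1.0)
        self.b = rng.uniform(-1.0, 1.0)

    def __call__(self, features):
        return [self.w * x + self.b for x in features]


class TinyParallelNet:
    """n independently initialized copies applied to input [B, n, d]."""

    def __init__(self, n, allow_non_parallel_input=False, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.nets = [TinyNet(rng) for _ in range(n)]
        self.allow_non_parallel_input = allow_non_parallel_input

    def __call__(self, batch):
        outputs = []
        for sample in batch:
            # A parallel sample is a list of n feature lists.
            is_parallel = isinstance(sample[0], list)
            if not is_parallel:
                if not self.allow_non_parallel_input:
                    raise ValueError("expected input of shape [B, n, d]")
                # Replicate a [d] sample n times to get [n, d].
                sample = [sample] * self.n
            if len(sample) != self.n:
                raise ValueError("second dimension must equal n")
            outputs.append([net(x) for net, x in zip(self.nets, sample)])
        return outputs


net = TinyParallelNet(n=3, allow_non_parallel_input=True)
flat = net([[1.0, 2.0]])            # non-parallel input, shape [1, 2]
stacked = net([[[1.0, 2.0]] * 3])   # parallel input, shape [1, 3, 2]
print(flat == stacked)
```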

training: bool#

class ImageDecodingNetwork(input_size, transconv_layer_params, start_decoding_size, start_decoding_channels, same_padding=False, preprocess_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, output_activation=<built-in method tanh of type object>, name='ImageDecodingNetwork')[source]#

Bases: alf.networks.containers._Sequential

A general template class for creating transposed convolutional decoding networks.

Initialize the layers for decoding a latent vector into an image. Currently there seems to be no need for this class to handle nested inputs; if necessary, extend the argument list to support it in the future.

How to calculate the output size: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html:

H = (H1-1) * strides + HF - 2P + OP

where H = output size, H1 = input size, HF = size of kernel, P = padding, OP = output_padding (currently hardcoded to be 0 for this class).

Regarding padding: in the previous TF version, we have two padding modes: valid and same. For the former, we always have no padding (P=0). The latter is also called half padding (P=(HF-1)//2): when strides=1 and HF is an odd number, the output has the same size as the input. Currently, PyTorch doesn’t support different left and right paddings and P is always (HF-1)//2, so if HF is an even number, the output size will increase by 1 when strides=1.
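The size formula and the padding remarks above can be checked with a few lines of plain Python. The function names below are illustrative helpers, not ALF APIs; output_padding defaults to 0 to match what this class hardcodes.

```python
def transconv_output_size(input_size, kernel_size, strides,
                          padding=0, output_padding=0):
    """H = (H1 - 1) * strides + HF - 2P + OP."""
    return ((input_size - 1) * strides + kernel_size - 2 * padding
            + output_padding)


def half_padding(kernel_size):
    """The padding used by the 'same' mode: P = (HF - 1) // 2."""
    return (kernel_size - 1) // 2


# Odd kernel, strides=1, half padding: the size is preserved.
same_odd = transconv_output_size(10, 3, 1, half_padding(3))
# Even kernel, strides=1, half padding: the size grows by 1.
same_even = transconv_output_size(10, 4, 1, half_padding(4))
# Valid mode (P=0) with strides=2.
valid = transconv_output_size(10, 3, 2)
print(same_odd, same_even, valid)
```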

Parameters

input_size (int) – the size of the input latent vector
transconv_layer_params (tuple[tuple]) – a non-empty tuple of tuples (num_filters, kernel_size, strides, padding), where padding is optional.
start_decoding_size (int or tuple) – the initial height and width we’d like to have for the feature map
start_decoding_channels (int) – the initial number of channels we’d like to have for the feature map. Note that we always first project an input latent vector into a vector of an appropriate length so that it can be reshaped into (start_decoding_channels, start_decoding_height, start_decoding_width).
same_padding (bool) – similar to TF’s conv2d same padding mode. If True, the user provided paddings in transconv_layer_params will be replaced by automatically calculated ones; if False, it corresponds to TF’s valid padding mode (the user can still provide custom paddings though).
preprocess_fc_layer_params (tuple[int]) – a tuple of fc layer units. These fc layers are used for preprocessing the latent vector before transposed convolutions.
activation (nn.functional) – activation for hidden layers
kernel_initializer (Callable) – initializer for all the layers.
output_activation (nn.functional) – activation for the output layer. Usually our image inputs are normalized to [0, 1] or [-1, 1], so this function should be torch.sigmoid or torch.tanh.
name (str) –

training: bool#

class ImageDecodingNetworkV2(input_size, upsample_conv_layer_params, start_decoding_size, start_decoding_channels, preprocess_fc_layer_params=None, upsampling_mode='nearest', same_padding=False, activation=<built-in method relu_ of type object>, kernel_initializer=None, output_activation=<built-in method tanh of type object>, name='ImageDecodingNetworkV2')[source]#

Bases: alf.networks.containers._Sequential

Image decoding using upsampling+convolution.

Unlike ImageDecodingNetwork, which uses transposed convolution to transform a smaller input into a larger image output, this class uses upsampling followed by convolution layers. The idea is to let the conv layers refine the upsampling (e.g., nearest neighbor, bilinear, etc.) results.

The difference between transposed conv and upsampling+conv can be found in this article: https://distill.pub/2016/deconv-checkerboard/. In short, upsampling+conv might help reduce checkerboard artifacts that are common in the outputs by transposed convolutions.

An example network of upsampling+conv for decoding images.

net = ImageDecodingNetworkV2(input_size=100,
                             start_decoding_size=10,
                             start_decoding_channels=8,
                             same_padding=True,
                             upsample_conv_layer_params=(
                                2,
                                (16, 3, 1),
                                (32, 3, 1),
                                2,
                                (64, 3, 1),
                                (3, 3, 1)))
# The image shape: (8,10,10) -> (8,20,20) -> (16,20,20) -> (32,20,20)
#                  -> (32,40,40) -> (64,40,40) -> (3,40,40)
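The shape sequence in the comment above can be reproduced with a small trace function. This is a sketch that assumes square feature maps and same_padding=True, so every conv layer uses P = (HF - 1) // 2; trace_shapes is a hypothetical helper, not part of ALF.

```python
def trace_shapes(start_channels, start_size, upsample_conv_layer_params):
    """Return the (C, H, W) shape before the first layer and after each one."""
    channels, size = start_channels, start_size
    shapes = [(channels, size, size)]
    for layer in upsample_conv_layer_params:
        if isinstance(layer, int):
            # An int is an upsampling factor: height and width are scaled.
            size = size * layer
        else:
            # A tuple holds conv params (num_filters, kernel_size, strides).
            num_filters, kernel_size, strides = layer[:3]
            padding = (kernel_size - 1) // 2
            size = (size - kernel_size + 2 * padding) // strides + 1
            channels = num_filters
        shapes.append((channels, size, size))
    return shapes


params = (2, (16, 3, 1), (32, 3, 1), 2, (64, 3, 1), (3, 3, 1))
shapes = trace_shapes(start_channels=8, start_size=10,
                      upsample_conv_layer_params=params)
print(" -> ".join(str(s) for s in shapes))
```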

Parameters

input_size (int) – the size of the input latent vector
upsample_conv_layer_params (Tuple[Union[int, Tuple[int]]]) – a tuple of ints or tuples. If the element is an int, it represents the scaling factor for a torch.nn.Upsample layer; otherwise it should be a tuple of ints representing conv params (num_filters, kernel_size, strides, padding), where padding is optional.
start_decoding_size (Union[int, Tuple[int]]) – the initial height and width we’d like to have for the feature map.
start_decoding_channels (int) – the initial number of channels we’d like to have for the feature map. Note that we always first project an input latent vector into a vector of an appropriate length so that it can be reshaped into (start_decoding_channels, start_decoding_height, start_decoding_width).
preprocess_fc_layer_params (Optional[Tuple[int]]) – if not None, then the input will be fed to a list of fc layers specified by this argument, before doing deconvolution.
upsampling_mode (str) – the argument for choosing an upsampling algorithm for torch.nn.Upsample.
same_padding (bool) – similar to TF’s conv2d same padding mode. If True, the user provided paddings in upsample_conv_layer_params will be replaced by automatically calculated ones; if False, it corresponds to TF’s valid padding mode (the user can still provide custom paddings though). Please refer to the docstring of ImageEncodingNetwork for definitions of the two padding modes.
activation (Callable) – activation for hidden layers
kernel_initializer (Optional[Callable]) – initializer for all the layers.
output_activation (Callable) – activation for the output layer. Usually our image inputs are normalized to [0, 1] or [-1, 1], so this function should be torch.sigmoid or torch.tanh.
name (str) –

training: bool#

class ImageEncodingNetwork(input_channels, input_size, conv_layer_params, same_padding=False, activation=<built-in method relu_ of type object>, kernel_initializer=None, flatten_output=False, name='ImageEncodingNetwork')[source]#

Bases: alf.networks.containers._Sequential

A general template class for creating convolutional encoding networks.

Initialize the layers for encoding an image into a latent vector. Currently there seems to be no need for this class to handle nested inputs; if necessary, extend the argument list to support it in the future.

How to calculate the output size: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html:

H = (H1 - HF + 2P) // strides + 1

where H = output size, H1 = input size, HF = size of kernel, P = padding.

Regarding padding: in the previous TF version, we have two padding modes: valid and same. For the former, we always have no padding (P=0). The latter is also called “half padding” (P=(HF-1)//2): when strides=1 and HF is an odd number, the output has the same size as the input. Currently, PyTorch doesn’t support different left and right paddings and P is always (HF-1)//2, so if HF is an even number, the output size will decrease by 1 when strides=1.
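A short check of the convolution size formula and the padding cases above, in plain Python. conv_output_size is an illustrative helper name, not an ALF function.

```python
def conv_output_size(input_size, kernel_size, strides, padding=0):
    """H = (H1 - HF + 2P) // strides + 1."""
    return (input_size - kernel_size + 2 * padding) // strides + 1


# Half padding P = (HF - 1) // 2 with strides=1.
same_odd = conv_output_size(10, 3, 1, padding=(3 - 1) // 2)   # size kept
same_even = conv_output_size(10, 4, 1, padding=(4 - 1) // 2)  # shrinks by 1
# Valid mode (P=0) with strides=2.
valid = conv_output_size(10, 3, 2)
print(same_odd, same_even, valid)
```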

Parameters

input_channels (int) – number of channels in the input image
input_size (int or tuple) – the input image size (height, width)
conv_layer_params (tuple[tuple]) – a non-empty tuple of tuples (num_filters, kernel_size, strides, padding), where padding is optional
same_padding (bool) – similar to TF’s conv2d same padding mode. If True, the user provided paddings in conv_layer_params will be replaced by automatically calculated ones; if False, it corresponds to TF’s valid padding mode (the user can still provide custom paddings though)
activation (torch.nn.functional) – activation for all the layers
kernel_initializer (Callable) – initializer for all the layers.
flatten_output (bool) – If False, the output will be an image structure of shape BxCxHxW; otherwise the output will be flattened into a feature of shape BxN.

training: bool#

class LSTMEncodingNetwork(input_tensor_spec, output_tensor_spec=None, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, pre_fc_layer_params=None, hidden_size=(100, ), lstm_output_layers=-1, post_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, last_layer_size=None, last_activation=None, last_kernel_initializer=None, name='LSTMEncodingNetwork')[source]#

Bases: alf.networks.containers._Sequential

LSTM cells followed by an encoding network.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input. If nested, then preprocessing_combiner must not be None.
output_tensor_spec (None|TensorSpec) – spec for the output. If None, the output tensor spec will be assumed as TensorSpec((output_size, )), where output_size is inferred from network output. Otherwise, the output tensor spec will be output_tensor_spec and the network output will be reshaped according to output_tensor_spec. Note that output_tensor_spec is only used for reshaping the network outputs for interpretation purpose and is not used for specifying any network layers.
input_preprocessors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure with input_tensor_spec. This arg is helpful if you want to have separate preprocessings for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
pre_fc_layer_params (tuple[int]) – a tuple of integers representing FC layers that are applied before the LSTM cells.
hidden_size (int or tuple[int]) – the hidden size(s) of the lstm cell(s). Each size corresponds to a cell. If there are multiple sizes, then lstm cells are stacked.
lstm_output_layers (None|int|list[int]) – -1 means the output from the last lstm layer. None means all lstm layers.
post_fc_layer_params (tuple[int]) – an optional tuple of integers representing hidden FC layers that are applied after the LSTM cells.
activation (nn.functional) – activation for all the layers but the last layer.
kernel_initializer (Callable) – initializer for all the layers but the last layer.
last_layer_size (int) – an optional size of an additional layer appended at the very end. Note that if last_activation is specified, last_layer_size has to be specified explicitly.
last_activation (nn.functional) – activation function of the additional layer specified by last_layer_size. Note that if last_layer_size is not None, last_activation has to be specified explicitly.
last_kernel_initializer (Callable) – initializer for the additional layer specified by last_layer_size. If None, it will be the same as kernel_initializer. If last_layer_size is None, last_kernel_initializer will not be used.

make_parallel(n, allow_non_parallel_input=False)[source]#

Make a parallelized version of module.

Parameters

n (int) – the number of copies
allow_non_parallel_input (bool) – if True, the returned network will also accept non-parallel input with shape [batch_size, …]. In this case, the network will check whether the input is parallel input. If not, the input will be automatically replicated n times at the beginning.

Returns

the parallelized network.

training: bool#

ParallelEncodingNetwork(input_tensor_spec, n, output_tensor_spec=None, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_fc_bn=False, last_layer_size=None, last_activation=None, last_kernel_initializer=None, last_use_fc_bn=False, name='ParallelEncodingNetwork')[source]#

Parallel encoding network which effectively runs n individual encoding networks simultaneously.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input. If nested, then preprocessing_combiner must not be None.
n (int) – number of parallel networks
output_tensor_spec (None|TensorSpec) – spec for the output, excluding the dimension of parallel networks n. If None, the output tensor spec will be assumed as TensorSpec((n, output_size, )), where output_size is inferred from network output. Otherwise, the output tensor spec will be TensorSpec((n, *output_tensor_spec.shape)) and the network output will be reshaped accordingly. Note that output_tensor_spec is only used for reshaping the network outputs for interpretation purpose and is not used for specifying any network layers.
input_preprocessors (None) – must be None.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing FC layer sizes.
activation (nn.functional) – activation used for all the layers but the last layer.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If None, a variance_scaling_initializer will be used.
use_fc_bn (bool) – whether to use Batch Normalization for fc layers.
last_layer_size (int) – an optional size of an additional layer appended at the very end. Note that if last_activation is specified, last_layer_size has to be specified explicitly.
last_activation (nn.functional) – activation function of the additional layer specified by last_layer_size. Note that if last_layer_size is not None, last_activation has to be specified explicitly.
last_kernel_initializer (Callable) – initializer for the additional layer specified by last_layer_size. If None, it will be the same as kernel_initializer. If last_layer_size is None, last_kernel_initializer will not be used.
last_use_fc_bn (bool) – whether to use Batch Normalization for the last fc layer.
name (str) –

Returns

the parallelized network

SpatialBroadcastDecodingNetwork(input_size, output_height, conv_layer_params, output_width=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, output_activation=<function identity>, name='SpatialBroadcastDecodingNetwork')[source]#

Implements the spatial broadcast decoder in

Watters et al. 2019, Spatial Broadcast Decoder: A Simple Architecture for Learning Disentangled Representations in VAEs.

In short, given a latent embedding and target output height/width, this decoder first spatially broadcasts the embedding over height*width, appends a uniform xy meshgrid in [-1,1], and applies conv layers.
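The broadcast-and-meshgrid step can be sketched with nested lists in place of tensors. spatial_broadcast is a hypothetical helper; the channel order (embedding channels first, then x, then y) and the handling of a size-1 axis are assumptions made for this illustration and may not match the ALF implementation.

```python
def spatial_broadcast(embedding, height, width=None):
    """Tile an embedding over an H x W grid and append xy coordinates.

    Returns a nested list of shape [len(embedding) + 2, H, W].
    """
    width = height if width is None else width

    def linspace(n):
        # Evenly spaced coordinates in [-1, 1]; 0.0 for a single point
        # is an arbitrary choice for the degenerate case.
        if n == 1:
            return [0.0]
        return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

    xs, ys = linspace(width), linspace(height)
    # Each embedding value becomes a constant H x W channel.
    channels = [[[v] * width for _ in range(height)] for v in embedding]
    # x varies along the width, y varies along the height.
    channels.append([list(xs) for _ in range(height)])
    channels.append([[y] * width for y in ys])
    return channels


grid = spatial_broadcast([0.5, -0.3], height=3)
print(len(grid), len(grid[0]), len(grid[0][0]))
```

The conv layers that follow would therefore see input_size + 2 input channels.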

Parameters

input_size (int) – the latent embedding size
output_height (int) – the target output image height
conv_layer_params (Tuple[Tuple[int]]) – a tuple of conv layer params after broadcasting
output_width (Optional[int]) – if None, it’s equal to output_height
fc_layer_params (Optional[Tuple[int]]) – a tuple of fc layers applied to the input embedding before broadcasting
activation (Callable) – activation of the intermediate conv layers
output_activation (Callable) – the final activation

alf.networks.mdq_critic_networks#

MdqCriticNetworks

class MdqCriticNetwork(input_tensor_spec, action_qt=None, num_critic_replicas=2, obs_encoding_layer_params=None, pre_encoding_layer_params=None, mid_encoding_layer_params=None, post_encoding_layer_params=None, free_form_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, debug_summaries=False, name='MdqCriticNetwork')[source]#

Bases: alf.networks.network.Network

Create an instance of MdqCriticNetwork for estimating action-value of continuous actions and action sampling used in the MDQ algorithm.

Creates an instance of MdqCriticNetwork for estimating action-value of continuous actions and action sampling.

Currently there are two branches of networks:

free-form branch: a plain MLP for Q-learning
adv-form branch: an advantage form of network for action generation. It is trained by a target from the free-form net.

The adv-form branch has the following structures for flexibility:

obs -> [obs_encoding_net] -> encoded_obs

encoded_obs, action -> [pre_encoding_nets] -> [mid_shared_encoding_nets] -> [post_encoding_nets] -> outputs

where the pre_encoding_nets and post_encoding_nets do not share parameters across action dimensions while mid_shared_encoding_nets shares parameters across action dimensions. If the encoding_layer_params for a sub-net is None, that sub-net is effectively neglected.

Furthermore, to enable parallel computation across action dimensions in the case of value computation, we have both parallel and individual versions for the nets without parameter sharing. For example, for post_encoding_nets, we also have post_encoding_parallel_net, which is essentially the equivalent form of post_encoding_nets but supports parallel forwarding. The parameters of the two versions are synced. The partial actions (a[0:i]) are zero-padded for both parallel and individual networks to enable parallel computation.
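To make the zero-padding of partial actions concrete, here is a minimal sketch for a single action vector. It assumes that the net for action dimension i sees a[0:i] padded with zeros up to action_dim; the exact index range and the helper name zero_padded_partial_actions are assumptions for illustration, not taken from the ALF source.

```python
def zero_padded_partial_actions(action):
    """Return [pad(a[0:0]), pad(a[0:1]), ..., pad(a[0:action_dim - 1])].

    Every entry has length action_dim, so all of them can be stacked and
    processed in one parallel forward pass.
    """
    action_dim = len(action)
    return [list(action[:i]) + [0.0] * (action_dim - i)
            for i in range(action_dim)]


partial = zero_padded_partial_actions([0.2, -0.5, 0.9])
for row in partial:
    print(row)
```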

For conciseness, the following notations will be used when convenient:

B: batch size

d: dimensionality of feature

n: number of network replicas

action_dim: the dimensionality of actions

action_bin: number of discrete bins for each action dim

Parameters

input_tensor_spec – A tuple of TensorSpecs (observation_spec, action_spec) representing the inputs.
action_qt (ActionQuantizer) – action quantization module
num_critic_replicas (int) – number of critic networks
obs_encoding_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes for encoding observations.
pre_encoding_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes for encoding concatenated [encoded_observation, actions]. Parameters are not shared across action dimensions
mid_encoding_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer for further encoding the outputs from pre_encoding_net. The parameters are shared across action dimensions.
post_encoding_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer for further encoding the outputs from mid_encoding_net. The parameters are not shared across action dimensions.
free_form_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers for Q-learning. We refer to it as the free form to differentiate it from the mdq-form of network, which is structured.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a variance_scaling_initializer with uniform distribution will be used.
name (str) –

forward(inputs, alpha, state=(), free_form=False)[source]#

Computes action-value given an observation.

Parameters

inputs – A tuple of Tensors consistent with input_tensor_spec
alpha – the temperature used for the advantage computation
state – empty for API consistency
free_form (bool) – use the free-form branch for computation if True; default value is False

Returns

Q_values (torch.Tensor): if free_form is True, its shape is [B, n]; if free_form is False, its shape is [B, n, action_dim].

state: empty

get_action(inputs, alpha, greedy)[source]#

Sample action from the distribution induced by the mdq-net.

Parameters

inputs – A tuple of Tensors consistent with input_tensor_spec
alpha – the temperature used for the advantage computation
greedy (bool) – If True, do greedy sampling by taking the mode of the distribution. If False, do direct sampling from the distribution.

Returns

actions (torch.Tensor): a tensor of the shape [B, n, action_dim].

log_pi_per_dim (torch.Tensor): a tensor of the shape [B, n, action_dim] representing the log_pi for each dimension of the sampled multi-dimensional action.

get_uniform_prior_logpi()[source]#

sync_net()[source]#

training: bool#

alf.networks.memory#

Various memory classes.

Currently, all the memory classes implemented here only support memory in one episode, which means that the memory is reset at the beginning of an episode.

class FIFOMemory(dim, size, name='FIFOMemory')[source]#

Bases: alf.networks.memory.Memory

A Simple FIFO memory.

When new memory slots are written, the oldest memory slots are removed.

Parameters

dim (int) – dimension of memory content
size (int) – number of memory slots

build(batch_size)[source]#

Build the memory for batch_size.

User does not need to call this explicitly. read and write will automatically call this if the memory has not been built yet.

Note: Subsequent write and read must match this batch_size.

Parameters: batch_size (int) – batch size of the model.

from_states(states)[source]#

Restore the memory from states.

Parameters: states (tuple of Tensor) – It should be obtained from states().

mask()[source]#

Get the mask for the stored memory.

Returns: shape=(batch_size, size), dtype=torch.bool
Return type: Tensor

memory()[source]#

read(keys)[source]#

Read out memory vectors for the given keys.

Parameters

keys (Tensor) – shape is (b, dim) or (b, k, dim) where b is batch size, k is the number of read keys, and dim is memory content dimension

Returns

result (Tensor): shape is the same as keys. result[…, i] is the read result for the corresponding key.

property states#

Get the states of the memory.

Returns: memory states, i.e. a tuple of memory content and usage tensor.

write(content)[source]#

Write content to memory.

Append the content to memory. If the memory is full, the oldest slot will be removed.

Parameters: content (Tensor) – shape should be [b, dim] or [b, k, dim] where k means the number of memory slots to be written
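The FIFO write behaviour can be illustrated for a single batch element with collections.deque, which drops the oldest item once maxlen is reached. TinyFIFOMemory is a hypothetical stand-in that uses lists instead of tensors and ignores batching, masks and states; it is not the ALF class.

```python
from collections import deque


class TinyFIFOMemory:
    """Single-sample FIFO memory with `size` slots of dimension `dim`."""

    def __init__(self, dim, size):
        self.dim = dim
        # A bounded deque discards the oldest slot when a new one is
        # appended to a full memory.
        self.slots = deque(maxlen=size)

    def write(self, content):
        # Accept one vector [dim] or k vectors [k, dim].
        rows = content if isinstance(content[0], (list, tuple)) else [content]
        for row in rows:
            if len(row) != self.dim:
                raise ValueError("content has the wrong dimension")
            self.slots.append(list(row))

    def memory(self):
        # Oldest slot first, newest slot last.
        return list(self.slots)


mem = TinyFIFOMemory(dim=2, size=3)
mem.write([1.0, 1.0])
mem.write([[2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
print(mem.memory())
```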

class Memory(dim, size, state_spec, name='Memory')[source]#

Bases: object

Abstract base class for Memory.

Parameters

dim (int) – dimension of memory content
size (int) – number of memory slots
state_spec (nested TensorSpec) – the spec for the states
name (str) – name of this memory

property dim#: Get the dimension of each content vector.

abstract read(keys)[source]#

Read out memory vectors for the given keys.

Parameters

keys (Tensor) – shape is (b, dim) or (b, k, dim) where b is batch size, k is the number of read keys, and dim is memory content dimension

Returns

result (Tensor): shape is the same as keys. result[…, i] is the read result for the corresponding key.

property size#: Get the size of the memory (i.e. the number of memory slots).

property state_spec#: Get the state tensor specs.

abstract write(content)[source]#

Write content to memory.

How the content is written to the memory buffer is decided by the subclass.

Parameters: content (Tensor) – shape should be (b, dim)

class MemoryWithUsage(dim, size, snapshot_only=False, normalize=True, scale=None, usage_decay=None, name='MemoryWithUsage')[source]#

Bases: alf.networks.memory.Memory

Memory with usage indicator.

MemoryWithUsage stores memory in a matrix. During memory write, the memory slot with the smallest usage is replaced by the new memory content. The memory content can be retrieved through an attention mechanism using read.

This implementation follows the one described in arXiv:1803.10760.

See Methods 2.3 of Unsupervised Predictive Memory in a Goal-Directed Agent

Parameters

dim (int) – dimension of memory content
size (int) – number of memory slots
snapshot_only (bool) – If True, only keeps the last snapshot of the memory instead of keeping all the memory snapshots at every step. If True, gradient cannot be propagated to the writer.
normalize (bool) – If True, use cosine similarity, otherwise use dot product.
scale (None|float) – Scale the similarity by this. If scale is None, a default value is used based on normalize. If normalize is True, scale defaults to 5.0. If normalize is False, scale defaults to 1/sqrt(dim).
usage_decay (None|float) – The usage will be scaled by this factor at every write call. If None, it defaults to 1 - 1 / size
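
The default rules for scale and usage_decay stated above can be written out as a small sketch. The helper names default_scale and default_usage_decay are made up for this illustration and are not part of ALF:

```python
import math


def default_scale(dim, normalize=True, scale=None):
    """Resolve `scale` following the defaults described in the parameter list."""
    if scale is not None:
        return scale
    return 5.0 if normalize else 1.0 / math.sqrt(dim)


def default_usage_decay(size, usage_decay=None):
    """Resolve `usage_decay`; the documented default is 1 - 1 / size."""
    return 1.0 - 1.0 / size if usage_decay is None else usage_decay
```

With dim=16 and normalize=False, the resolved scale is 1/sqrt(16) = 0.25; with size=4, the resolved usage decay is 0.75.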

build(batch_size)[source]#

Build the memory for batch_size.

User does not need to call this explicitly. read and write will automatically call this if the memory has not been built yet.

Note: Subsequent write and read must match this batch_size.

Parameters: batch_size (int) – batch size of the model.

create_keynet(query_spec, num_keys)[source]#

Create a net which can be used to generate keys.

The created keynet can be used with genkey_and_read.

Parameters

query_spec (alf.TensorSpec) – the spec for the query
num_keys (int) – the number of keys to be generated.

Returns

a function which calculates num_keys keys given query.

Return type

Callable

from_states(states)[source]#

Restore the memory from states.

Parameters: states (tuple of Tensor) – It should be obtained from the states property.

genkey_and_read(keynet, query, flatten_result=True)[source]#

Generate key and read.

Parameters

keynet (Callable) – keynet(query) is a tensor of shape (batch_size, num_keys * (dim + 1)). keynet can be created using create_keynet.
query (Tensor) – the query from which the keys are generated
flatten_result (bool) – If True, the result shape will be (batch_size, num_keys * dim), otherwise it is (batch_size, num_keys, dim)

Returns

If flatten_result is True, its shape is (batch_size, num_keys * dim), otherwise it is (batch_size, num_keys, dim)

Return type

Tensor

read(keys, scale=None)[source]#

Read from memory.

Read the memory for the given keys. For each key in keys we will get one result as \(r = \sum_i M_i a_i\) where \(M_i\) is the memory content at location i and \(a_i\) is the attention weight for key at location i. \(a\) is calculated as the softmax of a scaled similarity between key and each memory content: \(a_i = \frac{\exp(scale \cdot sim_i)}{\sum_j \exp(scale \cdot sim_j)}\)

Parameters

keys (Tensor) – shape[-1] is dim. For single key read, the shape is (batch_size, dim). For multiple key read, the shape is (batch_size, k, dim), where k is the number of keys.
scale (None|float|Tensor) – shape is () or keys.shape[:-1]. The cosine similarities are multiplied with scale before softmax is applied. If None, use the scale provided at constructor.

Returns

shape is the same as keys. result[…, i] is the read result for the corresponding key.

Return type

result (Tensor)
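
As a rough illustration of the attention read, the sketch below computes the softmax weights over scaled cosine similarities and the weighted sum of memory slots for a single key. It uses plain Python lists instead of batched tensors, and the function names are invented for this example:

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def attention_read(memory, key, scale=5.0):
    """Return (result, weights) for one key against a list of memory slots."""
    logits = [scale * cosine(key, slot) for slot in memory]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    result = [
        sum(w * slot[d] for w, slot in zip(weights, memory))
        for d in range(len(key))
    ]
    return result, weights


memory = [[1.0, 0.0], [0.0, 1.0]]
result, weights = attention_read(memory, key=[1.0, 0.0])
```

Here the key matches the first slot, so the first weight dominates and the result lies close to that slot.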

reset()[source]#

Reset the memory to the initial state.

Both memory and usage are set to zeros.

property states#

Get the states of the memory.

Returns: tuple of memory content and usage tensor.
Return type: memory states

property usage#

Get the usage for each memory slot.

Returns: usage (Tensor) of shape (batch_size, size)

write(content)[source]#

Write content to memory.

Append the content to memory. If the memory is full, the slot with the smallest usage will be overwritten. The usage is calculated during read as the sum of past attentions.

Parameters: content (Tensor) – shape should be (b, dim)
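
The usage-based replacement can be sketched for a single (unbatched) memory as follows. This is only an illustration: the function name is hypothetical, and the value assigned to the usage of a freshly written slot (1.0 here) is an assumption that the text above does not specify.

```python
def write_least_used(memory, usage, content, usage_decay):
    """Overwrite the least-used slot and return (memory, usage, index)."""
    # The usage is scaled by usage_decay at every write call.
    usage = [u * usage_decay for u in usage]
    index = min(range(len(usage)), key=usage.__getitem__)
    memory = [list(slot) for slot in memory]
    memory[index] = list(content)
    usage[index] = 1.0  # assumption: a new slot starts with full usage
    return memory, usage, index


memory, usage, index = write_least_used(
    memory=[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    usage=[0.5, 0.1, 0.9],
    content=[9.0, 9.0],
    usage_decay=0.5,
)
```

The second slot has the smallest usage, so it is the one replaced by the new content.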

alf.networks.network#

Base extension to torch.nn.Module. Adapted from tf_agents/tf_agents/networks/network.py

class BatchSquashNetwork(network, batch_dims=2, name='BatchSquashNetwork')[source]#

Bases: alf.networks.network.Network

Wrap a network so that it works on multiple batch dims. Note that the output spec of this network is the same as that of the wrapped network (it won’t include batch dims).

Parameters

network (Network) – the network to be wrapped
batch_dims (int) – how many batch dims to squash before forward

Args

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –
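
The idea of squashing batch dims before calling the wrapped network can be sketched with nested lists. The helper squash_apply is hypothetical and handles exactly two leading batch dims; the real class operates on tensors.

```python
def squash_apply(fn, x):
    """Apply `fn`, which expects one batch dim, to `x` with two batch dims.

    `x` is a nested list indexed as x[t][b]; `fn` maps a flat list of
    T * B items to a list of T * B outputs.
    """
    outer, inner = len(x), len(x[0])
    flat = [item for row in x for item in row]  # merge the two batch dims
    out = fn(flat)                              # fn sees a single batch dim
    # Split the outputs back into the original two batch dims.
    return [out[t * inner:(t + 1) * inner] for t in range(outer)]


seen_batch_sizes = []


def times_ten(batch):
    seen_batch_sizes.append(len(batch))
    return [v * 10 for v in batch]


restored = squash_apply(times_ten, [[1, 2, 3], [4, 5, 6]])
```

The wrapped function is called once with a batch of six items, and the outputs are regrouped into two rows of three.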

forward(x, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class NaiveParallelNetwork(network, n, name=None)[source]#

Bases: alf.networks.network.Network

Naive implementation of parallel network.

A parallel network has n copies of network with the same structure but different independently initialized parameters.

NaiveParallelNetwork creates n independent networks with the same structure as network and evaluates them separately in a loop during forward().

Parameters

network (Network) – the parallel network will have n copies of network.
n (int) – n copies of network
name (str) – a string that will be used as the name of the created NaiveParallelNetwork instance. If None, naive_parallel_ followed by the network.name will be used by default.

forward(inputs, state=())[source]#

Compute the output and the next state.

Parameters

inputs (nested torch.Tensor) – its shape can be [B, n, ...], or [B, ...]
state (nested torch.Tensor) – its shape must be [B, n, ...]

Returns

output (nested torch.Tensor) – its shape is [B, n, ...]
next_state (nested torch.Tensor) – its shape is [B, n, ...]
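
A minimal sketch of the naive strategy, using scalar "networks" and plain lists: n independently initialized copies are evaluated in a loop, and an input without the n dimension is shared by all copies. The class NaiveParallel and the factory make_linear are invented for this illustration and omit state handling.

```python
import random


def make_linear(rng):
    """Create one scalar linear map with its own random parameters."""
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    return lambda x: w * x + b


class NaiveParallel:
    def __init__(self, make_net, n, seed=0):
        rng = random.Random(seed)
        self.nets = [make_net(rng) for _ in range(n)]

    def forward(self, inputs):
        """`inputs` has shape [B] (shared) or [B][n]; the output is [B][n]."""
        outputs = []
        for x in inputs:
            if isinstance(x, list):  # one input per copy
                row = [net(xi) for net, xi in zip(self.nets, x)]
            else:                    # the same input goes to every copy
                row = [net(x) for net in self.nets]
            outputs.append(row)
        return outputs


parallel = NaiveParallel(make_linear, n=3)
shared = parallel.forward([1.0, 2.0])
explicit = parallel.forward([[1.0] * 3, [2.0] * 3])
```

Feeding a shared input gives the same result as explicitly repeating it n times.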

training: bool#

class Network(input_tensor_spec, state_spec=(), name='Network')[source]#

Bases: torch.nn.modules.module.Module

A base class for various networks.

Base extension to nn.Module to simplify copy operations.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –

copy(**kwargs)[source]#

Create a copy of this network or return the current instance.

If self._singleton_instance is True, calling copy() will return self; otherwise it will re-create and return a new Network instance using the original arguments used by the constructor.

NOTE When re-creating Network, Network layer weights are never copied. This method recreates the Network instance with the same arguments it was initialized with (excepting any new kwargs).

Parameters: **kwargs – Args to override when recreating this network. Commonly overridden args include ‘name’.
Returns
Return type: Network
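
The copy semantics can be illustrated with a toy class. TinyNetwork is hypothetical and only mimics the described behaviour: it remembers its constructor arguments, re-creates itself from them on copy(), and returns itself when marked as a singleton.

```python
class TinyNetwork:
    def __init__(self, hidden, name="TinyNetwork"):
        # Remember the constructor arguments so that copy() can reuse them.
        self._saved_args = dict(hidden=hidden, name=name)
        self._singleton_instance = False
        self.hidden = hidden
        self.name = name

    def singleton(self, singleton_instance=True):
        self._singleton_instance = singleton_instance
        return self  # returning self allows chained calls

    def copy(self, **kwargs):
        if self._singleton_instance:
            return self
        args = dict(self._saved_args)
        args.update(kwargs)  # new kwargs override the saved arguments
        return type(self)(**args)


net = TinyNetwork(hidden=8)
clone = net.copy(name="clone")
shared = net.singleton().copy()
```

In this sketch, clone is a fresh instance built from the saved arguments with a new name, while shared is the original object.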

property input_tensor_spec#: Return the input tensor spec BEFORE preprocessings have been applied.

property is_distribution_output#: Whether the output is Distribution.

property is_rnn#: Whether this network is a recurrent net.

make_parallel(n)[source]#

Make a parallelized version of this network.

A parallel network has n copies of network with the same structure but different independently initialized parameters.

By default, it creates NaiveParallelNetwork, which simply makes n copies of this network and uses a loop to call them in forward(). If possible, the subclass should override this to generate an optimized parallel implementation.

Parameters: n (int) – the number of copies
Returns: A parallel network
Return type: Network

property name#: Name of this Network.

property output_spec#: Return the spec of the network’s encoding output. By default, we use _test_forward to automatically compute the output and get its spec. For efficiency, subclasses can overwrite this function if the output spec can be obtained easily in other ways.

property saved_args#: Return the dictionary of the arguments used to construct the network.

singleton(singleton_instance=True)[source]#

Change the singleton property to the value given by the input argument singleton_instance.

Parameters: singleton_instance (bool) – a flag indicating whether to turn the self._singleton_instance property on or off. If self._singleton_instance is True, calling copy() will return self; otherwise a re-created Network instance will be returned.

Returns: self, which facilitates cascaded calling.

property state_spec#

Return the state spec to be used by an Algorithm.

Subclass should override this to return the correct state_spec.

training: bool#

class NetworkWrapper(module, input_tensor_spec, state_spec=(), name='NetworkWrapper')[source]#

Bases: alf.networks.network.Network

Wrap module or function as a Network.

Parameters

module (Callable) – can be called as module(input) to calculate the output. If state_spec != (), then it’s called as module(input,state) and its return should be a tuple of (output,new_state).
input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec for the input of module
state_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec for the state of module
name (str) – name of the wrapped network
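
The two calling conventions for module can be sketched as below. FnWrapper is a hypothetical stand-in, and the string used as state_spec is only a placeholder for a real tensor spec.

```python
class FnWrapper:
    def __init__(self, module, state_spec=()):
        self._module = module
        self._state_spec = state_spec

    def forward(self, x, state=()):
        if self._state_spec == ():
            # Stateless: module(input) returns only the output.
            return self._module(x), state
        # Stateful: module(input, state) returns (output, new_state).
        return self._module(x, state)


stateless = FnWrapper(lambda x: x + 1)
counter = FnWrapper(lambda x, s: (x + s, s + 1), state_spec="scalar")
```

Both wrappers return an (output, state) pair, so callers can treat them uniformly.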

forward(x, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

make_parallel(n)[source]#

Make a parallelized version of this network.

A parallel network has n copies of network with the same structure but different independently initialized parameters.

Parameters: n (int) – the number of copies
Returns: A parallel network
Return type: Network

training: bool#

get_input_tensor_spec(net)[source]#

Get the input_tensor_spec of net if possible

Parameters

net (nn.Module) –

Returns

None if input_tensor_spec cannot be inferred from net.

Return type

nested TensorSpec | None

wrap_as_network(net, input_tensor_spec)[source]#

Wrap net as a Network if it is not a Network.

Parameters

net (Network | Callable) –
input_tensor_spec (nested TensorSpec) – if net is not a Network, input_tensor_spec must be provided unless net is a FC. In that case, input_tensor_spec will be inferred from net.input_size if it is not provided.

Returns

Return type

Network

Raises

ValueError – if input_tensor_spec is None and cannot be inferred from net

alf.networks.networks#

Various concrete Networks.

class AMPWrapper(enabled, net)[source]#

Bases: alf.networks.network.Network

Wrap a network to run in a given AMP context.

Parameters

enabled (bool) – whether to enable AMP autocast
net (Network) – the wrapped network

Args

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –

forward(input, state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class Delay(input_tensor_spec, delay=1, name='Delay')[source]#

Bases: alf.networks.network.Network

The output is the input from delay steps ago.

Parameters

input_tensor_spec (nested TensorSpec) – representing the input
delay (int) – if 0, there is no delay and the output is same as the input.

Args

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –
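
The delay behaviour can be sketched with a queue. DelayLine is a hypothetical illustration that keeps the buffer internally, whereas a Network would pass it through state; the zero initial buffer is an assumption.

```python
from collections import deque


class DelayLine:
    def __init__(self, delay=1, initial=0.0):
        self._delay = delay
        # Assumption: the buffer starts filled with `initial` (zeros).
        self._buffer = deque([initial] * delay)

    def step(self, x):
        if self._delay == 0:
            return x  # no delay: the output is the input
        self._buffer.append(x)
        return self._buffer.popleft()


line = DelayLine(delay=2)
outputs = [line.step(v) for v in [1, 2, 3, 4]]
```

With delay=2 the first two outputs come from the initial buffer, after which each output is the input from two steps earlier.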

forward(input, state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class GRUCell(input_size, hidden_size, name='GRUCell')[source]#

Bases: alf.networks.network.Network

A gated recurrent unit (GRU) cell

\[\begin{split}\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\end{split}\]

where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.

Parameters

input_size (int) – The number of expected features in the input x
hidden_size (int) – The number of features in the hidden state h
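
To make the update equations concrete, here is a scalar sketch (input_size = hidden_size = 1) in plain Python. The parameter dictionary and its key names are chosen for this example only.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h, p):
    """One GRU update for scalar input x and scalar hidden state h."""
    r = sigmoid(p["w_ir"] * x + p["b_ir"] + p["w_hr"] * h + p["b_hr"])
    z = sigmoid(p["w_iz"] * x + p["b_iz"] + p["w_hz"] * h + p["b_hz"])
    n = math.tanh(p["w_in"] * x + p["b_in"] + r * (p["w_hn"] * h + p["b_hn"]))
    return (1.0 - z) * n + z * h


ZERO_PARAMS = {
    prefix + gate: 0.0
    for prefix in ("w_i", "b_i", "w_h", "b_h")
    for gate in "rzn"
}
h_next = gru_step(x=1.0, h=2.0, p=ZERO_PARAMS)
```

With all parameters at zero, both gates equal 0.5 and the candidate n is 0, so the new hidden state is half of the old one.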

forward(input, state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class LSTMCell(input_size, hidden_size, name='LSTMCell')[source]#

Bases: alf.networks.network.Network

A long short-term memory (LSTM) cell.

\[\begin{split}\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\end{split}\]

where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product.

Parameters

input_size (int) – The number of expected features in the input x
hidden_size (int) – The number of features in the hidden state h
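
The LSTM equations can be sketched in the same scalar setting. The parameter key names below are illustrative, not taken from the library.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h, c, p):
    """One LSTM update for scalar input x, hidden state h and cell state c."""

    def gate(name, squash):
        pre = p["w_i" + name] * x + p["b_i" + name] \
            + p["w_h" + name] * h + p["b_h" + name]
        return squash(pre)

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    g = gate("g", math.tanh)  # candidate cell value
    o = gate("o", sigmoid)    # output gate
    c_next = f * c + i * g
    h_next = o * math.tanh(c_next)
    return h_next, c_next


ZERO_PARAMS = {
    prefix + gate: 0.0
    for prefix in ("w_i", "b_i", "w_h", "b_h")
    for gate in "ifgo"
}
h_next, c_next = lstm_step(x=1.0, h=0.3, c=2.0, p=ZERO_PARAMS)
```

With zero parameters every sigmoid gate is 0.5 and g is 0, so the cell state is halved and the hidden state is 0.5 * tanh(c').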

forward(input, state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class NoisyFC(input_size, output_size, std_init=0.5, new_noise_prob=0.01, activation=<function identity>, use_bn=False, use_ln=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm1d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0, bias_initializer=None, weight_opt_args=None, bias_opt_args=None)[source]#

Bases: alf.networks.network.Network

The Noisy Linear Layer described in

Fortunato et al. Noisy Networks for Exploration

In short, the original weight \(w\) and bias \(b\) of the FC layer are replaced with \(w + w_\sigma \odot \epsilon^w\) and \(b + b_\sigma \odot \epsilon^b\) where \(\epsilon^w\) and \(\epsilon^b\) are noise and \(w, w_\sigma, b, b_\sigma\) are trainable parameters.

Some details:

The noise for each sample in a batch is different.
The noise is maintained as state. It has a probability of new_noise_prob to change to new noise.
Since the initial state is always 0, a new noise will always be generated for zero state.
If it is running in eval mode (i.e., common.is_eval() is True), noise will be disabled (i.e. same as alf.layers.FC).
The noise is factorized Gaussian noise as described in the paper.
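
The factorized Gaussian noise from the paper can be sketched for one sample as follows: two noise vectors are drawn, passed through f(x) = sgn(x) * sqrt(|x|), and combined by an outer product. The helper names are invented for this example, and the per-sample state handling and resampling probability are left out.

```python
import math
import random


def f(v):
    """Signed square root used by factorized noise."""
    return math.copysign(math.sqrt(abs(v)), v)


def factorized_noise(input_size, output_size, rng):
    eps_in = [f(rng.gauss(0.0, 1.0)) for _ in range(input_size)]
    eps_out = [f(rng.gauss(0.0, 1.0)) for _ in range(output_size)]
    eps_w = [[eo * ei for ei in eps_in] for eo in eps_out]  # outer product
    return eps_w, eps_out  # eps_out doubles as the bias noise


def noisy_linear(x, w, w_sigma, b, b_sigma, eps_w, eps_b):
    """Compute (w + w_sigma * eps_w) x + (b + b_sigma * eps_b)."""
    return [
        sum((w[o][i] + w_sigma[o][i] * eps_w[o][i]) * x[i]
            for i in range(len(x)))
        + b[o] + b_sigma[o] * eps_b[o]
        for o in range(len(b))
    ]


rng = random.Random(0)
eps_w, eps_b = factorized_noise(input_size=3, output_size=2, rng=rng)
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
sigma = [[0.5] * 3, [0.5] * 3]
b, b_sigma = [0.5, -0.5], [0.5, 0.5]
zero_w, zero_b = [[0.0] * 3, [0.0] * 3], [0.0, 0.0]
plain = noisy_linear([1.0, 1.0, 1.0], w, sigma, b, b_sigma, zero_w, zero_b)
noisy = noisy_linear([1.0, 1.0, 1.0], w, sigma, b, b_sigma, eps_w, eps_b)
```

With the noise set to zero, as in eval mode, the layer reduces to an ordinary linear layer.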

Parameters

input_size (int) – input size.
output_size (int) – output size.
activation (Callable) – activation function.
std_init (float) – the scaling factor for the initial value of weight_sigma and bias_sigma.
new_noise_prob (float) – the probability of resampling the noise.
use_bn (bool) – whether to use batch normalization.
use_ln (bool) – whether to use layer normalization
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
kernel_initializer (Optional[Callable]) – initializer for the FC layer kernel. If none is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant for the initial bias value. This is ignored if bias_initializer is provided.
bias_initializer (Optional[Callable]) – initializer for the bias parameter.
weight_opt_args (Optional[Dict]) – If provided, it will be used as optimizer arguments for weight. And it will be combined with zero_mean=False and fixed_norm=False as optimizer arguments for weight_sigma.
bias_opt_args (Optional[Dict]) – If provided, it will be used as optimizer arguments for bias. And it will be combined with zero_mean=False as optimizer arguments for bias_sigma.

Args

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –

property bias#

forward(input, state)[source]#

Forward computation.

Parameters

inputs – its shape should be [batch_size, input_size]
state (Tuple[Tensor]) – tuple of noise

Returns

with shape as [batch_size, output_size]

Return type

Tensor

property input_size#

property output_size#

reset_parameters()[source]#: Initialize the parameters.

training: bool#

property weight#

class Residue(block, input_tensor_spec=None, activation=<built-in method relu_ of type object>, name='Residue')[source]#

Bases: alf.networks.network.Network

Residue block.

It performs y = activation(x + block(x)).

Parameters

block (Callable) –
input_tensor_spec (nested TensorSpec) – input tensor spec for block if it cannot be inferred from block
activation (Callable) – activation function
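
The computation y = activation(x + block(x)) can be sketched on plain lists. The function names here are illustrative only.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def residue(block, x, activation=relu):
    """Add the block output to its input, then apply the activation."""
    return activation([a + b for a, b in zip(x, block(x))])


y = residue(lambda v: [-2.0 * a for a in v], [1.0, -3.0])
```

Here block(x) is [-2.0, 6.0], so the sum is [-1.0, 3.0] and the ReLU clips the first entry to zero.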

forward(x, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class TemporalPool(input_size, stack_size, pooling_size=1, dtype=torch.float32, mode='skip', name='TemporalPool')[source]#

Bases: alf.networks.network.Network

Pool features temporally.

Suppose input_size=(), stack_size=2, pooling_size=2, the following table shows the output of different mode for an input sequence of 1,2,3,4,5 (ignoring batch dimension)

input: 1, 2, 3, 4, 5
skip:  [0, 1], [0, 1], [1, 3], [1, 3], [3, 5]
avg:   [0, 0], [0, 1.5], [0, 1.5], [1.5, 3.5], [1.5, 3.5]
max:   [0, 0], [0, 2], [0, 2], [2, 4], [2, 4]

Note that for ‘avg’ and ‘max’, the result is zero for the first pooling_size - 1 steps because it needs pooling_size inputs to calculate the result. After that, the output changes every pooling_size steps as a new pooling result becomes available. On the other hand, for ‘skip’, the first input is immediately reflected in the output because it is a valid way of skipping.
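
The three modes can be reproduced with a small scalar simulation. The function temporal_pool is a hypothetical sketch that ignores the batch dimension and keeps its state in local variables; it follows the behaviour in the table above rather than the library source.

```python
def temporal_pool(xs, stack_size, pooling_size, mode="skip"):
    """Return the stacked output after each step of the input sequence."""
    stack = [0.0] * stack_size  # zero initial state
    window = []
    outputs = []
    for t, x in enumerate(xs):
        pooled = None
        if mode == "skip":
            # Keep only the features at steps t = 0, pooling_size, ...
            if t % pooling_size == 0:
                pooled = x
        else:
            # Collect pooling_size inputs before producing a pooled value.
            window.append(x)
            if len(window) == pooling_size:
                if mode == "avg":
                    pooled = sum(window) / pooling_size
                else:
                    pooled = max(window)
                window = []
        if pooled is not None:
            stack = stack[1:] + [pooled]  # drop the oldest, append the newest
        outputs.append(list(stack))
    return outputs


seq = [1, 2, 3, 4, 5]
skip = temporal_pool(seq, stack_size=2, pooling_size=2, mode="skip")
avg = temporal_pool(seq, stack_size=2, pooling_size=2, mode="avg")
mx = temporal_pool(seq, stack_size=2, pooling_size=2, mode="max")
```

Running it on the sequence 1 to 5 with stack_size=2 and pooling_size=2 yields the same rows as the table.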

Example:

# A temporal CNN with progressively large temporal receptive field.
cnn = alf.networks.Sequential([
    alf.networks.TemporalPool(256, 3, 1),
    torch.nn.Flatten(),
    alf.layers.FC(768, 256, activation=torch.relu_),
    alf.networks.TemporalPool(256, 3, 2),
    torch.nn.Flatten(),
    alf.layers.FC(768, 256, activation=torch.relu_),
    alf.networks.TemporalPool(256, 3, 4),
    torch.nn.Flatten(),
    alf.layers.FC(768, 256, activation=torch.relu_)])

Note that the output of the above network changes every 4 steps, which may make the response too slow for many tasks. So a practical way of using TemporalPool is to combine it with Residue so that the output will not lag:

block = alf.networks.Residue(
    alf.networks.Sequential([
        alf.networks.TemporalPool(256, 3, 2),
        torch.nn.Flatten(),
        alf.layers.FC(768, 256, activation=torch.relu_)]))

Parameters

input_size (int|tuple[int]) – shape of the input
stack_size (int) – stack the features from so many steps
pooling_size (int) – if > 1, perform a pooling first. pooling_size steps of features will be pooled as single feature vector according to mode
mode (str) – one of (‘skip’, ‘avg’, ‘max’), only effective if pooling_size > 1.
‘skip’: only keeps features at steps t * pooling_size.
‘avg’: features are averaged over each window of pooling_size steps. The pooling results for the first pooling_size - 1 steps are 0.
‘max’: features are maxed over each window of pooling_size steps. The pooling results for the first pooling_size - 1 steps are 0.

Returns

tuple of:
tensor of shape (stack_size, input_size)
internal states

forward(x, state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

alf.networks.normalizing_flow_networks#

Different normalizing flow networks.

A normalizing flow network \(f: \mathbb{R}^N \rightarrow \mathbb{R}^N\) has two properties:

it is invertible, namely given any output \(y=f(x)\), we can easily compute the corresponding input \(x=f^{-1}(y)\), and
its Jacobian determinant is easy to compute, for example, as the product of diagonal elements.

class NormalizingFlowNetwork(input_tensor_spec, conditional_input_tensor_spec=None, use_transform_cache=True, name='NormalizingFlowNetwork')[source]#

Bases: alf.networks.network.Network

The base class for normalizing flow networks.

Compared to traditional Network classes, its subclass needs to implement the interface make_invertible_transform().

Parameters

input_tensor_spec (TensorSpec) – input tensor spec
conditional_input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – a nested tensor spec
use_transform_cache (bool) – whether to cache transforms. When there is a conditional input, different transforms might be created depending on the conditional inputs. When there is no conditional input, the same transform will always be used. Note that this only caches the transform itself; to correctly cache the inverse result, you also have to set cache_size=1 when creating the transform.
name (str) – name of the network

forward(xz, state=())[source]#

When we have no conditional input for forward: y=self.forward(x). Otherwise y=self.forward((x,z)) where z is the conditional input.

Parameters

xz (Union[Tensor, Tuple[Tensor, Union[Tensor, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]]]) – the input can be either an unnested tensor x or a tuple of an unnested tensor and a nested tensor (x, z). z is an optional conditional input that conditions the normalizing flow mapping from x to y.
state (Union[Tensor, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – should be an empty tuple

inverse(yz, state=())[source]#

When we have no conditional input for inverse: x=self.inverse(y). Otherwise x=self.inverse((y,z)) where z is the conditional input.

Parameters

yz (Union[Tensor, Tuple[Tensor, Union[Tensor, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]]]) – the input can be either an unnested tensor y or a tuple of an unnested tensor and a nested tensor (y, z). z is an optional conditional input that conditions the normalizing flow inverse mapping from y to x.
state (Union[Tensor, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – should be an empty tuple

make_invertible_transform(conditional_inputs=None)[source]#

Express the network forward computation as an invertible PyTorch Transform. This overall transformation can be a composed one chaining many transformation layers.

Parameters: conditional_inputs (Union[Tensor, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – optional nested conditional inputs that condition the mapping \(x \rightarrow y\).
Return type: Transform
Returns: an invertible transform

training: bool#

property use_conditional_inputs: bool#

Return type

bool

Returns

Whether this normalizing flow uses inputs to condition the transforms.

class RealNVPNetwork(input_tensor_spec, conditional_input_tensor_spec=None, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method tanh of type object>, transform_scale_nonlinear=functools.partial(<function clipped_exp>, clip_value_min=-10, clip_value_max=2), sub_dim=None, mask_mode='contiguous', num_layers=2, use_transform_cache=True, name='RealNVPNetwork')[source]#

Bases: alf.networks.normalizing_flow_networks.NormalizingFlowNetwork

Real-valued non-volume preserving transformations.

“DENSITY ESTIMATION USING REAL NVP”, Dinh et al., ICLR 2017.

In short, each transformation layer does

\[\begin{split}\begin{array}{rcl} y_{1:d} &=& x_{1:d}\\ y_{d+1:D} &=& x_{d+1:D}\bigodot \exp(s(x_{1:d};z)) + t(x_{1:d};z)\\ \end{array}\end{split}\]

where \(d\) is a hyperparameter that determines the two-way split of the input vector \(x\), \(D\) the total length of \(x\), \(s\) a (learned) scale function, and \(t\) a (learned) translation function. The scale and translation functions can depend on other input \(z\). It can be verified that the Jacobian is a lower-triangular matrix and its diagonal elements are \(\mathbb{I}_d\) and \(\text{diag}(\exp(s(x_{1:d};z)))\), regardless of how complex \(s\) and \(t\) are.

The original paper suggests alternating the computations of \(y_{1:d}\) and \(y_{d+1:D}\) to avoid some part of \(x\) always getting copied.

Our implementation also allows specifying other binary masks. We additionally support a random binary mask and an evenly distributed mask. The reason is that we can always re-arrange the 0s and 1s and swap the rows of the Jacobian to make it triangular. Because we always take the absolute value of the Jacobian determinant, row swapping will not change the result of log_abs_det_jacobian().

Note that whichever binary mask is used, an alternating computation is always used. For example, let \(b\) be the mask, then

\[\begin{split}\begin{array}{rcl} y &=& b\bigodot x + (1-b)\bigodot(x\bigodot \exp(s(x\bigodot b;z)) + t(x\bigodot b;z))\\ \end{array}\end{split}\]

At even layers, we flip the values of \(b\).

For inverse computation,

\[\begin{split}\begin{array}{rcl} x &=& b\bigodot y + (1-b)\bigodot((y - t(y\bigodot b;z)) \div \exp(s(y\bigodot b;z)))\\ \end{array}\end{split}\]
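
The masked coupling layer and its inverse can be sketched on plain lists. The scale function s and translation function t below are arbitrary toy functions standing in for the learned networks, and the conditional input z is omitted; all names are chosen for this illustration.

```python
import math


def coupling_forward(x, b, s, t):
    """y = b * x + (1 - b) * (x * exp(s(x * b)) + t(x * b))."""
    xb = [xi * bi for xi, bi in zip(x, b)]
    scale, shift = s(xb), t(xb)
    return [bi * xi + (1 - bi) * (xi * math.exp(si) + ti)
            for xi, bi, si, ti in zip(x, b, scale, shift)]


def coupling_inverse(y, b, s, t):
    """x = b * y + (1 - b) * ((y - t(y * b)) / exp(s(y * b)))."""
    yb = [yi * bi for yi, bi in zip(y, b)]
    scale, shift = s(yb), t(yb)
    return [bi * yi + (1 - bi) * ((yi - ti) / math.exp(si))
            for yi, bi, si, ti in zip(y, b, scale, shift)]


def log_abs_det_jacobian(x, b, s):
    """Sum of the scale outputs over the transformed (unmasked) dimensions."""
    xb = [xi * bi for xi, bi in zip(x, b)]
    return sum((1 - bi) * si for bi, si in zip(b, s(xb)))


def toy_scale(v):
    return [0.1 * sum(v)] * len(v)


def toy_shift(v):
    return [sum(v)] * len(v)


x = [0.5, -1.0, 2.0, 3.0]
mask = [1, 1, 0, 0]  # the first two dims are copied, the last two transformed
y = coupling_forward(x, mask, toy_scale, toy_shift)
x_rec = coupling_inverse(y, mask, toy_scale, toy_shift)
log_det = log_abs_det_jacobian(x, mask, toy_scale)
```

Because the masked entries pass through unchanged, the inverse can recompute the same scale and shift from y and recover x.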

Note

The scale and translation network’s initial output should be in a good range, so their hidden activations default to torch.tanh.

Parameters

input_tensor_spec (TensorSpec) – input tensor spec
conditional_input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – a nested tensor spec
input_preprocessors (Any) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. Only used when conditional inputs are present, where its structure should be (x_processor, z_processor).
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. Only used when conditional inputs are present.
conv_layer_params (Tuple[Tuple[int]]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional. Used by the scale and translation networks.
fc_layer_params (Tuple[int]) – a tuple of integers representing FC layer sizes of the scale and translation networks.
activation (Callable) – hidden activation of the scale and translation networks
transform_scale_nonlinear (Callable) – nonlinear function applied to the scale network output. Its codomain should be \([0,+\infty)\). Make sure that the value of this function won’t explode after several RealNVP transform layers.
sub_dim (int) – the dimensionality to keep unchanged at odd layers. If None, then half of the input is unchanged at a time. When it’s 0, all input dims will be changed by an affine transform independent of the input. This case can still be interesting because the affine transform could depend on other variables (i.e., conditional AffineTransform).
mask_mode (str) – three options are supported: “contiguous” (default), “distributed”, and “random”. “contiguous” means at odd layers, the first sub_dim elements are kept unchanged; “distributed” means that the sub_dim elements evenly distributed on the vector (good for vector with local similarity); “random” means that the mask is randomized.
num_layers (int) – number of transformation layers. Note that for mask mode of “random”, every two layers will have a different randomized mask.
use_transform_cache (bool) – whether to use a cached transform. Note that this only stores the transform itself; you also have to use cache_size=1 for the created transform to correctly cache the inverse result.
name (str) – name of the network

training: bool#

alf.networks.ou_process#

Ornstein-Uhlenbeck process.

class OUProcess(state_spec, damping=0.15, stddev=0.2)[source]#

Bases: alf.networks.network.Network

A zero-mean Ornstein-Uhlenbeck process.

A Class for generating noise from a zero-mean Ornstein-Uhlenbeck process.

The Ornstein-Uhlenbeck process is a process that generates temporally correlated noise via a random walk with damping. This process describes the velocity of a particle undergoing Brownian motion in the presence of friction. This can be useful for exploration in continuous action environments with momentum.

The temporal update equation is: x_next = (1 - damping) * x + N(0, stddev)

Parameters

state_spec (nested TensorSpec) – spec of the state
damping (float) – The rate at which the noise trajectory is damped towards the mean. We must have 0 <= damping <= 1, where a value of 0 gives an undamped random walk and a value of 1 gives uncorrelated Gaussian noise. Hence in most applications a small non-zero value is appropriate.
stddev (float) – Standard deviation of the Gaussian component.
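
The update equation can be sketched elementwise on a plain list. The function ou_step is a hypothetical helper written for this example, not the library API.

```python
import random


def ou_step(x, damping=0.15, stddev=0.2, rng=random):
    """One step of x_next = (1 - damping) * x + N(0, stddev)."""
    return [(1.0 - damping) * xi + rng.gauss(0.0, stddev) for xi in x]


rng = random.Random(0)
state = [0.0, 0.0]
trajectory = []
for _ in range(5):
    state = ou_step(state, rng=rng)
    trajectory.append(state)

# With the noise switched off, only the damping remains visible.
decayed = ou_step([1.0, -2.0], damping=0.5, stddev=0.0)
```

Setting stddev to 0 isolates the damping term, and setting damping to 1 removes any dependence on the previous value.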

forward(state)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

property state_spec#

Return the state spec to be used by an Algorithm.

Subclass should override this to return the correct state_spec.

training: bool#

alf.networks.param_networks#

Networks with input parameters.

class ParamConvNet(input_channels, input_size, conv_layer_params, same_padding=False, activation=<built-in method relu_ of type object>, use_bias=False, use_ln=False, n_groups=None, kernel_initializer=None, flatten_output=False, name='ParamConvNet')[source]#

Bases: alf.networks.network.Network

A fully 2D conv network that does not maintain its own network parameters, but accepts them from users. If the given parameter tensor has an extra batch dimension (first dimension), it performs parallel operations.

Parameters

input_channels (int) – number of channels in the input image
input_size (int or tuple) – the input image size (height, width)
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding, pooling_kernel), where padding and pooling_kernel are optional.
same_padding (bool) – similar to TF’s conv2d same padding mode. If True, the user provided paddings in conv_layer_params will be replaced by automatically calculated ones; if False, it corresponds to TF’s valid padding mode (the user can still provide custom paddings though)
activation (torch.nn.functional) – activation for all the layers
use_bias (bool) – whether to use bias.
use_ln (bool) – whether to use layer normalization
n_groups (int) – number of parallel groups, must be specified if use_ln
kernel_initializer (Callable) – initializer for all the layers.
flatten_output (bool) – If False, the output will be an image structure of shape (B, n, C, H, W); otherwise the output will be flattened into a feature of shape (B, n, C*H*W).
name (str) –

forward(inputs, state=())[source]#

Parameters

inputs (Tensor) –
state – not used, just keeps the interface same with other networks.

property param_length#: Get total number of parameters for all layers.

set_parameters(theta, reinitialize=False)[source]#

Distribute parameters to corresponding layers.

Parameters

theta (torch.Tensor) – with shape [D] (groups=1) or [B, D] (groups=B), where:

B: batch size
D: length of parameters, which should be self.param_length

When the shape of theta is [D], it will be unsqueezed to [1, D].
reinitialize (bool) – whether to reinitialize parameters of each layer.
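
How a flat parameter vector might be distributed to layers can be sketched as below. This is an assumption-laden illustration: split_parameters and the per-layer lengths are hypothetical, and the actual layout used by the library is not described in this section.

```python
def split_parameters(theta, layer_lengths):
    """Split theta of shape [D] or [B, D] into one [B, d_k] chunk per layer."""
    if theta and not isinstance(theta[0], list):
        theta = [theta]  # [D] is treated as [1, D]
    total = sum(layer_lengths)  # plays the role of param_length
    for row in theta:
        if len(row) != total:
            raise ValueError("expected %d parameters, got %d"
                             % (total, len(row)))
    chunks, start = [], 0
    for length in layer_lengths:
        chunks.append([row[start:start + length] for row in theta])
        start += length
    return chunks


single = split_parameters([1, 2, 3, 4, 5], layer_lengths=[2, 3])
batched = split_parameters([[1, 2, 3], [4, 5, 6]], layer_lengths=[1, 2])
```

Each layer receives its own slice of every row, so a batched theta yields one parameter set per group.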

training: bool#

class ParamNetwork(input_tensor_spec, conv_layer_params=None, fc_layer_params=None, use_conv_bias=False, use_conv_ln=False, use_fc_bias=True, use_fc_ln=False, n_groups=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, last_layer_size=None, last_activation=None, last_use_bias=True, last_use_ln=False, name='ParamNetwork')[source]#

Bases: alf.networks.network.Network

A network with Fc and conv2D layers that does not maintain its own network parameters, but accepts them from users. If the given parameter tensor has an extra batch dimension (first dimension), it performs parallel operations.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input. If nested, then preprocessing_combiner must not be None.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding, pooling_kernel), where padding and pooling_kernel are optional.
fc_layer_params (tuple[int]) – a tuple of integers representing FC layer sizes.
use_conv_bias (bool) – whether to use bias for conv layers.
use_conv_ln (bool) – whether to use layer normalization for conv layers.
use_fc_bias (bool) – whether to use bias for fc layers.
use_fc_ln (bool) – whether to use layer normalization for fc layers.
n_groups (int) – number of parallel groups, must be specified if use_conv_ln or use_fc_ln is True
activation (torch.nn.functional) – activation for all the layers
kernel_initializer (Callable) – initializer for all the layers.
last_layer_size (int) – an optional size of an additional layer appended at the very end. Note that if last_activation is specified, last_layer_size has to be specified explicitly.
last_activation (nn.functional) – activation function of the additional layer specified by last_layer_size. Note that if last_layer_size is not None, last_activation has to be specified explicitly.
last_use_bias (bool) – whether to use bias for the additional layer.
last_use_ln (bool) – whether to use layer normalization for the additional layer.
name (str) –

forward(inputs, state=())[source]#

Parameters

inputs (Tensor) –
state – not used, just keeps the interface same with other networks.

property param_length#: Get total number of parameters for all layers.

set_parameters(theta, reinitialize=False)[source]#

Distribute parameters to corresponding layers.

Parameters

theta (torch.Tensor) – parameter tensor with shape [D] (groups=1) or [B, D] (groups=B), where B is the batch size and D is the length of the parameters, which should equal self.param_length. When theta has shape [D], it is unsqueezed to [1, D].
reinitialize (bool) – whether to reinitialize parameters of each layer.
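
How a flat theta of length D might be distributed over layers can be sketched in plain Python. This is a hedged illustration: fc_param_length and split_theta are hypothetical helpers, only FC layers are considered, and the weights-then-bias ordering within each layer is an assumption rather than ALF's documented layout.

```python
def fc_param_length(input_size, layer_sizes, use_bias=True):
    """Total number of weights (and biases) in a stack of FC layers."""
    total, in_size = 0, input_size
    for out_size in layer_sizes:
        total += in_size * out_size + (out_size if use_bias else 0)
        in_size = out_size
    return total


def split_theta(theta, input_size, layer_sizes, use_bias=True):
    """Slice flat parameter vectors into per-layer chunks.

    ``theta`` is a list of length D or a list of B such lists; a single
    vector is first wrapped to shape [1, D], mirroring the unsqueeze step.
    The weights-then-bias ordering is an assumption made for illustration.
    """
    if not isinstance(theta[0], (list, tuple)):
        theta = [theta]
    d = fc_param_length(input_size, layer_sizes, use_bias)
    assert all(len(row) == d for row in theta), "each row must have length D"
    layers, pos, in_size = [], 0, input_size
    for out_size in layer_sizes:
        n = in_size * out_size + (out_size if use_bias else 0)
        layers.append([row[pos:pos + n] for row in theta])
        pos, in_size = pos + n, out_size
    return layers


D = fc_param_length(4, (3, 2))            # (4*3 + 3) + (3*2 + 2)
chunks = split_theta(list(range(D)), 4, (3, 2))
```

Each entry of chunks holds one layer's parameters for every group, so a [B, D] input yields B rows per layer.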

training: bool#

alf.networks.preprocessor_networks#

PreprocessorNetworks.

class PreprocessorNetwork(input_tensor_spec, input_preprocessors=None, preprocessing_combiner=None, name='PreprocessorNetwork')[source]#

Bases: alf.networks.network.Network

A base class for networks that need input preprocessing.

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
input_preprocessors (nested Network|nn.Module|None) – a nest of preprocessors, each of which will be applied to the corresponding input. If None, it is treated as math_ops.identity. If not None, input_tensor_spec must have the same structure as input_preprocessors up to the structure defined by input_preprocessors (see alf.nest.map_structure_upto), and each element of input_preprocessors will be applied to the corresponding subnest in input_tensor_spec. If any element is None, it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessings for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector. Note that only stateless networks are supported as input preprocessors by PreprocessorNetwork.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. It must be provided if the result from input_preprocessors is nested. This combiner must accept the result from input_preprocessors as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
name (str) – name of the network
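
The interplay of input_preprocessors and preprocessing_combiner can be sketched without any ALF dependency. Everything below is hypothetical stand-in code: preprocess_and_combine, one_hot and concat only mimic the roles of the preprocessors and of a concatenating combiner such as alf.nest.utils.NestConcat, and flat dicts stand in for general nests.

```python
def preprocess_and_combine(inputs, preprocessors, combiner):
    """Apply one preprocessor per input, then merge the results.

    ``inputs`` and ``preprocessors`` are dicts with the same keys; a
    preprocessor of None is treated as the identity.
    """
    processed = {}
    for key, value in inputs.items():
        fn = preprocessors.get(key)
        processed[key] = value if fn is None else fn(value)
    return combiner(processed)


def one_hot(index, num_classes=3):
    """Stand-in for embedding a discrete input."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def concat(nest):
    """Stand-in for a concatenating combiner; keys are visited in sorted order."""
    return [x for key in sorted(nest) for x in nest[key]]


features = preprocess_and_combine(
    inputs={"action": 2, "observation": [0.5, -1.0]},
    preprocessors={"action": one_hot, "observation": None},
    combiner=concat,
)
```

The combiner is what turns the nested result into a single tensor, which is why it must be provided whenever the preprocessed result is still nested.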

forward(inputs, state=(), min_outer_rank=1, max_outer_rank=1)[source]#

Preprocessing nested inputs.

Parameters

inputs (nested Tensor) – inputs to the network
state (nested Tensor) – RNN state of the network
min_outer_rank (int) – the minimal outer rank allowed
max_outer_rank (int) – the maximal outer rank allowed

Returns

tensor after preprocessing.

Return type

Tensor

training: bool#

alf.networks.preprocessors#

This file contains input preprocessors as stateless Networks, used for the purpose of preprocessing input and making gin files more convenient to configure.

Example: in your gin file, the following pipeline can be configured:

input1 (img)    -> preprocessor1 -> embed1 ----> EncodingNetwork
input2 (action) -> preprocessor2 -> embed2   /   (with NestCombiner)

class EmbeddingPreprocessor(input_tensor_spec, embedding_dim, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, last_activation=<function identity>, name='EmbeddingPreproc')[source]#

Bases: alf.networks.network.Network

A preprocessor that converts the input to an embedding vector. This can be used when the input is a discrete scalar, or a continuous vector to be projected to a different dimension (to have the same length with other vectors). In the former case, torch.nn.Embedding is used without any activation. In the latter case, an EncodingNetwork is used with the specified network hyperparameters.

Parameters

input_tensor_spec (TensorSpec) – the input spec
embedding_dim (int) – output embedding size
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing FC layer sizes.
activation (torch.nn.functional) – activation of hidden layers if the input is a continuous vector.
last_activation (nn.functional) – activation function of the last layer specified by embedding_dim. math_ops.identity is used by default. Only used when the input is continuous.
name (str) –

forward(inputs, state=())[source]#

Preprocess either a tensor input or a TensorSpec.

Parameters

inputs (TensorSpec or Tensor) –

Returns

If the input is a Tensor, the preprocessed result is returned; otherwise the tensor spec of the result is returned.

Return type

Tensor or TensorSpec

training: bool#

alf.networks.projection_networks#

class BetaProjectionNetwork(input_size, action_spec, parallelism=None, activation=<built-in function softplus>, min_concentration=0.0, projection_output_init_gain=0.0, bias_init_value=0.541324854612918, grad_clip=0.01, name='BetaProjectionNetwork')[source]#

Bases: alf.networks.network.Network

Beta projection network.

Its output is a distribution with an independent beta distribution for each action dimension. Since the support of the beta distribution is [0, 1], we also apply an affine transformation so that the support fills the range specified by action_spec.

Parameters

input_size (int) – input vector dimension
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
parallelism (Optional[int]) – when specified, this network will be parallelized. As a result, a batch dimension of parallelism will be appended to the batch shape of the output distribution, while the event shape remains the same. This is useful when you are creating a mixture of policies.
activation (Callable) – activation function to use in dense layers.
bias_init_value (float) – the default value is chosen so that, for softplus activation, the initial concentration will be close to 1, which corresponds to a uniform distribution.
grad_clip (float) – if provided, the L2-norm of the gradient of concentration will be clipped to be no more than grad_clip.
min_concentration (float) – there may be issue of numerical stability if the calculated concentration is very close to 0. A positive value of this may help to alleviate it.
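
The default bias_init_value can be checked numerically: it is the inverse of softplus at 1, so a softplus activation maps it to a concentration of 1. The sketch below is plain Python and assumes the initial pre-activation equals the bias (which is plausible with the default projection_output_init_gain of 0.0); softplus and rescale are local helpers, not ALF functions.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rescale(sample, low, high):
    """Affine map from the beta support [0, 1] to [low, high]."""
    return low + (high - low) * sample


bias_init_value = 0.541324854612918       # default from the signature above
initial_concentration = softplus(bias_init_value)   # approximately 1

# log(e - 1) is the exact inverse of softplus at 1.
inverse_softplus_of_one = math.log(math.e - 1.0)

# A beta sample of 0.25 mapped to an action range of [-2, 2].
action = rescale(0.25, low=-2.0, high=2.0)
```

With both concentrations equal to 1 the beta distribution is uniform on [0, 1], so the initial policy is uniform over the action range after rescaling.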

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

make_parallel(n)[source]#

Make a parallelized version of this network.

A parallel network has n copies of network with the same structure but different independently initialized parameters.

Parameters: n (int) – the number of copies
Returns: A parallel network
Return type: Network

training: bool#

class CategoricalProjectionNetwork(input_size, action_spec, fc_ctor=<class 'alf.layers.FC'>, logits_init_output_factor=0.1, weight_opt_args=None, bias_opt_args=None, name='CategoricalProjectionNetwork')[source]#

Bases: alf.networks.network.Network

Creates a categorical projection network that outputs a discrete distribution over a number of classes.

Currently there seems no need for this class to handle nested inputs; If necessary, extend the argument list to support it in the future.

Parameters

input_size (int) – the input vector size
action_spec (BoundedTensorSpec) – a tensor spec containing the information of the output distribution.
fc_ctor (Callable) – the constructor of the FC layer. It defaults to alf.layers.FC. However, you can use different FC layers such as alf.nn.NoisyFC.
weight_opt_args – optimizer arguments for weight.
bias_opt_args – optimizer arguments for bias.
name (str) –

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

make_parallel(n)[source]#: Creates a ParallelCategoricalProjectionNetwork using n replicas of self. The initialized layer parameters will be different.

training: bool#

class CauchyProjectionNetwork(input_size, action_spec, squash_median=True, scale_bias_initializer_value=0.0, state_dependent_scale=False, scale_transform=<built-in function softplus>, scale_distribution=False, dist_squashing_transform=StableTanh(), name='CauchyProjectionNetwork')[source]#

Bases: alf.networks.projection_networks.NormalProjectionNetwork

Similar to NormalProjectionNetwork except that the output distribution is a DiagMultivariateCauchy. Also since Cauchy doesn’t have mean or std, we provide parameters for its median and scale instead. But the median and scale will just reuse the code for handling mean and std in NormalProjectionNetwork.

Parameters

input_size (int) – input vector dimension
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
squash_median (bool) – If True, squash the output median to fit the action spec. If scale_distribution is also True, this value will be ignored.
scale_bias_initializer_value (float) – Initial value for the bias of the scale projection layer.
state_dependent_scale (bool) – If True, scale will be generated depending on the current state; otherwise a global scale will be generated regardless of the current state.
scale_transform (Callable) – Transform to apply to the scale, on top of activation.
scale_distribution (bool) – Whether or not to scale the output distribution to ensure that the output action fits within the action_spec. Note that this is different from squash_median, which merely squashes the median to fit within the spec.
dist_squashing_transform (td.Transform) – A distribution Transform which transforms values to fall in (-1, 1). Defaults to dist_utils.StableTanh()
name (str) – name of this network.

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class MixtureProjectionNetwork(input_size, action_spec, num_components, component_ctor, mixture_ctor=<class 'alf.networks.projection_networks.CategoricalProjectionNetwork'>, name='mix_proj_net')[source]#

Bases: alf.networks.network.Network

A projection network that outputs MixtureSameFamily distributions.

The output distribution consists of 2 parts:

A categorical distribution over the components.
A component distribution with num_components replicas.

Constructs an instance of MixtureProjectionNetwork.

Parameters

input_size (int) – the input vector size
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
num_components (int) – the number of component distributions.
component_ctor (Callable[[int, TensorSpec], Network]) – constructor of a projection network that outputs the distribution for all the components. The make_parallel method of the projection network will be called to make the actual projection network with num_components replicas.
mixture_ctor (Callable[[int, BoundedTensorSpec], Network]) – constructor to a projection network that outputs the mixture (categorical) distributions. The number of categories equals num_components.
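
How the two parts combine into one density can be sketched in plain Python. The example assumes one-dimensional normal components purely for illustration; the actual component family is whatever component_ctor produces, and normal_pdf, softmax and mixture_pdf are local helpers rather than ALF functions.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def softmax(logits):
    """Turn mixture logits into categorical probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_pdf(x, mixture_logits, means, stds):
    """p(x) = sum_k pi_k * N(x; mean_k, std_k), with pi = softmax(logits)."""
    weights = softmax(mixture_logits)
    return sum(w * normal_pdf(x, m, s)
               for w, m, s in zip(weights, means, stds))


# Two equally weighted unit-variance components centred at -1 and +1.
density_at_zero = mixture_pdf(0.0, [0.0, 0.0], means=[-1.0, 1.0], stds=[1.0, 1.0])
```

The categorical part supplies the weights and the replicated component network supplies one set of distribution parameters per component.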

forward(inputs, state={'components': (), 'mixture': ()})[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

property num_components: int#

Return type: int

training: bool#

class NormalProjectionNetwork(input_size, action_spec, parallelism=None, activation=<function identity>, projection_output_init_gain=0.3, std_bias_initializer_value=0.0, squash_mean=True, state_dependent_std=False, std_transform=<built-in function softplus>, scale_distribution=False, dist_squashing_transform=StableTanh(), name='NormalProjectionNetwork')[source]#

Bases: alf.networks.network.Network

Creates an instance of NormalProjectionNetwork.

Currently there seems no need for this class to handle nested inputs; If necessary, extend the argument list to support it in the future.

Parameters

input_size (int) – input vector dimension
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
parallelism (Optional[int]) – when specified, this network will be parallelized. As a result, a batch dimension of parallelism will be appended to the batch shape of the output distribution, while the event shape remains the same. This is useful when you are creating a mixture of policies.
activation (Callable) – activation function to use in dense layers.
projection_output_init_gain (float) – Output gain for initializing action means and std weights.
std_bias_initializer_value (float) – Initial value for the bias of the std_projection_layer.
squash_mean (bool) – If True, squash the output mean to fit the action spec. If scale_distribution is also True, this value will be ignored.
state_dependent_std (bool) – If True, std will be generated depending on the current state; otherwise a global std will be generated regardless of the current state.
std_transform (Callable) – Transform to apply to the std, on top of activation.
scale_distribution (bool) – Whether or not to scale the output distribution to ensure that the output action fits within the action_spec. Note that this is different from squash_mean, which merely squashes the mean to fit within the spec.
dist_squashing_transform (td.Transform) – A distribution Transform which transforms values into \((-1, 1)\). Defaults to dist_utils.StableTanh()
name (str) – name of this network.
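
The difference between squashing only the mean and scaling the whole distribution can be illustrated with a small plain-Python sketch. The tanh-based mapping below is an assumed form chosen for illustration (squash is a hypothetical helper); it is not taken from ALF's source.

```python
import math


def squash(value, low, high):
    """Map an unbounded value into (low, high) with tanh (assumed form)."""
    center, half_range = (high + low) / 2.0, (high - low) / 2.0
    return center + half_range * math.tanh(value)


low, high = -2.0, 2.0
raw_mean = 0.8
noise = 3.0          # a large draw of std * epsilon

# Squashing only the mean: the mean is inside the bounds, but a sample
# drawn around it can still leave them.
mean = squash(raw_mean, low, high)
unscaled_action = mean + noise

# Scaling the distribution: the sample itself is squashed, so every
# action lies inside the bounds.
scaled_action = squash(raw_mean + noise, low, high)
```

This is why scaling the distribution makes the mean-squashing option redundant: the squashing is applied to every sample, not just to the location parameter.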

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

make_parallel(n)[source]#

Make a parallelized version of this network.

A parallel network has n copies of network with the same structure but different independently initialized parameters.

Parameters: n (int) – the number of copies
Returns: A parallel network
Return type: Network

training: bool#

class OnehotCategoricalProjectionNetwork(input_size, action_spec, logits_init_output_factor=0.1, mode='st', gumbel_temperature=1.0, name='OnehotCategoricalProjectionNetwork')[source]#

Bases: alf.networks.network.Network

Creates a onehot categorical projection network that outputs a discrete distribution over a number of classes.

An option to use the straight-through estimator is provided for this network, which is proposed by Bengio et al., “Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation”, 2013.

Parameters

input_size (int) – the input vector size
action_spec (BoundedTensorSpec) – a tensor spec containing the information of the output distribution.
logits_init_output_factor (float) – the gain factor to initialize the FC layer for predicting the logits
mode (str) – one of (‘st’, ‘gumbel’, ‘st-gumbel’, ‘plain’). All modes other than ‘plain’ enable gradient backprop through the samples. ‘st’ uses the straight-through grad estimator; ‘gumbel’ uses the Gumbel-softmax distribution to sample soft onehot vectors; ‘st-gumbel’ additionally takes argmax on the soft vectors and applies the straight-through grad estimator. Generally, ‘st-gumbel’ should have a lower grad variance than ‘st’.
gumbel_temperature (float) – the temperature of the Gumbel-softmax distribution. Only used by ‘gumbel’ and ‘st-gumbel’ modes. A higher temperature leads to a more uniform sample (less like one-hot).
name (str) –
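
The effect of gumbel_temperature can be seen in a plain-Python sketch of Gumbel-softmax sampling. This is a forward-pass illustration only: the helper names are hypothetical, and the straight-through gradient used by the ‘st’ and ‘st-gumbel’ modes requires autograd and is not shown.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel sample via inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:          # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_softmax(logits, noise, temperature):
    """Soft one-hot sample: softmax((logits + noise) / temperature)."""
    scores = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_one_hot(soft):
    """Argmax of the soft sample, as in the forward pass of 'st-gumbel'."""
    best = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(soft))]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]
noise = [gumbel_noise(rng) for _ in logits]

sharp = gumbel_softmax(logits, noise, temperature=0.1)
smooth = gumbel_softmax(logits, noise, temperature=10.0)
one_hot = hard_one_hot(sharp)
```

For the same noise, the low-temperature sample is closer to a one-hot vector and the high-temperature sample is closer to uniform.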

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class ParallelCategoricalProjectionNetwork(input_size, action_spec, n, fc_ctor=<class 'alf.layers.FC'>, logits_init_output_factor=0.1, name='ParallelCategoricalProjectionNetwork')[source]#

Bases: alf.networks.network.Network

Creates an instance of ParallelCategoricalProjectionNetwork.

Parameters

input_size (int) – input vector dimension
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
n (int) – number of the parallel networks
fc_ctor – must be alf.layers.FC
name (str) – name of this network.

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class StableNormalProjectionNetwork(input_size, action_spec, parallelism=None, activation=<function identity>, projection_output_init_gain=1e-05, squash_mean=True, state_dependent_std=False, inverse_std_transform='softplus', scale_distribution=False, init_std=1.0, min_std=0.0, max_std=None, dist_squashing_transform=StableTanh(), name='StableNormalProjectionNetwork')[source]#

Bases: alf.networks.projection_networks.NormalProjectionNetwork

Generates a Multi-variate normal by predicting a mean and std.

It parameterizes the normal distributions as \(\sigma=c_0+\frac{1}{c_1+softplus(b)}\) and \(\mu=a\cdot\sigma\), where a and b are the outputs of means_projection_layer and stds_projection_layer respectively. \(c_0\) and \(c_1\) are chosen so that \(\sigma_{min} \le \sigma \le \sigma_{max}\). The advantage of this parameterization is that its second order derivatives with respect to a and b are bounded even when the standard deviations become very small, so the optimization is more stable. See docs/stable_gradient_descent_for_gaussian_distribution.py for details.

Creates an instance of StableNormalProjectionNetwork.

Currently there seems no need for this class to handle nested inputs; If necessary, extend the argument list to support it in the future.

Parameters

input_size (int) – input vector dimension
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
activation (Callable) – activation function to use in dense layers.
parallelism (Optional[int]) – when specified, this network will be parallelized. As a result, a batch dimension of parallelism will be appended to the batch shape of the output distribution, while the event shape remains the same. This is useful when you are creating a mixture of policies.
projection_output_init_gain (float) – Output gain for initializing action means and std weights.
squash_mean (bool) – If True, squash the output mean to fit the action spec. If scale_distribution is also True, this value will be ignored.
state_dependent_std (bool) – If True, std will be generated depending on the current state; otherwise a global std will be generated regardless of the current state.
inverse_std_transform (str) – Currently supports “exp” and “softplus”. Transformation to obtain inverse std. The transformed values are further transformed according to min_std and max_std.
scale_distribution (bool) – Whether or not to scale the output distribution to ensure that the output action fits within the action_spec. Note that this is different from squash_mean, which merely squashes the mean to fit within the spec.
init_std (float) – Initial value for standard deviation.
min_std (float) – Minimum value for standard deviation.
max_std (float) – Maximum value for standard deviation. If None, no maximum is enforced.
dist_squashing_transform (td.Transform) – A distribution Transform which transforms values into \((-1, 1)\). Default to dist_utils.StableTanh()
name (str) – name of this network.
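
The parameterization \(\sigma=c_0+\frac{1}{c_1+softplus(b)}\), \(\mu=a\cdot\sigma\) can be sketched in plain Python. The constants below are one choice that keeps \(\sigma\) between min_std and max_std, derived here from the fact that softplus is positive; they are an assumption and are not copied from ALF's source.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def stable_std(b, min_std=0.0, max_std=None):
    """sigma = c0 + 1 / (c1 + softplus(b)).

    Since softplus(b) > 0, sigma lies between c0 and c0 + 1 / c1, so
    c0 = min_std and c1 = 1 / (max_std - min_std) bound it as required.
    These constants are derived for illustration, not taken from ALF.
    """
    c0 = min_std
    c1 = 0.0 if max_std is None else 1.0 / (max_std - min_std)
    return c0 + 1.0 / (c1 + softplus(b))


def stable_mean(a, sigma):
    """mu = a * sigma."""
    return a * sigma


sigmas = [stable_std(b, min_std=0.1, max_std=2.0) for b in (-20.0, 0.0, 20.0)]
mu = stable_mean(0.5, sigmas[1])
```

Larger b gives a smaller standard deviation, and the mean shrinks with it because it is expressed in units of \(\sigma\).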

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

class TruncatedProjectionNetwork(input_size, action_spec, activation=<function identity>, projection_output_init_gain=0.3, scale_bias_initializer_value=0.0, state_dependent_scale=False, loc_transform=<built-in method tanh of type object>, scale_transform=<built-in function softplus>, min_scale=None, max_scale=None, dist_ctor=<class 'alf.utils.distributions.TruncatedNormal'>, name='TruncatedProjectionNetwork')[source]#

Bases: alf.networks.network.Network

Creates an instance of TruncatedProjectionNetwork.

Its output is a TruncatedDistribution with bounds given by the action bounds specified in action_spec.

Parameters

input_size (int) – input vector dimension
action_spec (TensorSpec) – a tensor spec containing the information of the output distribution.
activation (Callable) – activation function to use in dense layers.
projection_output_init_gain (float) – Output gain for initializing action means and std weights.
scale_bias_initializer_value (float) – Initial value for the bias of the scale projection layer.
state_dependent_scale (bool) – If True, std will be generated depending on the current state (i.e. inputs); otherwise a global scale will be generated regardless of the current state.
loc_transform (Callable) – Transform to apply to the loc, on top of activation to make it within [-1, 1].
scale_transform (Callable) – Transform to apply to the std, on top of activation to make it positive.
min_scale (float) – Minimum value for scale. If None, no minimum is enforced.
max_scale (float) – Maximum value for scale. If None, no maximum is enforced.
dist_ctor (Callable) – constructor for the distribution called as: dist_ctor(loc=loc, scale=scale, lower_bound=lower_bound, upper_bound=upper_bound).
name (str) – name of this network.

forward(inputs, state=())[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

alf.networks.q_networks#

QNetworks

class ParallelQNetwork(q_network, n, name='ParallelQNetwork')[source]#

Bases: alf.networks.network.Network

Perform n Q-value computations in parallel.

Parameters

q_network (QNetwork) – non-parallelized q network
n (int) – make n replicas from q_network with different parameter initializations.
name (str) –

forward(inputs, state=())[source]#

Compute action values given an observation.

Parameters

inputs (nest) – consistent with input_tensor_spec.
state – empty; kept for API consistency with QRNNNetwork.

Returns

action_value (Tensor): a tensor of shape \([B,n,k]\), where \(B\) is the batch size, \(n\) is the num of replicas, and \(k\) is the number of actions.
state: empty

Return type

tuple

property state_spec#: Return the state spec of the q network. It is simply the state spec of the encoding network.

training: bool#

class QNetwork(input_tensor_spec, action_spec, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_naive_parallel_network=False, name='QNetwork')[source]#

Bases: alf.networks.q_networks.QNetworkBase

Create an instance of QNetwork.

Creates an instance of QNetwork for estimating action-value of discrete actions. The action-value is defined as the expected return starting from the given input observation and taking the given action. It takes observation as input and outputs an action-value tensor with the shape of [batch_size, num_of_actions].

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input
action_spec (TensorSpec) – the tensor spec of the action
input_preprocessors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure with input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessings for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a default variance_scaling_initializer will be used.
use_naive_parallel_network (bool) – if True, will use NaiveParallelNetwork when make_parallel is called. This might be useful in cases when the NaiveParallelNetwork has an advantage in terms of speed over ParallelNetwork. You have to test to see which way is faster for your particular situation.
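
A typical consumer of the [batch_size, num_of_actions] action-value tensor picks the highest-valued action per batch element. The plain-Python sketch below uses nested lists in place of a tensor; greedy_actions and the q_values table are hypothetical and only illustrate the output layout.

```python
def greedy_actions(action_values):
    """Pick argmax_a Q(s, a) for each row of a [batch_size, num_actions] table."""
    return [max(range(len(row)), key=row.__getitem__) for row in action_values]


# Hypothetical network output for batch_size=2 and num_actions=3.
q_values = [
    [0.1, 0.7, 0.2],
    [1.5, -0.3, 0.9],
]
actions = greedy_actions(q_values)
```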

training: bool#

class QNetworkBase(input_tensor_spec, action_spec, encoding_network_ctor, use_naive_parallel_network=False, name='QNetworkBase', **encoder_kwargs)[source]#

Bases: alf.networks.network.Network

A base class for QNetwork and QRNNNetwork.

Can also be used to create customized value networks by providing different encoding network creators.

Parameters

input_tensor_spec (Union[TensorSpec, List[ForwardRef], Tuple[()], Tuple[ForwardRef, …], Dict[str, ForwardRef]]) – the tensor spec of the input
action_spec (BoundedTensorSpec) – the tensor spec of the action
encoding_network_ctor (Callable) – the creator of the encoding network that does the heavy lifting of the q network.
use_naive_parallel_network (bool) – if True, will use NaiveParallelNetwork when make_parallel is called. This might be useful in cases when the NaiveParallelNetwork has an advantage in terms of speed over ParallelNetwork. You have to test to see which way is faster for your particular situation.
name (str) – name of the network
encoder_kwargs – the extra keyword arguments to the encoding network

forward(observation, state=())[source]#

Computes action values given an observation.

Parameters

observation (nest) – consistent with input_tensor_spec
state – empty; kept for API consistency with QRNNNetwork

Returns

action_value (torch.Tensor): a tensor of the size [batch_size, num_actions]
state: empty

Return type

tuple

make_parallel(n)[source]#: Create a ParallelQNetwork using n replicas of self. The initialized network parameters will be different. If use_naive_parallel_network is True, use NaiveParallelNetwork to create the parallel network.

property state_spec#: Return the state spec of the q network. It is simply the state spec of the encoding network.

training: bool#

class QRNNNetwork(input_tensor_spec, action_spec, input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, lstm_hidden_size=100, value_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_naive_parallel_network=False, name='QRNNNetwork')[source]#

Bases: alf.networks.q_networks.QNetworkBase

Creates an RNN-based network that outputs temporally correlated q-values.

Creates an instance of QRNNNetwork for estimating action-value of discrete actions. The action-value is defined as the expected return starting from the given inputs (observation and state) and taking the given action. It takes observation and state as input and outputs an action-value tensor with the shape of [batch_size, num_of_actions].

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input
action_spec (BoundedTensorSpec) – the tensor spec of the action
input_preprocessors – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure with input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessings for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers for encoding the observation.
lstm_hidden_size (int or tuple[int]) – the hidden size(s) of the LSTM cell(s). Each size corresponds to a cell. If there are multiple sizes, then lstm cells are stacked.
value_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers that are applied after the lstm cell’s output.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If none is provided a default variance_scaling_initializer will be used.
use_naive_parallel_network (bool) – if True, will use NaiveParallelNetwork when make_parallel is called. This might be useful in cases when the NaiveParallelNetwork has an advantage in terms of speed over ParallelNetwork. You have to test to see which way is faster for your particular situation.

training: bool#

alf.networks.relu_mlp#

class ReluMLP(input_tensor_spec, output_size=None, hidden_layers=(64, 64), name='ReluMLP')[source]#

Bases: alf.networks.network.Network

An MLP with relu activations. Diagonals of the input-output Jacobian can be computed directly without calling autograd.

Create a ReluMLP.

Parameters

input_tensor_spec (TensorSpec) –
output_size (int) – output dimension.
hidden_layers (tuple) – size of hidden layers.
name (str) –

compute_jac(inputs, output_partial_idx=None)[source]#

Compute the input-output Jacobian, support partial output.

Parameters

inputs (Tensor) – size (self._input_size) or (B, self._input_size)
output_partial_idx (list) – list of output indices for taking partial output-input Jacobian. Default is None, where standard full output-input Jacobian will be used.

Returns

Jacobian of shape (out_size, in_size) or (B, out_size, in_size), where out_size is self._output_size if output_partial_idx is None, len(output_partial_idx) otherwise.

Return type

Tensor
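
Why a ReLU MLP admits a closed-form Jacobian can be shown with a two-layer example in plain Python: for y = W2 relu(W1 x + b1) + b2 the Jacobian is W2 diag(1[z > 0]) W1, where z is the hidden pre-activation. The functions below are an illustrative sketch with hypothetical names and nested lists in place of tensors, not ALF's implementation.

```python
def matvec(w, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu_mlp(x, w1, b1, w2, b2):
    """Two-layer MLP: y = W2 relu(W1 x + b1) + b2. Returns (y, pre_activation)."""
    z = [zi + bi for zi, bi in zip(matvec(w1, x), b1)]
    h = [max(zi, 0.0) for zi in z]
    y = [yi + bi for yi, bi in zip(matvec(w2, h), b2)]
    return y, z


def jacobian(x, w1, b1, w2, b2):
    """dy/dx = W2 diag(1[z > 0]) W1, of shape (out_size, in_size)."""
    _, z = relu_mlp(x, w1, b1, w2, b2)
    mask = [1.0 if zi > 0.0 else 0.0 for zi in z]
    return [[sum(w2[o][k] * mask[k] * w1[k][i] for k in range(len(z)))
             for i in range(len(x))]
            for o in range(len(w2))]


w1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.0]]   # hidden size 3, input size 2
b1 = [0.0, 0.1, 0.2]
w2 = [[1.0, 2.0, 3.0], [0.0, -1.0, 1.0]]      # output size 2
b2 = [0.0, 0.0]
x = [1.0, 0.25]

jac = jacobian(x, w1, b1, w2, b2)
```

Only the recorded pre-activations are needed to build the mask, which is why no autograd call is required.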

compute_jac_diag(inputs)[source]#: Compute diagonals of the input-output Jacobian.

compute_jvp(inputs, vec, output_partial_idx=None)[source]#

Compute the Jacobian-vector product. Supports a partial output-input Jacobian.

Parameters

inputs (Tensor) – size (self._input_size) or (B, self._input_size)
vec (Tensor) – the vector for which the Jacobian-vector product is computed. Must be of size (self._input_size) or (B, self._input_size).
output_partial_idx (list) – list of output indices for taking partial output-input Jacobian. Default is None, where standard full output-input Jacobian will be used.

Returns

jvp (Tensor) – shape (out_size) or (B, out_size), where out_size is self._output_size if output_partial_idx is None, len(output_partial_idx) otherwise.
outputs (Tensor) – outputs of the ReluMLP.

compute_vjp(inputs, vec, output_partial_idx=None)[source]#

Compute the vector-Jacobian product. Supports a partial output-input Jacobian.

Parameters

inputs (Tensor) – size (self._input_size) or (B, self._input_size)
vec (Tensor) – the vector for which the vector-Jacobian product is computed. Must be of size (self._output_size) or (B, self._output_size).
output_partial_idx (list) – list of output indices for taking partial output-input Jacobian. Default is None, where standard full output-input Jacobian will be used.

Returns

vjp (Tensor) – shape (self._input_size) or (B, self._input_size).
outputs (Tensor) – outputs of the ReluMLP.
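
The difference between the two products is mostly a matter of which space the vector lives in. The following is a minimal sketch on an explicit Jacobian stored as a list of rows. The function names are hypothetical, and treating output_partial_idx as a row selection is an assumption based on the shapes documented above, not a description of ALF's code.

```python
def jvp(jac, vec, output_partial_idx=None):
    """Jacobian-vector product J @ vec, with vec in input space.

    Assumption: a partial Jacobian keeps only the listed output rows.
    """
    rows = jac if output_partial_idx is None else [jac[i] for i in output_partial_idx]
    return [sum(j * v for j, v in zip(row, vec)) for row in rows]


def vjp(jac, vec):
    """Vector-Jacobian product vec @ J, with vec in output space."""
    return [sum(v * j for v, j in zip(vec, col)) for col in zip(*jac)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative Jacobian with out_size = 2 and in_size = 3.
jac = [[1.0, 2.0, 3.0],
       [0.0, -1.0, 4.0]]
v_in = [1.0, 0.0, 2.0]   # same size as the input
u_out = [3.0, 1.0]       # same size as the output

jv = jvp(jac, v_in)      # has out_size entries
uj = vjp(jac, u_out)     # has in_size entries
```

Both products satisfy the identity u · (J v) = (uᵀ J) · v, which is a convenient consistency check when testing either method.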

forward(inputs, state=(), requires_jac=False, requires_jac_diag=False)[source]#

Parameters

inputs (torch.Tensor) –
state – not used
requires_jac (bool) – whether to also output the input-output Jacobian.
requires_jac_diag (bool) – whether to also output the diagonals of the Jacobian.

training: bool#

class SimpleFC(input_size, output_size, activation=<function identity>)[source]#

Bases: torch.nn.modules.linear.Linear

A simple FC layer that records its output before activation. It is used in ReluMLP to enable explicit computation of the diagonals of the input-output Jacobian.

Initialize a SimpleFC layer.

Parameters

input_size (int) – input dimension.
output_size (int) – output dimension.
activation (nn.functional) – activation used for this layer. Default is math_ops.identity.

forward(inputs)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

property hidden_neurons#

in_features: int#

out_features: int#

weight: torch.Tensor#

alf.networks.transformer_networks#

class SocialAttentionNetwork(input_tensor_spec, input_preprocessors=None, preprocessing_combiner=None, fc_layer_params=(128, 128), activation=<built-in method relu_ of type object>, kernel_initializer=None, use_fc_bn=False, num_of_heads=1, last_layer_size=None, last_activation=None, last_kernel_initializer=None, name='SocialAttentionNetwork')[source]#

Bases: alf.networks.preprocessor_networks.PreprocessorNetwork

Simple graph encoding network, which takes as input a set of objects and outputs one encoded feature vector. Reference:

Leurent et al., “Social Attention for Autonomous Decision-Making in Dense Traffic”, arXiv:1911.12250

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input. If nested, then preprocessing_combiner must not be None.
input_preprocessors (nested InputPreprocessor) – a nest of InputPreprocessor, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
fc_layer_params (tuple[int]) – a tuple of integers representing FC layer sizes for generating embeddings.
activation (nn.functional) – activation used for all the layers but the last layer.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If None, a variance_scaling_initializer will be used.
use_fc_bn (bool) – whether to use Batch Normalization for FC layers.
num_of_heads (int) – number of heads for the multi-head attention
last_layer_size (None) – not used; for interface compatibility
last_activation (None) – not used; for interface compatibility
last_kernel_initializer (None) – not used; for interface compatibility
last_use_fc_bn (None) – not used; for interface compatibility
name (str) –

forward(inputs, state=())[source]#

Parameters

inputs (Tensor) – with the shape of [B, N, d], where B denotes batch size, N the number of entities, and d the feature dimension
state (nested Tensor) – states

Returns

shape is [B, d’], where d’ denotes the output dimension of the last layer specified by fc_layer_params (i.e. fc_layer_params[-1])

Return type

Tensor
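
To see why an attention layer can map a variable number of entities N to one fixed-size vector, here is a generic single-head, scaled dot-product attention pooling sketch for one batch element. It is not ALF's SocialAttentionNetwork: the function name `attention_pool` is hypothetical, the learned projections are omitted, and the query is simply passed in by the caller.

```python
import math


def attention_pool(entities, query):
    """Pool N entity vectors (each of size d) into one vector of size d.

    Scores are scaled dot products between the query and each entity,
    normalized with a softmax, then used as weights for a weighted sum.
    """
    d = len(query)
    scores = [sum(q * e for q, e in zip(query, ent)) / math.sqrt(d)
              for ent in entities]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * ent[i] for w, ent in zip(weights, entities))
              for i in range(d)]
    return pooled, weights


# Two illustrative entities with d = 2; the query favors the first one.
entities = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(entities, query=[10.0, 0.0])
```

The pooled vector has length d regardless of how many entities are supplied, which is the property that lets the network emit a [B, d'] output from a [B, N, d] input.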

training: bool#

class TransformerNetwork(input_tensor_spec, num_prememory_layers, num_attention_heads, d_ff=None, core_size=1, use_core_embedding=True, memory_size=0, num_memory_layers=0, return_core_only=True, centralized_memory=True, input_preprocessors=None, name='TransformerNetwork')[source]#

Bases: alf.networks.preprocessor_networks.PreprocessorNetwork

A Network composed of Memory and TransformerBlock.

The following is the pseudocode for the computation:

for i in range(num_prememory_layers):
    core, inputs = T_i([core, inputs], [core, inputs])
for j in range(num_memory_layers):
    new_core, inputs = TM_j([memory_j, core, inputs], [core, inputs])
    memory_j.write(core)
    core = new_core
return core, new_memory_state

where T_i denotes the TransformerBlock for the i-th prememory layer and TM_j denotes the TransformerBlock for the j-th memory layer. memory_j is a FIFOMemory object (not to be confused with the memory argument of the TransformerBlock.forward() function).
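
The control flow of the pseudocode can be mimicked with stand-in components. This is a sketch under stated assumptions, not ALF code: `TinyFIFOMemory` and `run_layers` are hypothetical names, embeddings are represented by arbitrary Python objects, and each block is assumed to be a callable taking (attended items, queries) in the same argument order as the pseudocode and returning one transformed item per query.

```python
from collections import deque


class TinyFIFOMemory:
    """Hypothetical stand-in for a FIFO memory holding at most `size` items."""

    def __init__(self, size):
        self._slots = deque(maxlen=size)  # oldest entries are evicted first

    def write(self, embeddings):
        self._slots.extend(embeddings)

    def read(self):
        return list(self._slots)


def run_layers(core, inputs, prememory_blocks, memory_blocks, memories):
    """Mirror the loop structure of the pseudocode with caller-supplied blocks."""
    n_core = len(core)
    for block in prememory_blocks:
        out = block(core + inputs, core + inputs)
        core, inputs = out[:n_core], out[n_core:]
    for block, memory in zip(memory_blocks, memories):
        out = block(memory.read() + core + inputs, core + inputs)
        new_core, inputs = out[:n_core], out[n_core:]
        memory.write(core)  # stores the core from before this layer
        core = new_core
    return core


def tag(attended, queries):
    """Toy block that marks each query with a prime instead of transforming it."""
    return [q + "'" for q in queries]


memory = TinyFIFOMemory(size=2)
final_core = run_layers(["c"], ["x", "y"], [tag], [tag], [memory])
```

Note that the memory receives the core computed before the memory layer runs, while the returned core is the one produced by that layer.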

The core embedding serves the same purpose of [CLS] in the BERT model in [1], which is to generate a fixed dimensional representation for downstream tasks. Different from BERT, which only has one [CLS] embedding, we allow the option of having multiple core embeddings. In addition to generating a fixed dimensional representation, the core embedding is also used to update the memory.

[1]. Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Parameters

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input. If input_tensor_spec is not nested, it should represent a rank-2 tensor of shape [input_size, d_model], where input_size is the length of the input sequence, and d_model is the dimension of embedding.
num_prememory_layers (int) – number of TransformerBlock layers that are computed without using memory
num_attention_heads (int) – number of attention heads for each TransformerBlock
d_ff (int) – the size of the hidden layer of the feedforward network in each TransformerBlock. If None, TransformerBlock will calculate it as 4*d_model.
memory_size (int) – size of memory.
num_memory_layers (int) – number of TransformerBlock layers that are computed using memory
return_core_only (bool) – If True, only return the core embedding. Otherwise, return all embeddings
core_size (int) – size of core (i.e. number of embeddings of core)
use_core_embedding (bool) – whether to use learnable core embedding. If True, will use additional learnable core embedding to augment the input. If False, the first core_size embeddings of the input are treated as core.
centralized_memory (bool) – if False, there will be a separate memory for each memory layer. If True, there will be a single memory for all the memory layers, and it is updated using the last core embeddings.
input_preprocessors (nested Network|nn.Module) – a nest of stateless preprocessor networks, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec. If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector. The output_spec of each input preprocessor i should be [input_size_i, d_model]. The result of all the preprocessors will be concatenated as a Tensor of shape [batch_size, input_size, d_model], where input_size = sum_i input_size_i.

forward(inputs, state=())[source]#

Parameters

inputs (nested Tensor) – consistent with input_tensor_spec provided at __init__()
state (nested Tensor) – states

Returns

Tensor – shape is [B, core_size * d_model] if return_core_only, and [B, core_size + input_size, d_model] if not return_core_only, where input_size is the number of embeddings from the (processed) input.
nested Tensor – network states.
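
The two output layouts are easy to mix up, so the shape rules stated above can be written down as a tiny helper. The function names are hypothetical and nothing here calls ALF. It only restates the documented shapes and the documented default of d_ff = 4*d_model.

```python
def transformer_output_shape(batch_size, core_size, input_size, d_model,
                             return_core_only=True):
    """Output shape of forward() according to the documented rules."""
    if return_core_only:
        # Core embeddings are flattened into a single feature vector.
        return (batch_size, core_size * d_model)
    # Otherwise all embeddings are returned as a sequence.
    return (batch_size, core_size + input_size, d_model)


def default_d_ff(d_model):
    """Feedforward hidden size used when d_ff is None."""
    return 4 * d_model


core_only = transformer_output_shape(8, 2, 10, 64)
everything = transformer_output_shape(8, 2, 10, 64, return_core_only=False)
```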

property state_spec#

Return the state spec to be used by an Algorithm.

Subclass should override this to return the correct state_spec.

training: bool#

alf.networks.value_networks#

ValueNetwork and ValueRNNNetwork.

class ParallelValueNetwork(value_network, n, name='ParallelValueNetwork')[source]#

Bases: alf.networks.network.Network

Perform n value computations in parallel.

It creates a parallelized version of value_network.

Parameters

value_network (ValueNetwork) – non-parallelized value network
n (int) – make n replicas from value_network with different initialization.
name (str) –

forward(observation, state=())[source]#

Computes values given a batch of observations.

Parameters

observation (tuple) – a tuple of Tensors consistent with input_tensor_spec.
state (tuple) – empty; kept for API consistency with ValueRNNNetwork.

property state_spec#

Return the state spec of the value network. It is simply the state spec of the encoding network.

training: bool#

class ValueNetwork(input_tensor_spec, output_tensor_spec=TensorSpec(shape=(), dtype=torch.float32), input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, use_fc_bn=False, name='ValueNetwork')[source]#

Bases: alf.networks.value_networks.ValueNetworkBase

Output temporally uncorrelated values.

Creates a value network that estimates the expected return.

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input
output_tensor_spec (TensorSpec) – spec for the output
input_preprocessors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layer sizes.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If None is provided, a default xavier_uniform initializer will be used.
use_fc_bn (bool) – whether to use Batch Normalization for the internal FC layers (i.e. FC layers besides the last one).
name (str) –
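
As an illustration of the conv_layer_params format, the sketch below walks a tuple of (filters, kernel_size, strides, padding) entries and reports the channel count and spatial size after each layer, using the standard convolution size formula floor((size + 2*padding - kernel_size) / strides) + 1. The helper name is hypothetical, and it assumes square inputs, integer kernel sizes and strides, and a padding of 0 when the fourth element is omitted.

```python
def conv_output_sizes(input_size, conv_layer_params):
    """Return [(channels, spatial_size), ...] after each conv layer."""
    sizes = []
    size = input_size
    for layer in conv_layer_params:
        filters, kernel_size, strides = layer[:3]
        padding = layer[3] if len(layer) > 3 else 0  # assumed default
        size = (size + 2 * padding - kernel_size) // strides + 1
        sizes.append((filters, size))
    return sizes


# Illustrative configuration; the last entry supplies the optional padding.
conv_layer_params = ((32, 8, 4), (64, 4, 2), (64, 3, 1, 1))
sizes = conv_output_sizes(84, conv_layer_params)
```

Checking the final spatial size this way before training helps catch configurations where a layer would shrink the feature map to zero.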

training: bool#

class ValueNetworkBase(input_tensor_spec, output_tensor_spec, encoding_network_ctor, name='ValueNetworkBase', **encoder_kwargs)[source]#

Bases: alf.networks.network.Network

A base class for ValueNetwork and ValueRNNNetwork.

Can also be used to create customized value networks by providing different encoding network creators.

Parameters

input_tensor_spec (nested TensorSpec) – the tensor spec of the input.
output_tensor_spec (nested TensorSpec) – spec for the value output.
encoding_network_ctor (Callable) – the creator of the encoding network that does the heavy lifting of the value network.
name – name of the network
encoder_kwargs – the extra keyword arguments to the encoding network

forward(observation, state=())[source]#

Computes a value given an observation.

Parameters

observation (torch.Tensor) – consistent with input_tensor_spec
state – empty; kept for API consistency with ValueRNNNetwork

Returns

value (torch.Tensor) – a 1D tensor
state – empty

make_parallel(n)[source]#

Create a ParallelValueNetwork using n replicas of self. The initialized network parameters will be different.

property state_spec#

Return the state spec of the value network. It is simply the state spec of the encoding network.

training: bool#

class ValueRNNNetwork(input_tensor_spec, output_tensor_spec=TensorSpec(shape=(), dtype=torch.float32), input_preprocessors=None, preprocessing_combiner=None, conv_layer_params=None, fc_layer_params=None, lstm_hidden_size=100, value_fc_layer_params=None, activation=<built-in method relu_ of type object>, kernel_initializer=None, name='ValueRNNNetwork')[source]#

Bases: alf.networks.value_networks.ValueNetworkBase

Outputs temporally correlated values.

Creates an instance of ValueRNNNetwork.

Parameters

input_tensor_spec (TensorSpec) – the tensor spec of the input
output_tensor_spec (TensorSpec) – spec for the output
input_preprocessors (nested Network|nn.Module|None) – a nest of input preprocessors, each of which will be applied to the corresponding input. If not None, then it must have the same structure as input_tensor_spec (after reshaping). If any element is None, then it will be treated as math_ops.identity. This arg is helpful if you want to have separate preprocessing for different inputs by configuring a gin file without changing the code. For example, embedding a discrete input before concatenating it to another continuous vector.
preprocessing_combiner (NestCombiner) – preprocessing called on complex inputs. Note that this combiner must also accept input_tensor_spec as the input to compute the processed tensor spec. For example, see alf.nest.utils.NestConcat. This arg is helpful if you want to combine inputs by configuring a gin file without changing the code.
conv_layer_params (tuple[tuple]) – a tuple of tuples where each tuple takes a format (filters, kernel_size, strides, padding), where padding is optional.
fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers for encoding the observation.
lstm_hidden_size (int or tuple[int]) – the hidden size(s) of the LSTM cell(s). Each size corresponds to a cell. If there are multiple sizes, then lstm cells are stacked.
value_fc_layer_params (tuple[int]) – a tuple of integers representing hidden FC layers that are applied after the lstm cell’s output.
activation (nn.functional) – activation used for hidden layers. The last layer will not be activated.
kernel_initializer (Callable) – initializer for all the layers but the last layer. If None is provided, a default xavier_uniform initializer will be used.
name (str) –

training: bool#