alf#

alf.config_helpers#

Helper functions for configuring ALF training.

The main motivation is to provide access to observation_spec and action_spec, which are needed to configure some models. observation_spec and action_spec are only available after the environment is created, so this module creates an environment based on TrainerConfig.

close_env()[source]#

Close the global environment.

This function will be automatically called by RLTrainer.

get_action_spec()[source]#

Get the specs of the tensors expected by step(action) of the environment.

Note: you need to finish all the config for environments and TrainerConfig.random_seed before using this function.

Returns

a spec that describes the shape and dtype of each tensor expected by step().

Return type

nested TensorSpec

get_env()[source]#

Get the global training environment.

Note: you need to finish all the config for environments and TrainerConfig.random_seed before using this function.

Note: random seed will be initialized in this function.

Returns

AlfEnvironment

get_observation_spec(field=None)[source]#

Get the spec of observation transformed by data transformers.

The data transformers are specified by TrainerConfig.data_transformer_ctor.

Note

You need to finish all the config for environments and TrainerConfig.data_transformer_ctor before using this function.

Parameters

field (str) – a multi-step path denoted by “A.B.C”.

Returns

a spec that describes the observation.

Return type

nested TensorSpec

get_raw_observation_spec(field=None)[source]#

Get the TensorSpec of observations provided by the global environment.

Note

This function can only be called after all gym wrappers and TrainerConfig.random_seed have been configured. Otherwise the created environment might have unexpected behaviors.

Parameters

field (str) – a multi-step path denoted by “A.B.C”.

Returns

a spec that describes the observation.

Return type

nested TensorSpec

get_reward_spec()[source]#

Get the spec of the reward returned by the environment.

Note: you need to finish all the config for environments and TrainerConfig.random_seed before using this function.

Returns

a spec that describes the shape and dtype of reward.

Return type

TensorSpec

parse_config(conf_file, conf_params)[source]#

Parse config file and config parameters

Note: a global environment will be created (which can be obtained by alf.get_env()) and random seed will be initialized by this function using common.set_random_seed().

Parameters
  • conf_file (str) – The full path of the config file.

  • conf_params (list[str]) – the list of config parameters. Each one has a format of CONFIG_NAME=VALUE.
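As an illustration of the CONFIG_NAME=VALUE format, the following minimal sketch splits such strings into a name-to-value dict. The helper split_conf_params is hypothetical and not part of ALF. How ALF itself evaluates VALUE is not stated here, so the sketch assumes Python literals and falls back to the raw string.

```python
import ast


def split_conf_params(conf_params):
    """Turn ["NAME=VALUE", ...] into a {name: value} dict (illustrative only)."""
    parsed = {}
    for item in conf_params:
        # Split on the first '=' only, so a value may itself contain '='.
        name, sep, raw = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected CONFIG_NAME=VALUE, got {item!r}")
        try:
            parsed[name.strip()] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Not a Python literal: keep the raw string (an assumption).
            parsed[name.strip()] = raw
    return parsed


params = split_conf_params(["Worker.job1=2", "cool_func.cool_arg1='new_value'"])
```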

alf.config_util#

Alf configuration utilities.

config(prefix_or_dict, mutable=True, raise_if_used=True, **kwargs)[source]#

Set the values for the configs with given name as suffix.

Example:

Assume we have the following decorated functions and classes:

@alf.configurable
def cool_func(param1, cool_arg1='a default value', cool_arg2=3):
    ...

@alf.configurable
def dumb_func(param1, a=1, b=2):
    ...

@alf.configurable
class Worker(object):
    def __init__(self, job1=1, job2=2):
        ...

    @alf.configurable
    def func(self, a, b):
        ...

We can config in the following ways:

alf.config('cool_func', cool_arg1='new_value', cool_arg2='another_value')
alf.config('Worker.func', b=3)
alf.config('func', b=3)     # 'Worker.func' can be uniquely identified by 'func'
alf.config({
    'dumb_func.b': 3,
    'Worker.job1': 2        # now the default value of job1 for Worker() becomes 2.
})
Parameters
  • prefix_or_dict (str|dict) – if a dict, each (key, value) pair in it specifies the value for a config with name key. If a str, it is used as prefix so that each (key, value) pair in kwargs specifies the value for config with name prefix + '.' + key

  • mutable (bool) – whether the config can be changed later. If the user tries to change an existing immutable config, the change will be ignored and a warning will be generated. You can always change a mutable config. ValueError will be raised if trying to set a new immutable value to an existing immutable value.

  • raise_if_used (bool) – If True, ValueError will be raised if trying to config a value which has already been used.

  • **kwargs – only used if prefix_or_dict is a str.

config1(config_name, value, mutable=True, raise_if_used=True)[source]#

Set one configurable value.

Parameters
  • config_name (str) – name of the config

  • value (any) – value of the config

  • mutable (bool) – whether the config can be changed later. If the user tries to change an existing immutable config, the change will be ignored and a warning will be generated. You can always change a mutable config. ValueError will be raised if trying to set a new immutable value to an existing immutable value.

  • raise_if_used (bool) – If True, ValueError will be raised if trying to config a value which has already been used.
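The interaction of mutable and raise_if_used described above can be sketched with a toy store. TinyConfigStore is a hypothetical stand-in written only from the rules stated in this section. It is not ALF's implementation and it ignores suffix matching.

```python
import warnings


class TinyConfigStore:
    """Toy store that mimics the documented mutable / raise_if_used rules."""

    def __init__(self):
        self._values = {}  # config name -> (value, mutable)
        self._used = set()  # names whose value has been read

    def set(self, name, value, mutable=True, raise_if_used=True):
        if raise_if_used and name in self._used:
            raise ValueError(f"config '{name}' has already been used")
        if name in self._values and not self._values[name][1]:
            if not mutable:
                raise ValueError(f"config '{name}' is already immutable")
            warnings.warn(f"config '{name}' is immutable; change ignored")
            return
        self._values[name] = (value, mutable)

    def get(self, name):
        self._used.add(name)
        return self._values[name][0]


store = TinyConfigStore()
store.set("Worker.job1", 1)  # mutable by default
store.set("Worker.job1", 2, mutable=False)  # a mutable config can always change
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    store.set("Worker.job1", 3)  # ignored, with a warning
```

In this sketch the stored value stays at 2 after the third call, and any further set() after get() raises ValueError unless raise_if_used=False is passed.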

configurable(fn_or_name=None, whitelist=[], blacklist=[])[source]#

Decorator to make a function or class configurable.

This decorator registers the decorated function/class as configurable, which allows its parameters to be supplied from the global configuration (i.e., set through alf.config()). The decorated function is associated with a name in the global configuration, which by default is simply the name of the function or class, but can be specified explicitly to avoid naming collisions or improve clarity.

If some parameters should not be configurable, they can be specified in blacklist. If only a restricted set of parameters should be configurable, they can be specified in whitelist.

The decorator can be used without any parameters, for example by placing @alf.configurable directly above the definition of a function my_function(param1, param2). In this case, the function is associated with the name ‘my_function’ in the global configuration, and both param1 and param2 are configurable.

The decorator can also be supplied with parameters to specify the configurable name or supply a whitelist/blacklist, for example @alf.configurable('my_func', whitelist=['param2']) on the same function. In this case, the configurable is associated with the name ‘my_func’ in the global configuration, and only param2 is configurable.

Classes can be decorated as well, in which case parameters of their constructors are made configurable:

@alf.configurable
class MyClass(object):
    def __init__(self, param1, param2='a default value'):
        ...

In this case, the name of the configurable is ‘MyClass’, and both param1 and param2 are configurable.

The full name of a configurable value is MODULE_PATH.FUNC_NAME.ARG_NAME. It can be referred to by any suffix as long as there is no ambiguity. For example, assuming there are two configurable values “abc.def.func.a” and “xyz.uvw.func.a”, you can use “abc.def.func.a”, “def.func.a”, “xyz.uvw.func.a” or “uvw.func.a” to refer to these two configurable values. You cannot use “func.a” because of the ambiguity. Because of this, you cannot have a config name which is a strict suffix of another config name. For example, “A.Test.arg” and “Test.arg” cannot both be defined. You can supply a different name for the function to avoid conflict:

@alf.configurable("NewTest")
def Test(arg):
    ...

or

@alf.configurable("B.Test")
def Test(arg):
    ...
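The suffix rule described above can be sketched as a small lookup. The function resolve_config_name is hypothetical and only illustrates the matching rule; it says nothing about how ALF stores its configs.

```python
def resolve_config_name(name, full_names):
    """Return the unique full name that has `name` as a dot-separated suffix."""
    matches = [
        full for full in full_names
        if full == name or full.endswith("." + name)
    ]
    if not matches:
        raise ValueError(f"unknown config name: {name}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous config name {name}: {matches}")
    return matches[0]


names = ["abc.def.func.a", "xyz.uvw.func.a"]
resolved = resolve_config_name("def.func.a", names)
```

With these two names, "def.func.a" resolves to "abc.def.func.a", while "func.a" matches both and is rejected as ambiguous.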

Note: currently, to maintain compatibility with gin-config, all the functions decorated using alf.configurable are automatically configurable using gin. The values specified using alf.config() will override values specified through gin. The gin wrapper is quite convoluted and can make debugging more challenging. It can be disabled by setting the environment variable ALF_USE_GIN to 0 if you are not using gin.

Parameters
  • fn_or_name (Callable|str) – A name for this configurable, or a function to decorate (in which case the name will be taken from that function). If not set, defaults to the name of the function/class that is being made configurable. If a name is provided, it may also include module components to be used for disambiguation. If the module components are provided, the original module name of the function will not be used to compose the full name.

  • whitelist (list[str]) – A whitelisted set of kwargs that should be configurable. All other kwargs will not be configurable. Only one of whitelist or blacklist should be specified.

  • blacklist (list[str]) – A blacklisted set of kwargs that should not be configurable. All other kwargs will be configurable. Only one of whitelist or blacklist should be specified.

Returns

the decorated function if fn_or_name is Callable; a decorator if fn_or_name is not Callable.

Raises

ValueError – If a configurable with name (or the name of the decorated function or class) already exists, or if both a whitelist and blacklist are specified.

define_config(name, default_value)[source]#

Define a configurable value with given default_value.

Its value can be retrieved by get_config_value().

Parameters
  • name (str) – name of the configurable value

  • default_value (Any) – default value

get_all_config_names()[source]#

Get the names of all configurable values.

get_config_value(config_name)[source]#

Get the value of the config with the name config_name.

Parameters

config_name (str) – name of the config or its suffix which can uniquely identify the config.

Returns

value of the config

Return type

Any

Raises

ValueError – if the value of the config has not been configured and it does not have a default value.

get_handled_pre_configs()[source]#

Return a list of handled pre-config (name, value).

get_inoperative_configs()[source]#

Get all the configs that have not been used.

A config is inoperative if its value has been set through alf.config() but its set value has never been used by any function calls.

Returns

list[tuple[config_name, Any]]

get_operative_configs()[source]#

Get all the configs that have been used.

A config is operative if a function call does not explicitly specify the value of that config and hence its default value or the value provided through alf.config() needs to be used.

Returns

list[tuple[config_name, Any]]

import_config(conf_file)[source]#

Import the config from another file.

Different from load_config(), import_config() should only be used in config files. And it can be used multiple times inside your config files.

If conf_file is a relative path, import_config() will first try to find it in the directory of the config file calling this function. If it cannot be found there, directories in the environment variable ALF_CONFIG_PATH will be searched in order.

Examples:

1. Suppose you have a config file ~/code/my_conf.py. You want to import another config file ~/code/my_conf2.py. You can use import_config("my_conf2.py") to import my_conf2.py.

2. Suppose you have a config file ~/code/my_conf.py. You want to import another config file ~/code/base/my_conf2.py. You can use import_config("base/my_conf2.py") to import my_conf2.py.

3. Suppose you have a config file ~/code/my_conf.py. You want to import another config file ~/packages/my_conf2.py. You need to set the environment variable as ALF_CONFIG_PATH=~/packages. Then you can use import_config("my_conf2.py") to import my_conf2.py.
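The search order in the examples above can be sketched as follows. find_conf_file is a hypothetical helper, and splitting ALF_CONFIG_PATH on os.pathsep is an assumption, since the separator is not stated in this section.

```python
import os


def find_conf_file(conf_file, first_dir, config_path=""):
    """Look for conf_file in first_dir, then in each directory of config_path."""
    if os.path.isabs(conf_file):
        return conf_file
    # Assumption: config_path holds directories joined by os.pathsep.
    search_dirs = [first_dir] + [d for d in config_path.split(os.pathsep) if d]
    for directory in search_dirs:
        candidate = os.path.join(os.path.expanduser(directory), conf_file)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"cannot find {conf_file} in {search_dirs}")
```

For the third example, a call such as find_conf_file("my_conf2.py", "~/code", os.environ.get("ALF_CONFIG_PATH", "")) would first look in ~/code and then in ~/packages.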

Parameters

conf_file

Returns

the config module object, which can be used in a similar way as python imported module.

load_config(conf_file)[source]#

Load config from a file.

Different from import_config(), load_config() should only be used by your main code to load the config. It should only be called once, unless reset_configs() is called to reset the configuration to its default state.

If conf_file is a relative path, load_config() will first try to find it in the current working directory. If it cannot be found there, directories in the environment variable ALF_CONFIG_PATH will be searched in order.

Parameters

conf_file

Returns

the config module object, which can be used in a similar way as python imported module.

pre_config(configs)[source]#

Preset the values for configs before the module defining it is imported.

This function is useful for handling the config params from commandline, where there are no module imports and hence no config has been defined.

The value is bound to the config when the module defining the config is imported later. validate_pre_configs() should be called after the config file has been loaded to ensure that all the pre_configs have been correctly bound.

Parameters

configs (dict) – dictionary of config name to value

repr_wrapper(cls)[source]#

A wrapper for automatically generating readable repr for an object.

The representation shows the arguments used to construct the object. It does not include the default arguments, nor the class members.

To use it, simply use it to decorate a class.

Example:

@repr_wrapper
class MyClass(object):
    def __init__(self, a, b, c=100, d=200):
        pass

a = MyClass(1, 2)
assert repr(a) == "MyClass(1, 2)"
a = MyClass(3, 5, d=300)
assert repr(a) == "MyClass(3, 5, d=300)"
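One way such a decorator could work is to record the arguments at construction time. The sketch below is a simplified, hypothetical simple_repr_wrapper and not ALF's repr_wrapper. It records only explicitly passed arguments, which is why defaults do not appear.

```python
import functools


def simple_repr_wrapper(cls):
    """Toy decorator: repr() shows the arguments passed to the constructor."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def init_and_record(self, *args, **kwargs):
        # Only explicitly passed arguments are recorded, so defaults are omitted.
        parts = [repr(a) for a in args]
        parts += [f"{k}={v!r}" for k, v in kwargs.items()]
        self._repr_text = f"{type(self).__name__}({', '.join(parts)})"
        original_init(self, *args, **kwargs)

    cls.__init__ = init_and_record
    cls.__repr__ = lambda self: self._repr_text
    return cls


@simple_repr_wrapper
class MyClassSketch(object):
    def __init__(self, a, b, c=100, d=200):
        pass


first = repr(MyClassSketch(1, 2))
second = repr(MyClassSketch(3, 5, d=300))
```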
reset_configs()[source]#

Reset all the configs to their initial states.

save_config(alf_config_file)[source]#

Save config files.

This will save config set using pre_config(), the file loaded using load_config() and the files imported using import_config() if they are in the config root directory or its sub-directory, where the config root directory is the directory of the conf file loaded by load_config().

validate_pre_configs()[source]#

Validate that all the configs set through pre_config() are correctly bound.

alf.data_structures#

Various data structures. Converted to PyTorch from the TF version.

class AlgStep(output, state, info)#

Bases: tuple

Create new instance of AlgStep(output, state, info)

info#

Alias for field number 2

output#

Alias for field number 0

state#

Alias for field number 1

class BasicRLInfo(action)#

Bases: tuple

Create new instance of BasicRLInfo(action,)

action#

Alias for field number 0

class BasicRolloutInfo(rl, rewards, repr)#

Bases: tuple

Create new instance of BasicRolloutInfo(rl, rewards, repr)

repr#

Alias for field number 2

rewards#

Alias for field number 1

rl#

Alias for field number 0

class Experience(time_step=(), action=(), rollout_info=(), state=(), batch_info=(), replay_buffer=(), rollout_info_field=())[source]#

Bases: alf.data_structures.Experience

An Experience is a TimeStep in the context of training an RL algorithm. For the training purpose, it contains the following attributes:

  • time_step (TimeStep): A TimeStep structure that contains the data emitted by an environment at each step of interaction.

  • action: A (nested) Tensor for the action taken for the current time step.

  • rollout_info: AlgStep.info from rollout_step().

  • state: State passed to rollout_step() to generate action.

  • batch_info: Its type is alf.experience_replays.replay_buffer.BatchInfo. This is only used when experience is passed as an argument for Algorithm.calc_loss(). Different from other members, the shape of the tensors in batch_info is [B], where B is the batch size.

  • replay_buffer: The replay buffer from which the batch_info is generated. Currently, this field is available when experience is passed to Algorithm.calc_loss(), Algorithm.preprocess_experience() or DataTransformer.transform_experience().

  • rollout_info_field: The name of the rollout_info field in the replay buffer. This is useful when an algorithm needs to access its rollout_info in the replay buffer.

Create new instance of Experience(time_step, action, rollout_info, state, batch_info, replay_buffer, rollout_info_field)

property discount#
property env_id#
get_time_step_field(field)[source]#

Get the value of the experience.time_step specified by field. Since we have exposed the common time_step fields as properties of Experience, this function can be used when the field is not covered by the exposed properties.

Parameters

field (str) – indicate the field to be retrieved in time_step.

Returns

The value of the field in time_step corresponding to field.

is_first()[source]#
is_last()[source]#
is_mid()[source]#
property observation#
property prev_action#
property reward#
property step_type#
update_time_step_field(field, new_value)[source]#

Update the value of the experience.time_step specified by field.

Parameters
  • field (str) – indicate the field to be updated

  • new_value (any) – the new value for the field

Returns

a structure the same as the original experience except that the field field in the time_step is replaced by new_value.

Return type

Experience
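Because Experience and TimeStep are tuples, updating a field means building a new structure rather than mutating the old one. The sketch below illustrates that idea for a dotted path such as "A.B.C". get_field, set_field and the two namedtuples are hypothetical stand-ins, not ALF code.

```python
from collections import namedtuple


def get_field(nest, field):
    """Follow a dotted path such as "A.B.C" through namedtuples and dicts."""
    for name in field.split("."):
        nest = nest[name] if isinstance(nest, dict) else getattr(nest, name)
    return nest


def set_field(nest, field, new_value):
    """Return a copy of `nest` with the value at `field` replaced."""
    head, _, rest = field.partition(".")
    if rest:
        child = nest[head] if isinstance(nest, dict) else getattr(nest, head)
        new_value = set_field(child, rest, new_value)
    if isinstance(nest, dict):
        return {**nest, head: new_value}
    return nest._replace(**{head: new_value})


TimeStepSketch = namedtuple("TimeStepSketch", ["observation", "reward"])
ExperienceSketch = namedtuple("ExperienceSketch", ["time_step", "action"])

exp = ExperienceSketch(TimeStepSketch({"image": 1, "goal": 2}, 0.0), action=3)
new_exp = set_field(exp, "time_step.observation.goal", 9)
```

The original exp is left untouched, and every field of new_exp other than the one on the path keeps its value.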

class LossInfo(loss, scalar_loss, extra, priority, gns, batch_label)#

Bases: tuple

Create new instance of LossInfo(loss, scalar_loss, extra, priority, gns, batch_label)

batch_label#

Alias for field number 5

extra#

Alias for field number 2

gns#

Alias for field number 4

loss#

Alias for field number 0

priority#

Alias for field number 3

scalar_loss#

Alias for field number 1

class StepType(value)[source]#

Bases: object

Defines the status of a TimeStep within a sequence.

Add ability to create StepType constants from a value.

FIRST = 0#
LAST = 2#
MID = 1#
class TimeStep(step_type=(), reward=(), discount=(), observation=(), prev_action=(), env_id=(), untransformed=(), env_info=())[source]#

Bases: alf.data_structures.TimeStep

A TimeStep contains the data emitted by an environment at each step of interaction. A TimeStep holds a step_type, an observation (typically a NumPy array or a dict or list of arrays), and an associated reward and discount.

The step_type of the first TimeStep in a sequence is StepType.FIRST, that of the final TimeStep is StepType.LAST, and that of every other TimeStep in the sequence is StepType.MID.

It has eight attributes:

  • step_type: a Tensor or numpy int of StepType enum values.

  • reward: a Tensor of reward values from executing ‘prev_action’.

  • discount: A discount value in the range [0, 1].

  • observation: A (nested) Tensor for observation.

  • prev_action: A (nested) Tensor for action from previous time step.

  • env_id: A scalar Tensor of the environment ID of the time step.

  • untransformed: a nest that represents the entire time step itself before any transformation (e.g., observation or reward transformation); used for experience replay observing by subalgorithms.

  • env_info: A dictionary containing information returned by Gym environments’ info.

Create new instance of TimeStep(step_type, reward, discount, observation, prev_action, env_id, untransformed, env_info)

cpu()[source]#

Get the cpu version of this data structure.

cuda()[source]#

Get the cuda version of this data structure.

is_first()[source]#
is_last()[source]#
is_mid()[source]#
add_batch_info(experience, batch_info, buffer=())[source]#

Add batch_info and rollout_info_field string to experience.

clear_batch_info(experience)[source]#

Clear batch_info and rollout_info_field string from experience.

Useful as certain nest functions like convert_device do not skip non-tensor objects in nests.

elastic_namedtuple(name, args)[source]#

An elastic namedtuple that returns () for a non-existing attribute, instead of raising an AttributeError.

Parameters
  • name (str) – type name of this elastic namedtuple.

  • args – other arguments for constructing the namedtuple

Returns

the type for the elastic namedtuple
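A minimal sketch of the idea, assuming a plain collections.namedtuple as the base: elastic_namedtuple_sketch is hypothetical and may differ from ALF's implementation in its details.

```python
from collections import namedtuple


def elastic_namedtuple_sketch(name, field_names):
    base = namedtuple(name, field_names)

    class Elastic(base):
        __slots__ = ()

        def __getattr__(self, attr):
            # Called only when normal attribute lookup fails.
            if attr.startswith("__"):
                # Keep Python protocols (copy, pickle, ...) working as usual.
                raise AttributeError(attr)
            return ()

    Elastic.__name__ = name
    return Elastic


Info = elastic_namedtuple_sketch("Info", ["action", "value"])
info = Info(action=1, value=2.0)
missing = info.log_prob  # not a field, so this is () rather than an error
```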

make_experience(time_step, alg_step, state)[source]#

Make an instance of Experience from TimeStep and AlgStep.

Parameters
  • time_step (TimeStep) – time step from the environment.

  • alg_step (AlgStep) – policy step returned from rollout().

  • state (nested Tensor) – state used for calling rollout() to get the policy_step.

Returns

Return type

Experience

namedtuple(typename, field_names, default_value=None, default_values=())[source]#

namedtuple with default value.

Parameters
  • typename (str) – type name of this namedtuple.

  • field_names (list[str]) – name of each field.

  • default_value (Any) – the default value for all fields.

  • default_values (list|dict) – default value for each field.

Returns

the type for the namedtuple
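The two kinds of defaults can be sketched with the standard library alone. namedtuple_with_defaults is a hypothetical illustration, and it assumes that a list of default_values is matched to the fields by position, starting from the first field.

```python
import collections


def namedtuple_with_defaults(typename, field_names, default_value=None,
                             default_values=()):
    fields = list(field_names)
    defaults = {f: default_value for f in fields}
    if isinstance(default_values, dict):
        defaults.update(default_values)  # per-field defaults by name
    else:
        defaults.update(zip(fields, default_values))  # by position (assumed)
    return collections.namedtuple(
        typename, fields, defaults=[defaults[f] for f in fields])


StepSketch = namedtuple_with_defaults(
    "StepSketch", ["output", "state", "info"], default_value=())
empty_step = StepSketch()
partial_step = StepSketch(output=1)
```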

restart(observation, action_spec, reward_spec=TensorSpec(shape=(), dtype=torch.float32), env_id=None, env_info={}, batched=False)[source]#

Returns a TimeStep with step_type set equal to StepType.FIRST.

Called by env.reset().

Parameters
  • observation (nested tensors) – observations of the env.

  • action_spec (nested TensorSpec) – tensor spec of actions.

  • reward_spec (TensorSpec) – a rank-1 or rank-0 (default) tensor spec

  • env_id (batched or scalar torch.int32) – (optional) ID of the env.

  • env_info (dict) – extra info returned by the environment.

  • batched (bool) – (optional) whether batched envs or not.

Returns

Return type

TimeStep

termination(observation, prev_action, reward, reward_spec=TensorSpec(shape=(), dtype=torch.float32), env_id=None, env_info={})[source]#

Returns a TimeStep with step_type set to StepType.LAST.

Called by env.step() if ‘Done’. discount should not be sent in and will be set as 0.

Parameters
  • observation (nested tensors) – current observations of the env.

  • prev_action (nested tensors) – previous actions to the env.

  • reward (float) – A scalar, or 1D NumPy array, or tensor.

  • reward_spec (TensorSpec) – a rank-1 or rank-0 (default) tensor spec. Used to tell if the termination is batched or not.

  • env_id (torch.int32) – (optional) A scalar or 1D tensor of the environment ID(s).

  • env_info (dict) – extra info returned by the environment.

Returns

Return type

TimeStep

Raises

ValueError – If observations are tensors but reward’s statically known rank is not 0 or 1.

time_step_spec(observation_spec, action_spec, reward_spec)[source]#

Returns a TimeStep spec given the observation_spec, the action_spec and the reward_spec.

transition(observation, prev_action, reward, reward_spec=TensorSpec(shape=(), dtype=torch.float32), discount=1.0, env_id=None, env_info={})[source]#

Returns a TimeStep with step_type set equal to StepType.MID.

Called by env.step() if not ‘Done’.

The batch size is inferred from the shape of reward.

If discount is a scalar, and observation contains tensors, then discount will be broadcasted to match reward.shape.

Parameters
  • observation (nested tensors) – current observations of the env.

  • prev_action (nested tensors) – previous actions to the env.

  • reward (float) – A scalar, or 1D NumPy array, or tensor.

  • reward_spec (TensorSpec) – a rank-1 or rank-0 (default) tensor spec. Used to tell if the transition is batched or not.

  • discount (float) – (optional) A scalar, or 1D NumPy array, or tensor.

  • env_id (torch.int32) – (optional) A scalar or 1D tensor of the environment ID(s).

  • env_info (dict) – extra info returned by the environment.

Returns

Return type

TimeStep

Raises
  • ValueError – If observations are tensors but reward’s rank is not 0 or 1.

alf.device_ctx#

class device(device_name)[source]#

Bases: object

Specifies the device for tensors created in this context.

Create the context with default device with name device_name

Parameters

device_name (str) – one of (“cpu”, “cuda”)

get_default_device()[source]#
set_default_device(device_name)[source]#

Set the default device.

Cannot find a native torch function for setting default device. We have to hack our own.

Parameters

device_name (str) – one of (“cpu”, “cuda”)
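The behaviour of a default-device context can be sketched without torch. device_sketch and get_default_device_sketch are hypothetical names, and the sketch only tracks the device name. It does not create tensors on any device.

```python
_state = {"default_device": "cpu"}


def get_default_device_sketch():
    return _state["default_device"]


class device_sketch(object):
    """Toy context manager that switches the default device name."""

    def __init__(self, device_name):
        if device_name not in ("cpu", "cuda"):
            raise ValueError(f"unknown device: {device_name}")
        self._device_name = device_name

    def __enter__(self):
        self._previous = _state["default_device"]
        _state["default_device"] = self._device_name
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore whatever was active before entering the context.
        _state["default_device"] = self._previous
        return False


with device_sketch("cuda"):
    inside = get_default_device_sketch()
after = get_default_device_sketch()
```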

alf.initializers#

variance_scaling_init(tensor, gain=1.0, mode='fan_in', distribution='truncated_normal', calc_gain_after_activation=True, nonlinearity=<function identity>, transposed=False)[source]#

Implements TensorFlow’s VarianceScaling initializer.

https://github.com/tensorflow/tensorflow/blob/e5bf8de410005de06a7ff5393fafdf832ef1d4ad/tensorflow/python/ops/init_ops.py#L437

A potential benefit of this initializer is that we can sample from a truncated normal distribution: scipy.stats.truncnorm(a=-2, b=2, loc=0., scale=1.).

It also incorporates PyTorch’s calculation of the recommended gains that take nonlinear activations into account, so that after N layers, the final output std (in linear space) will be a constant regardless of N’s value (when N is large). This auto gain probably won’t make much of a difference if the network is shallow, as in most RL cases.

Example usage:

import torch.nn as nn
from alf.initializers import variance_scaling_init

layer = nn.Linear(2, 2)
variance_scaling_init(layer.weight.data,
                      nonlinearity=nn.functional.leaky_relu)
nn.init.zeros_(layer.bias.data)

Parameters
  • tensor (torch.Tensor) – the weights to be initialized

  • gain (float) – a positive scaling factor for weight std. Different from tf’s implementation, this number is applied outside of math.sqrt. Note that if calc_gain_after_activation=True, this number will be an additional gain factor on top of that.

  • mode (str) – one of “fan_in”, “fan_out”, and “fan_avg”

  • distribution (str) – one of “uniform”, “untruncated_normal” and “truncated_normal”. If “truncated_normal”, the weights will be sampled from a normal distribution truncated at (-2, 2).

  • calc_gain_after_activation (bool) – whether to automatically calculate the std gain of applying nonlinearity after this layer. A nonlinear activation (e.g., relu) might change std after the transformation, so we need to compensate for that. Only used when mode == “fan_in”.

  • nonlinearity (Callable) – any callable activation function

  • transposed (bool) – a flag indicating if the weight tensor has been transposed (e.g., nn.ConvTranspose2d). In that case, fan_in and fan_out should be swapped.

Returns

a randomly initialized weight tensor

Return type

torch.Tensor
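To make the roles of mode, gain and transposed concrete, the sketch below computes a target std from a weight shape. variance_scaling_std is hypothetical. It assumes the PyTorch weight layout [out, in, *kernel] and the form std = gain * sqrt(1 / fan), inferred from the remark that gain is applied outside of math.sqrt. It leaves out the activation gain and any correction for truncation.

```python
import math


def variance_scaling_std(shape, gain=1.0, mode="fan_in", transposed=False):
    """Target std for a weight of the given shape (simplified illustration)."""
    receptive_field = math.prod(shape[2:])  # 1 for a linear layer
    fan_in = shape[1] * receptive_field
    fan_out = shape[0] * receptive_field
    if transposed:
        # Transposed convolutions store weights as [in, out, *kernel].
        fan_in, fan_out = fan_out, fan_in
    fan = {
        "fan_in": fan_in,
        "fan_out": fan_out,
        "fan_avg": (fan_in + fan_out) / 2.0,
    }[mode]
    return gain * math.sqrt(1.0 / fan)


std_linear = variance_scaling_std((2, 2))  # e.g. the weight of nn.Linear(2, 2)
std_conv = variance_scaling_std((8, 4, 3, 3), gain=2.0)
```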

alf.layers#

Some basic layers.

class AMPWrapper(enabled, net)[source]#

Bases: torch.nn.modules.module.Module

Wrap a layer to run in a given AMP context.

Parameters
  • enabled (bool) – whether to enable AMP autocast

  • net (Module) – the wrapped network

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class AddN[source]#

Bases: alf.layers.ElementwiseLayerBase

Add several tensors

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#
Parameters

input (Iterable[Tensor]) – a sequence of tensors to be summed

Returns

the sum of all the tensors

Return type

Tensor

training: bool#
class BottleneckBlock(in_channels, kernel_size, filters, stride, transpose=False, v1_5=True, with_batch_normalization=True, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]#

Bases: torch.nn.modules.module.Module

Bottleneck block for ResNet.

We allow two slightly different architectures:

TODO:

  1. ResNet-D in Bag of Tricks for Image Classification with Convolutional Neural Networks Note: v1_5 is the ResNet-B in the above paper.

  2. Squeeze-and-Excitation (SE) in Squeeze-and-Excitation Networks SE is also shown to be useful in Revisiting ResNets: Improved Training and Scaling Strategies

Parameters
  • kernel_size (int) – the kernel size of middle layer at main path

  • filters (int) – the filters of the 3 layers on the main path

  • stride (int) – stride for this block.

  • transpose (bool) – whether to use Conv2DTranspose instead of Conv2D. If two BottleneckBlock layers L and LT are constructed with the same arguments except transpose, it is guaranteed that LT(L(x)).shape == x.shape if x.shape[-2:] is divisible by stride.

  • v1_5 (bool) – whether to use the ResNet V1.5 structure

  • with_batch_normalization (bool) – whether to include batch normalization. Note that standard ResNet uses batch normalization.

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

calc_output_shape(input_shape)[source]#
forward(inputs)[source]#

training: bool#
class Branch(*modules, **named_modules)[source]#

Bases: torch.nn.modules.module.Module

Apply multiple modules on the same input.

Example:

net = Branch((module1, module2))
y = net(x)

is equivalent to the following:

y = module1(x), module2(x)
Parameters
  • modules (nested nn.Module) – a nest of torch.nn.Module. Note that Branch(module_a, module_b) is equivalent to Branch((module_a, module_b))

  • named_modules (nn.Module | Callable) – a simpler way of specifying a dict of modules. Branch(a=module_a, b=module_b) is equivalent to Branch(dict(a=module_a, b=module_b))
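The behaviour for nested and named modules can be sketched with plain callables. apply_branch is a hypothetical helper that only mirrors the equivalences stated above; it is not the Branch implementation.

```python
def apply_branch(modules, x):
    """Apply every callable in a nest (tuple, list or dict) to the same input."""
    if isinstance(modules, dict):
        return {name: apply_branch(m, x) for name, m in modules.items()}
    if isinstance(modules, (tuple, list)):
        return type(modules)(apply_branch(m, x) for m in modules)
    return modules(x)


def double(v):
    return 2 * v


def square(v):
    return v * v


as_tuple = apply_branch((double, square), 3)
as_dict = apply_branch(dict(a=double, b=square), 3)
```

The output has the same structure as the nest of modules, so a tuple of modules yields a tuple of outputs and a dict yields a dict.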

forward(inputs)[source]#

make_parallel(n)[source]#

Create a parallelized version of this network.

Parameters

n (int) – the number of copies

Returns

the parallelized version of this network

reset_parameters()[source]#
training: bool#
class Cast(dtype=torch.float32)[source]#

Bases: alf.layers.ElementwiseLayerBase

A layer that casts the dtype of the elements of the input tensor.

Parameters

dtype (torch.dtype) – desired type of the new tensor.

forward(x)[source]#

training: bool#
class CausalConv1D(in_channels, out_channels, kernel_size, dilation=1, hide_current=False, activation=<built-in method relu_ of type object>, use_bias=None, use_bn=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

1D (Dilated) Causal Convolution layer. 1D Dilated Causal Convolution is proposed in van den Oord et al., WaveNet: A Generative Model for Raw Audio.

A layer implementing the 1D (Dilated) Causal Convolution. It is also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the causal conv1d layer.

Note that the main difference between causal conv and standard conv is that each temporal element in the convolutional output is causal w.r.t. the temporal elements of the input. For example, for a length L sequence x with the shape of [B, C, L], and y = causal_conv(x), where the shape of y is [B, C', L], by causal we mean that y[..., l] only depends on x[..., :l + 1] (i.e. the current and past elements, or only x[..., :l] when hide_current is True), and there is no dependency on x[..., l + 1:] (i.e. the future) as in the standard non-causal convolution.

This can be implemented by using an asymmetric padding, which in effect shifts the input to the right (future) according to the kernel size.
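The asymmetric padding can be illustrated on a single channel with plain Python lists. causal_conv1d below is a hypothetical sketch, not the layer itself. It has no batch or channel dimensions and no activation, and it includes the current element in each output, which is assumed to correspond to hide_current=False.

```python
def causal_conv1d(x, kernel, dilation=1):
    """Causal 1D convolution of a list `x` with `kernel` via left-only padding."""
    pad = (len(kernel) - 1) * dilation
    padded = [0.0] * pad + list(x)  # pad on the past side only
    out = []
    for l in range(len(x)):
        # The last kernel tap lands on x[l]; earlier taps reach into the past.
        out.append(sum(w * padded[l + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


y = causal_conv1d([1, 2, 3, 4], kernel=[1, 1])
y_dilated = causal_conv1d([1, 2, 3, 4], kernel=[1, 1], dilation=2)
# Changing the last input must not change any earlier output.
y_changed = causal_conv1d([1, 2, 3, 100], kernel=[1, 1])
```

The output has the same length as the input, and in this sketch y[l] is a weighted sum of x[l], x[l - dilation], and so on, with zeros standing in for positions before the start of the sequence.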

Parameters
  • in_channels (int) – channels of the input

  • out_channels (int) – channels of the output

  • kernel_size (int) – size of the kernel

  • dilation (int) – controls the spacing between the kernel points. Please refer to here for a visual illustration: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md

  • hide_current (bool) – whether to hide the current by shifting the input to the right (future) by one. This is typically needed in the first layer of a causal conv net.

  • activation (torch.nn.functional) – activation to be applied to output

  • use_bias (bool|None) – whether to use bias. If None, it is set to not use_bn.

  • use_bn (bool) – whether to use batch normalization

  • kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided, a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#
forward(x)[source]#
Parameters

x (tensor) – input of the shape [B, C, L] where B is the batch size, C denotes the number of input channels, and L is the length of the signal.

Returns

A tensor of the shape [B, C’, L], where C’ denotes the number of output channels.

reset_parameters()[source]#

Initialize the parameters.

training: bool#
property weight#
class CompositionalFC(input_size, output_size, n, activation=<function identity>, output_comp_weight=True, use_bias=True, use_bn=False, use_ln=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

Compositional FC layer.

It maintains a set of n FC parameters for learning. During the forward computation, it composes the set of parameters using a weighted average with the compositional weight provided as input, and then performs the FC computation. This is equivalent to combining the pre-activation outputs of the n FC layers using the compositional weight and then applying normalization and activation.
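The equivalence stated above follows from the linearity of the FC computation, and can be checked numerically. The sketch below is an illustration under assumptions, not the CompositionalFC code: the tensor names are hypothetical, the shapes follow the documented weight (n, output_size, input_size) and bias (n, output_size) layouts, and the compositional weight is applied as given without any normalization.

```python
import torch

torch.manual_seed(0)
B, n, input_size, output_size = 5, 3, 4, 2

weight = torch.randn(n, output_size, input_size)  # one FC kernel per member
bias = torch.randn(n, output_size)                # one bias per member
x = torch.randn(B, input_size)
comp_weight = torch.rand(B, n)                    # per-sample compositional weight

# (a) Compose the parameters first, then run a single FC per sample.
w_mix = torch.einsum('bn,noi->boi', comp_weight, weight)  # [B, out, in]
b_mix = comp_weight @ bias                                # [B, out]
out_a = torch.einsum('boi,bi->bo', w_mix, x) + b_mix

# (b) Run all n FC layers, then combine their pre-activation outputs.
per_layer = torch.einsum('noi,bi->bno', weight, x) + bias  # [B, n, out]
out_b = (comp_weight.unsqueeze(-1) * per_layer).sum(dim=1)
```

Both routes give the same pre-activation output up to floating point error; normalization and activation would be applied afterwards.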

Parameters
  • input_size (int) – input size

  • output_size (int) – output size

  • n (int) – the size of the parameter set

  • activation (torch.nn.functional) –

  • output_comp_weight (bool) – If True, the forward() function will return a tuple of (result, comp_weight) for easy chaining of multiple layers in the case when the same compositional weight is used. If False, the forward() function will return result only.

  • use_bias (bool) – whether use bias

  • use_bn (bool) – whether use Batch Normalization.

  • use_ln (bool) – whether use layer normalization

  • kernel_initializer (Callable) – initializer for the FC layer kernel. If none is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#

Get the bias Tensor.

Returns

with shape (n, output_size). bias[i] is the bias for the i-th FC layer. bias[i] can be used for FC layer with the same input_size and output_size

Return type

Tensor

forward(inputs)[source]#

Forward

Parameters
  • inputs (torch.Tensor|tuple) – If a Tensor, its shape should be [B, input_size]. If a tuple, it should contain two elements. The first is a Tensor with the shape of [B, input_size], the second is a compositional weight Tensor with the shape of [B, n] or None. If the compositional weight is not specified (i.e. when inputs is not a tuple) or None, a uniform weight of one will be used.

Returns

torch.Tensor representing the final activation with shape [B, output_size] if output_comp_weight is False. Otherwise, return a tuple consisting of the final activation and the compositional weight used.

reset_parameters()[source]#

Initialize the parameters.

training: bool#
property weight#

Get the weight Tensor.

Returns

with shape (n, output_size, input_size). weight[i] is the weight for the i-th FC layer. weight[i] can be used for FC layer with the same input_size and output_size

Return type

Tensor

class Conv2D(in_channels, out_channels, kernel_size, activation=<built-in method relu_ of type object>, strides=1, padding=0, use_bias=None, use_bn=False, use_ln=False, weight_opt_args=None, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

2D Convolution Layer.

A 2D Conv layer that’s also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the conv layer. Suggest using this wrapper module instead of nn.Conv2d if you really care about weight std after init.

Parameters
  • in_channels (int) – channels of the input image

  • out_channels (int) – channels of the output image

  • kernel_size (int or tuple) –

  • activation (torch.nn.functional) –

  • strides (int or tuple) –

  • padding (int or tuple) –

  • use_bias (bool|None) – whether to use bias. If None, it is set to not use_bn, i.e. bias is used only when batch normalization is not.

  • use_bn (bool) – whether use batch normalization

  • use_ln (bool) – whether use layer normalization

  • weight_opt_args (Optional[Dict]) – optimizer arguments for weight (not for bias)

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

  • kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#
forward(img)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
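The note above describes standard torch.nn.Module behaviour and can be illustrated with any module. The sketch below uses torch.nn.Linear as a stand-in (an assumption made for brevity; the same applies to the layers documented here) and a forward hook that records output shapes.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 2)
calls = []

# The hook records the shape of every output produced through layer(...).
handle = layer.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape)))

x = torch.zeros(1, 2)

layer(x)                       # goes through __call__, so the hook runs
n_after_call = len(calls)

layer.forward(x)               # bypasses __call__, so the hook is skipped
n_after_forward = len(calls)

handle.remove()
```

The hook list grows only when the module instance is called, which is why calling forward() directly is discouraged.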

make_parallel(n)[source]#
reset_parameters()[source]#

Initialize the parameters.

training: bool#
property weight#
class Conv2DBatchEnsemble(in_channels, out_channels, kernel_size, ensemble_size, output_ensemble_ids=True, activation=<built-in method relu_ of type object>, strides=1, padding=0, use_bias=None, use_bn=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_range=0.0, ensemble_group=0)[source]#

Bases: alf.layers.Conv2D

The BatchEnsemble for 2D Conv layer.

BatchEnsemble is proposed in Wen et al. BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

In a nutshell, a tuple of vectors \((r_k, s_k)\) is maintained for ensemble member k in addition to the conv2d kernel W of shape [C_out, C_in, K_h, K_w]. For input x of shape [B, C, H, W], the result for ensemble member k is calculated as \((W \circ (s_k r_k^T).unsqueeze(-1).unsqueeze(-1)) * x\). This can be more efficiently calculated as

\((W*(x \circ r_k.unsqueeze(-1).unsqueeze(-1))) \circ s_k.unsqueeze(-1).unsqueeze(-1)\)

Note that for each sample in a batch, a random ensemble member will be used if ensemble_ids is not provided to forward().
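The two expressions above can be compared numerically for a single ensemble member. This is a sketch under assumptions rather than the Conv2DBatchEnsemble code: the tensor names are hypothetical, bias is omitted, and torch.nn.functional.conv2d stands in for the convolution.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
C_in, C_out, K = 3, 4, 3

W = torch.randn(C_out, C_in, K, K)  # shared conv kernel
r = torch.randn(C_in)               # r_k: scales input channels
s = torch.randn(C_out)              # s_k: scales output channels
x = torch.randn(2, C_in, 8, 8)

# Direct form: build the member-specific kernel W o (s_k r_k^T), then convolve.
W_k = W * (s[:, None] * r[None, :])[:, :, None, None]
direct = F.conv2d(x, W_k)

# Efficient form: scale the input channels, convolve with the shared kernel,
# then scale the output channels.
fast = F.conv2d(x * r[None, :, None, None], W) * s[None, :, None, None]
```

The efficient form never materializes a per-member kernel, so a batch with mixed ensemble members can share a single convolution.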

Parameters
  • in_channels (int) – channels of the input image

  • out_channels (int) – channels of the output image

  • kernel_size (int or tuple) –

  • ensemble_size (int) – ensemble size

  • output_ensemble_ids (bool) – If True, the forward() function will return a tuple of (result, ensemble_ids). If False, the forward() function will return result only.

  • activation (torch.nn.functional) –

  • strides (int or tuple) –

  • padding (int or tuple) –

  • use_bias (bool|None) – whether to use bias. If None, it is set to not use_bn, i.e. bias is used only when batch normalization is not.

  • use_bn (bool) – whether use batch normalization

  • kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_range (float) – biases are initialized uniformly in [-bias_init_range, bias_init_range]

  • ensemble_group (int) –

    the extra attribute ensemble_group added to self._r, self._s, and self._ensemble_bias, default value is 0. For alf.optimizers whose parvi is not None, all parameters with the same ensemble_group will be updated by the particle-based VI algorithm specified by parvi, options are [svgd, gfsf],

    • Stein Variational Gradient Descent (SVGD)

      Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” NIPS. 2016.

    • Wasserstein Gradient Flow with Smoothed Functions (GFSF)

      Liu, Chang, et al. “Understanding and accelerating particle-based variational inference.” ICML, 2019.

forward(inputs)[source]#

Forward computation.

Parameters

inputs (Tensor|tuple) – if a Tensor, its shape should be [B, C, H, W]. And a random ensemble id will be generated for each sample in the batch. If a tuple, it should contain two tensors. The first one is the data tensor with shape [B, C, H, W]. The second one is ensemble_ids indicating which ensemble member each sample should use. Its shape should be [batch_size], and all elements should be in [0, ensemble_size).

Returns

If output_ensemble_ids is True, a tuple of:

  • Tensor: with shape [B, C_out, H_out, W_out]

  • LongTensor: if ensemble_ids is provided, this is the same as ensemble_ids; otherwise a randomly generated ensemble_ids is returned

If output_ensemble_ids is False, a Tensor: the result of Conv2D.

reset_parameters()[source]#

Reinitialize the parameters.

training: bool#
class ConvTranspose2D(in_channels, out_channels, kernel_size, activation=<built-in method relu_ of type object>, strides=1, padding=0, output_padding=0, use_bias=None, use_bn=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

A 2D ConvTranspose layer that’s also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the conv layer. Suggest using this wrapper module instead of nn.ConvTranspose2d if you really care about weight std after init.

Parameters
  • in_channels (int) – channels of the input image

  • out_channels (int) – channels of the output image

  • kernel_size (int or tuple) –

  • activation (torch.nn.functional) –

  • strides (int or tuple) –

  • padding (int or tuple) –

  • output_padding (int or tuple) – Additional size added to one side of each dimension in the output shape. Default: 0. See pytorch documentation for more detail.

  • use_bias (bool|None) – whether to use bias. If None, it is set to not use_bn, i.e. bias is used only when batch normalization is not.

  • use_bn (bool) – whether use batch normalization

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

  • kernel_initializer (Callable) – initializer for the conv_trans layer. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#
forward(img)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_parallel(n)[source]#
reset_parameters()[source]#

Initialize the parameters.

training: bool#
property weight#
class Detach[source]#

Bases: alf.layers.ElementwiseLayerBase

Detach nested Tensors.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class ElementwiseLayerBase[source]#

Bases: torch.nn.modules.module.Module

Base class for the layers of parameterless elementwise operations.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

make_parallel(n)[source]#

Create a layer with same operation to handle parallel batch.

It is assumed that a parallel batch has shape [B, n, …].

Parameters

n (int) – the number of replicas.

Returns

a layer with same operation to handle parallel batch.

training: bool#
class FC(input_size, output_size, activation=<function identity>, use_bias=True, use_bn=False, use_ln=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm1d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0, bias_initializer=None, weight_opt_args=None, bias_opt_args=None)[source]#

Bases: torch.nn.modules.module.Module

Fully connected layer.

A fully connected layer that’s also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the linear layer. Suggest using this wrapper module instead of nn.Linear if you really care about weight std after init.

Parameters
  • input_size (int) – input size

  • output_size (int) – output size

  • activation (torch.nn.functional) –

  • use_bias (bool) – whether use bias

  • use_bn (bool) – whether use batch normalization.

  • use_ln (bool) – whether use layer normalization

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

  • kernel_initializer (Callable) – initializer for the FC layer kernel. If none is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant for the initial bias value. This is ignored if bias_initializer is provided.

  • bias_initializer (Callable) – initializer for the bias parameter.

  • weight_opt_args (Optional[Dict]) – optimizer arguments for weight

  • bias_opt_args (Optional[Dict]) – optimizer arguments for bias

property bias#
forward(inputs)[source]#

Forward computation.

Parameters

inputs (Tensor) – its shape should be [batch_size, input_size] or [batch_size, ..., input_size]

Returns

with shape as inputs.shape[:-1] + (output_size,)

Return type

Tensor

property input_size#
make_parallel(n)[source]#

Create a ParallelFC using n replicas of self. The initialized layer parameters will be different.

property output_size#
reset_parameters()[source]#

Initialize the parameters.

training: bool#
property weight#
class FCBatchEnsemble(input_size, output_size, ensemble_size, output_ensemble_ids=True, activation=<function identity>, use_bias=True, use_bn=False, use_ln=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_range=0.0, ensemble_group=0)[source]#

Bases: alf.layers.FC

The BatchEnsemble for FC layer.

BatchEnsemble is proposed in Wen et al. BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

In a nutshell, a tuple of vectors \((r_k, s_k)\) is maintained for ensemble member k in addition to the original FC weight matrix W. For input x, the result for ensemble member k is calculated as \((W \circ (s_k r_k^T)) x\). This can be more efficiently calculated as \((W (x \circ r_k)) \circ s_k\). Note that for each sample in a batch, a random ensemble member will be used if ensemble_ids is not provided to forward().
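The identity above is what lets a whole batch with mixed ensemble members share one matrix multiplication. The following sketch checks it with per-sample ensemble ids; it is an illustration under assumptions, not the FCBatchEnsemble code, and the tensor names are hypothetical (bias is omitted).

```python
import torch

torch.manual_seed(0)
B, input_size, output_size, ensemble_size = 6, 4, 3, 5

W = torch.randn(output_size, input_size)     # shared FC weight
r = torch.randn(ensemble_size, input_size)   # r_k for every member
s = torch.randn(ensemble_size, output_size)  # s_k for every member
x = torch.randn(B, input_size)
ensemble_ids = torch.randint(0, ensemble_size, (B,))

# Efficient form: one shared matmul for the whole batch.
fast = ((x * r[ensemble_ids]) @ W.t()) * s[ensemble_ids]

# Direct form: build W o (s_k r_k^T) separately for each sample.
direct = torch.stack([
    (W * (s[k][:, None] * r[k][None, :])) @ x[i]
    for i, k in enumerate(ensemble_ids.tolist())
])
```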

Parameters
  • input_size (int) – input size

  • output_size (int) – output size

  • ensemble_size (int) – ensemble size

  • output_ensemble_ids (bool) – If True, the forward() function will return a tuple of (result, ensemble_ids). If False, the forward() function will return result only.

  • activation (Callable) – activation function

  • use_bias (bool) – whether use bias

  • use_bn (bool) – whether use batch normalization.

  • use_ln (bool) – whether use layer normalization

  • kernel_initializer (Callable) – initializer for the FC layer kernel. If none is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_range (float) – biases are initialized uniformly in [-bias_init_range, bias_init_range]

  • ensemble_group (int) –

    the extra attribute ensemble_group added to self._r, self._s, and self._ensemble_bias, default value is 0. For alf.optimizers whose parvi is not None, all parameters with the same ensemble_group will be updated by the particle-based VI algorithm specified by parvi, options are [svgd, gfsf],

    • Stein Variational Gradient Descent (SVGD)

      Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” NIPS. 2016.

    • Wasserstein Gradient Flow with Smoothed Functions (GFSF)

      Liu, Chang, et al. “Understanding and accelerating particle-based variational inference.” ICML, 2019.

forward(inputs)[source]#

Forward computation.

Parameters

inputs (Tensor|tuple) – if a Tensor, its shape should be [batch_size, input_size] or [batch_size, ..., input_size]. And a random ensemble id will be generated for each sample in the batch. If a tuple, it should contain two tensors. The first one is the data tensor with shape [batch_size, input_size] or [batch_size, ..., input_size]. The second one is ensemble_ids indicating which ensemble member each sample should use. Its shape should be [batch_size], and all elements should be in [0, ensemble_size).

Returns

If output_ensemble_ids is True, a tuple of:

  • Tensor: with shape as inputs.shape[:-1] + (output_size,)

  • LongTensor: if ensemble_ids is provided, this is the same as ensemble_ids; otherwise a randomly generated ensemble_ids is returned

If output_ensemble_ids is False, a Tensor: the result of FC.

reset_parameters()[source]#

Reinitialize parameters.

training: bool#
class FixedDecodingLayer(input_size, output_size, basis_type='rbf', sigma=1.0, tau=0.5)[source]#

Bases: torch.nn.modules.module.Module

A layer that uses a set of fixed basis for decoding the inputs.

Parameters
  • input_size (int) – the size of input to be decoded, representing the number of representation coefficients

  • output_size (int) – the size of the decoded output

  • basis_type (str) – the type of basis to be used for decoding:

    • “poly”: polynomial basis using Vandermonde matrix

    • “cheb”: polynomial basis using Chebyshev polynomials

    • “rbf”: radial basis functions

    • “haar”: Haar wavelet basis

  • sigma (float) – the bandwidth parameter used for RBF basis. If None, a default value of 1. will be used.

  • tau (float) – a factor for weighting the basis exponentially according to the order (n) of the basis, i.e., tau**n

forward(inputs)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
property weight#
class GFT(num_transformations, image_channels, language_dim)[source]#

Bases: torch.nn.modules.module.Module

Guided Feature Transformation.

This class implements the GFT model proposed in the following paper:

Yu et al. Guided Feature Transformation (GFT): A Neural Language Grounding Module for Embodied Agents, CoRL 2018

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#
Parameters

input (tuple) – the tuple of image features and sentence embedding.

Returns

same shape as input[0]

Return type

Tensor

reset_parameters()[source]#
training: bool#
class GetFields(field_nest=None, **fields)[source]#

Bases: alf.layers.ElementwiseLayerBase

Get the fields from a nested input.

Parameters
  • field_nest (nested str) – the path of the fields to be retrieved. Each str in fields represents a path to the field, with ‘.’ separating the field names at different levels.

  • fields (str) – a simpler way of specifying field_nest when it is a dict. GetFields(a="field_a", b="field_b") is equivalent to GetFields(dict(a="field_a", b="field_b")).
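The dotted-path lookup described above can be sketched in a few lines of plain Python. This is only an illustration of the idea for nested dicts: the helper names get_field and get_fields and the sample observation are hypothetical, and ALF nests may contain other container types that this sketch does not handle.

```python
def get_field(nest, path):
    """Follow a dotted path such as "A.B.C" through nested dicts."""
    for name in path.split('.'):
        nest = nest[name]
    return nest


def get_fields(nest, **fields):
    """Return a dict with the same keys as fields, one lookup per path."""
    return {key: get_field(nest, path) for key, path in fields.items()}


observation = {
    'sensors': {'camera': 'image', 'lidar': 'points'},
    'goal': 'g',
}
selected = get_fields(observation, a='sensors.camera', b='goal')
```

The result has the structure of the requested field nest, here a dict with keys a and b.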

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class Identity[source]#

Bases: alf.layers.ElementwiseLayerBase

A layer that simply returns its argument as result.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class Lambda(func)[source]#

Bases: torch.nn.modules.module.Module

Wrap a function as an nn.Module.

Parameters

func (Callable) – a function that calculate the output given the input. It should be parameterless.

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class NaiveParallelLayer(module, n)[source]#

Bases: torch.nn.modules.module.Module

A parallel network has n copies of a network with the same structure but different, independently initialized parameters.

NaiveParallelLayer creates n independent networks with the same structure as module and evaluates them separately in a loop during forward().
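The loop-based evaluation described above can be sketched as follows. This is a minimal illustration, not the NaiveParallelLayer implementation: the class name LoopParallel is hypothetical, it handles a single tensor rather than a nest, and it assumes module is an nn.Module whose sub-layers expose reset_parameters() so that each copy can be re-initialized independently.

```python
import copy

import torch
import torch.nn as nn


class LoopParallel(nn.Module):
    """Hypothetical sketch: n independently initialized copies of module."""

    def __init__(self, module, n):
        super().__init__()
        self.replicas = nn.ModuleList(
            [copy.deepcopy(module) for _ in range(n)])
        # Re-initialize every copy so the replicas do not share values.
        for replica in self.replicas:
            for layer in replica.modules():
                if hasattr(layer, 'reset_parameters'):
                    layer.reset_parameters()

    def forward(self, inputs):
        # inputs: [B, n, ...]; replica i consumes the slice inputs[:, i].
        outputs = [m(inputs[:, i]) for i, m in enumerate(self.replicas)]
        return torch.stack(outputs, dim=1)  # [B, n, ...]


torch.manual_seed(0)
layer = LoopParallel(nn.Linear(4, 2), n=3)
x = torch.randn(5, 3, 4)
y = layer(x)
```

The loop is simple but evaluates the replicas one after another, which is why batched variants such as ParallelFC exist.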

Parameters
  • module (nn.Module | Callable) – the parallel network will have n copies of module.

  • n (int) – n copies of module

forward(inputs)[source]#

Compute the output.

Parameters

inputs (nested torch.Tensor) – its shape is [B, n, ...]

Returns

its shape is [B, n, ...]

Return type

output (nested torch.Tensor)

reset_parameters()[source]#
training: bool#
class OneHot(num_classes)[source]#

Bases: torch.nn.modules.module.Module

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_parallel(n)[source]#
training: bool#
class ParallelConv2D(in_channels, out_channels, kernel_size, n, activation=<built-in method relu_ of type object>, strides=1, padding=0, use_bias=None, use_bn=False, use_ln=False, weight_opt_args=None, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

A parallel 2D Conv layer that can be used to perform n independent 2D convolutions in parallel.

It is equivalent to n separate Conv2D layers with the same in_channels and out_channels.
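One common way to run n independent convolutions in a single call is a grouped convolution. The sketch below shows that technique with torch.nn.functional.conv2d and groups=n and compares it with a per-replica loop; it is an assumption about how such a layer could be built, not a statement about the ParallelConv2D internals, and the tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, n, C, C_out, H, W, K = 2, 3, 4, 5, 8, 8, 3

weight = torch.randn(n, C_out, C, K, K)  # one kernel per replica
bias = torch.randn(n, C_out)
img = torch.randn(B, n, C, H, W)         # one input slice per replica

# Fold the replica dimension into the channel dimension and let groups=n
# keep the replicas separate.
out = F.conv2d(
    img.reshape(B, n * C, H, W),
    weight.reshape(n * C_out, C, K, K),
    bias.reshape(n * C_out),
    groups=n)
out = out.reshape(B, n, C_out, out.shape[-2], out.shape[-1])

# Reference: n separate convolutions evaluated in a loop.
reference = torch.stack(
    [F.conv2d(img[:, i], weight[i], bias[i]) for i in range(n)], dim=1)
```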

Parameters
  • in_channels (int) – channels of the input image

  • out_channels (int) – channels of the output image

  • kernel_size (int or tuple) –

  • n (int) – n independent Conv2D layers

  • activation (torch.nn.functional) –

  • strides (int or tuple) –

  • padding (int or tuple) –

  • use_bias (bool|None) – whether to use bias. If None, it is set to not use_bn, i.e. bias is used only when batch normalization is not.

  • use_bn (bool) – whether use batch normalization

  • use_ln (bool) – whether use layer normalization

  • weight_opt_args (Optional[Dict]) – optimizer arguments for weight (not for bias)

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

  • kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#
forward(img)[source]#

Forward

Parameters

img (torch.Tensor) – with shape [B, C, H, W] or [B, n, C, H, W], where:

  • B: batch size

  • n: number of replicas

  • C: number of channels

  • H: image height

  • W: image width

When the shape of img is [B, C, H, W], all the n 2D Conv operations take img as the same shared input. When the shape of img is [B, n, C, H, W], each 2D Conv operator has its own input data, obtained by slicing img.

Returns

torch.Tensor with shape [B, n, C', H', W'], where:

  • B: batch size

  • n: number of replicas

  • C': number of output channels

  • H': output height

  • W': output width

reset_parameters()[source]#
training: bool#
property weight#
class ParallelConvTranspose2D(in_channels, out_channels, kernel_size, n, activation=<built-in method relu_ of type object>, strides=1, padding=0, output_padding=0, use_bias=None, use_bn=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

A parallel ConvTranspose2D layer that can be used to perform n independent 2D transposed convolutions in parallel.

Parameters
  • in_channels (int) – channels of the input image

  • out_channels (int) – channels of the output image

  • kernel_size (int or tuple) –

  • n (int) – n independent ConvTranspose2D layers

  • activation (torch.nn.functional) –

  • strides (int or tuple) –

  • padding (int or tuple) –

  • output_padding (int or tuple) – Additional size added to one side of each dimension in the output shape. Default: 0. See pytorch documentation for more detail.

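  • use_bias (bool|None) – whether to use bias. If None, it is set to not use_bn, i.e. bias is used only when batch normalization is not.

  • use_bn (bool) – whether to use batch normalization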

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

  • kernel_initializer (Callable) – initializer for the conv_trans layer. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#
forward(img)[source]#

Forward

Parameters

img (torch.Tensor) – with shape [B, C, H, W] or [B, n, C, H, W], where:

  • B: batch size

  • n: number of replicas

  • C: number of channels

  • H: image height

  • W: image width

When the shape of img is [B, C, H, W], all the n transposed 2D Conv operations take img as the same shared input. When the shape of img is [B, n, C, H, W], each transposed 2D Conv operator has its own input data, obtained by slicing img.

Returns

torch.Tensor with shape [B, n, C', H', W'], where:

  • B: batch size

  • n: number of replicas

  • C': number of output channels

  • H': output height

  • W': output width

training: bool#
property weight#
class ParallelFC(input_size, output_size, n, activation=<function identity>, use_bias=True, use_bn=False, use_ln=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm1d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0, bias_initializer=None, weight_opt_args=None, bias_opt_args=None)[source]#

Bases: torch.nn.modules.module.Module

Parallel FC layer.

It is equivalent to n separate FC layers with the same input_size and output_size.
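The equivalence with n separate FC layers can be sketched with a single batched contraction. The helper below is hypothetical and not the ParallelFC implementation; it only assumes the documented shapes, i.e. weight (n, output_size, input_size), bias (n, output_size), and inputs of shape [B, n, input_size] or [B, input_size].

```python
import torch
import torch.nn.functional as F


def parallel_fc(inputs, weight, bias):
    """Hypothetical helper returning a tensor of shape [B, n, output_size]."""
    n = weight.shape[0]
    if inputs.ndim == 2:
        # A shared input [B, input_size] is broadcast to all n replicas.
        inputs = inputs.unsqueeze(1).expand(-1, n, -1)
    return torch.einsum('bni,noi->bno', inputs, weight) + bias


torch.manual_seed(0)
B, n, input_size, output_size = 4, 3, 5, 2
weight = torch.randn(n, output_size, input_size)
bias = torch.randn(n, output_size)

x_separate = torch.randn(B, n, input_size)
x_shared = torch.randn(B, input_size)

y_separate = parallel_fc(x_separate, weight, bias)
y_shared = parallel_fc(x_shared, weight, bias)

# Reference: n ordinary linear layers evaluated one by one.
ref_separate = torch.stack(
    [F.linear(x_separate[:, i], weight[i], bias[i]) for i in range(n)], dim=1)
ref_shared = torch.stack(
    [F.linear(x_shared, weight[i], bias[i]) for i in range(n)], dim=1)
```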

Parameters
  • input_size (int) – input size

  • output_size (int) – output size

  • n (int) – n independent FC layers

  • activation (torch.nn.functional) –

  • use_bn (bool) – whether use Batch Normalization.

  • use_ln (bool) – whether use layer normalization

  • bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.

  • use_bias (bool) – whether use bias

  • kernel_initializer (Callable) – initializer for the FC layer kernel. If none is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant for the initial bias value. This is ignored if bias_initializer is provided.

  • bias_initializer (Callable) – initializer for the bias parameter.

  • weight_opt_args (Optional[Dict]) – optimizer arguments for weight

  • bias_opt_args (Optional[Dict]) – optimizer arguments for bias

property bias#

Get the bias Tensor.

Returns

with shape (n, output_size). bias[i] is the bias for the i-th FC layer. bias[i] can be used for FC layer with the same input_size and output_size

Return type

Tensor

forward(inputs)[source]#

Forward

Parameters

inputs (torch.Tensor) – with shape [B, n, input_size] or [B, input_size]

Returns

torch.Tensor with shape [B, n, output_size]

reset_parameters()[source]#
training: bool#
property weight#

Get the weight Tensor.

Returns

with shape (n, output_size, input_size). weight[i] is the weight for the i-th FC layer. weight[i] can be used for FC layer with the same input_size and output_size

Return type

Tensor

class ParamConv2D(in_channels, out_channels, kernel_size, activation=<built-in method relu_ of type object>, strides=1, pooling_kernel=None, padding=0, use_bias=False, use_ln=False, n_groups=None, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

A 2D conv layer that does not maintain its own weight and bias, but accepts both from users. If the given parameter (weight and bias) tensor has an extra batch dimension (first dimension), it performs parallel 2D conv operations.

Parameters
  • in_channels (int) – channels of the input image

  • out_channels (int) – channels of the output image

  • kernel_size (int or tuple) –

  • activation (torch.nn.functional) –

  • strides (int or tuple) –

  • pooling_kernel (int or tuple) –

  • padding (int or tuple) –

  • use_bias (bool) – whether use bias.

  • use_ln (bool) – whether use layer normalization

  • n_groups (int) – number of parallel groups, it is determined by the first dimension of the input parameters when calling set_parameters if use_ln is False. If use_ln is True, n_groups must be specified at initialization and will be fixed, all input parameters will have to be consistent with it.

  • kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – a constant

property bias#

Get stored bias tensor or batch of bias tensors.

property bias_length#

Get the n_element of a single bias tensor.

forward(img, keep_group_dim=True)[source]#

Forward

Parameters

img (torch.Tensor) – with shape [B, C, H, W] (groups=1) or [B, n, C, H, W] (groups=n), where:

  • B: batch size

  • n: number of replicas

  • C: number of channels

  • H: image height

  • W: image width

When the shape of img is [B, C, H, W], all the n 2D Conv operations take img as the same shared input. When the shape of img is [B, n, C, H, W], each 2D Conv operator has its own input data, obtained by slicing img.

Returns

torch.Tensor with shape [B, n, C', H', W'] if keep_group_dim is True, otherwise with shape [B, n*C', H', W'], where:

  • B: batch size

  • n: number of replicas

  • C': number of output channels

  • H': output height

  • W': output width

property param_length#

Get total number of parameters for all layers.

set_parameters(theta, reinitialize=False)[source]#

Distribute the given parameters to the corresponding weight and bias.

Parameters
  • theta (torch.Tensor) –

    with shape [D] (groups=1)

    or [B, D] (groups=B)

    where the meaning of the symbols is:

      • B: batch size

      • D: length of parameters, should be self.param_length

    When the shape of theta is [D], it will be unsqueezed to [1, D].

  • reinitialize (bool) – whether to reinitialize parameters of each layer.
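The length D expected by set_parameters can be worked out by hand. The sketch below is not taken from the ALF source: it assumes a standard dense conv kernel of shape [out_channels, in_channels, kH, kW], an optional bias of length out_channels, and use_ln=False. The helper name conv2d_param_lengths is hypothetical.

```python
def conv2d_param_lengths(in_channels, out_channels, kernel_size, use_bias=False):
    """Return (weight_length, bias_length, param_length) for one conv replica.

    Assumption: a dense kernel of shape [out_channels, in_channels, kH, kW]
    and no layer-normalization parameters.
    """
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    k_h, k_w = kernel_size
    weight_length = out_channels * in_channels * k_h * k_w
    bias_length = out_channels if use_bias else 0
    return weight_length, bias_length, weight_length + bias_length


# Under these assumptions, theta for 4 parallel replicas has shape [4, D].
w_len, b_len, d = conv2d_param_lengths(3, 8, kernel_size=3, use_bias=True)
theta_shape = (4, d)
```

In practice the authoritative values are the weight_length, bias_length and param_length properties of the layer itself.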

training: bool#
property weight#

Get stored weight tensor or batch of weight tensors.

property weight_length#

Get the n_element of a single weight tensor.

class ParamFC(input_size, output_size, activation=torch.relu_, use_bias=True, use_ln=False, n_groups=None, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#

Bases: torch.nn.modules.module.Module

A fully connected layer that does not maintain its own weight and bias, but accepts both from users. If the given parameter (weight and bias) tensor has an extra batch dimension (first dimension), it performs parallel FC operation.

Parameters
  • input_size (int) – input size

  • output_size (int) – output size

  • activation (torch.nn.functional) –

  • use_bias (bool) – whether to use bias

  • use_ln (bool) – whether to use layer normalization

  • n_groups (int) – number of parallel groups. If use_ln is False, it is determined by the first dimension of the input parameters when calling set_parameters. If use_ln is True, n_groups must be specified at initialization and is then fixed; all input parameters must be consistent with it.

  • kernel_initializer (Callable) – initializer for the FC layer kernel. If None is provided, a variance_scaling_initializer with gain as kernel_init_gain will be used.

  • kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.

  • bias_init_value (float) – the constant value used to initialize the bias

property bias#

Get stored bias tensor or batch of bias tensors.

property bias_length#

Get the n_element of a single bias tensor.

forward(inputs)[source]#

Forward

Parameters

inputs (torch.Tensor) –

with shape [B, D] (groups=1) or [B, n, D] (groups=n) where the meaning of the symbols are:

  • B: batch size

  • n: number of replicas

  • D: input dimension

When the shape of inputs is [B, D], all the n linear operations will take inputs as the same shared inputs. When the shape of inputs is [B, n, D], each linear operator will have its own input data by slicing inputs.

Returns

with shape [B, n, D] or [B, D]

where the meaning of the symbols are:

  • B: batch

  • n: number of replicas

  • D: output dimension

Return type

torch.Tensor

property param_length#

Get total number of parameters for all layers.

set_parameters(theta, reinitialize=False)[source]#

Distribute the given parameters to the corresponding weight and bias.

Parameters
  • theta (torch.Tensor) –

    with shape [D] (groups=1)

    or [B, D] (groups=B)

    where the meaning of the symbols is:

      • B: batch size

      • D: length of parameters, should be self.param_length

    When the shape of theta is [D], it will be unsqueezed to [1, D].

  • reinitialize (bool) – whether to reinitialize parameters of each layer.
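To make the "parallel FC" idea concrete, here is a minimal PyTorch sketch of applying externally supplied parameters to a batch. It does not call ALF. The layout of theta (flattened weight followed by bias) and the function name parallel_fc are assumptions made for illustration, and the activation is omitted.

```python
import torch


def parallel_fc(inputs, theta, input_size, output_size):
    """Apply n independent linear maps whose parameters come from theta.

    Assumed layout (hypothetical): each row of theta holds the flattened
    [output_size, input_size] weight followed by the [output_size] bias.
    """
    if theta.dim() == 1:
        theta = theta.unsqueeze(0)  # [D] -> [1, D]
    n = theta.shape[0]
    w_len = input_size * output_size
    weight = theta[:, :w_len].reshape(n, output_size, input_size)
    bias = theta[:, w_len:]  # [n, output_size]
    if inputs.dim() == 2:
        # Shared input [B, D]: every replica sees the same data.
        inputs = inputs.unsqueeze(1).expand(-1, n, -1)
    # [B, n, in] x [n, out, in] -> [B, n, out]; bias broadcasts over B.
    return torch.einsum('bni,noi->bno', inputs, weight) + bias


x = torch.randn(5, 4)               # shared input: B=5, D=4
theta = torch.randn(3, 4 * 2 + 2)   # n=3 replicas, each with 4*2 weights + 2 biases
y = parallel_fc(x, theta, input_size=4, output_size=2)
```

Each slice y[:, i] equals an ordinary linear layer applied with the i-th row of theta, which is what lets a generator network emit parameters for many replicas at once.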

training: bool#
property weight#

Get stored weight tensor or batch of weight tensors.

property weight_length#

Get the n_element of a single weight tensor.

class Permute(*dims)[source]#

Bases: torch.nn.modules.module.Module

A layer that performs the permutation of channels.

Parameters

*dims – The desired ordering of dimensions (not including batch dimension)

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_parallel(n)[source]#

Create a Permute layer to handle parallel batch.

It is assumed that a parallel batch has shape [B, n, …] and both the batch dimension and replica dimension are not considered for permute.

Parameters

n (int) – the number of replicas.

Returns

a Permute layer to handle parallel batch.

training: bool#
class RandomCrop(size, padding=0)[source]#

Bases: torch.nn.modules.module.Module

Perform random crop independently for each image in the batch.

Note that torchvision.transforms.RandomCrop is different in that it applies the same random crop for all the images in the batch.

Each result image is a random crop of the padded input image. The padded pixels replicate the nearest pixel on the boundary.

Parameters
  • size (Union[int, Tuple[int]]) – a tuple of desired height and width. If an int, uses the same height and width.

  • padding (Union[int, Tuple[int]]) – the size of the padding. If an int, uses the same padding on all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\)).

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#
Parameters

input (Tensor) – shape is [B, C, H, W]

Return type

Tensor

Returns

a tensor of shape [B, C, h, w], where h, w=size

training: bool#
class ReplicationPad2d(padding)[source]#

Bases: torch.nn.modules.module.Module

Pad the input tensor using replication of the input boundary.

For N-dimensional padding, use torch.nn.functional.pad().

This is the same as torch.nn.ReplicationPad2d except that this implementation can handle input of any dtype, while the torch.nn one can only handle float dtype.

Parameters

padding (int, tuple) – the size of the padding. If an int, uses the same padding on all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))

Shape:
  • Input: \((N, C, H_{in}, W_{in})\)

  • Output: \((N, C, H_{out}, W_{out})\) where

    \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)

    \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
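As a worked instance of the shape formulas above, the following small helper computes the output shape from the input shape and the padding. The function name replication_pad2d_output_shape is hypothetical and exists only for this illustration.

```python
def replication_pad2d_output_shape(input_shape, padding):
    """Compute (N, C, H_out, W_out) from (N, C, H_in, W_in) and padding.

    padding is an int or a 4-tuple (left, right, top, bottom).
    """
    if isinstance(padding, int):
        padding = (padding,) * 4
    left, right, top, bottom = padding
    n, c, h_in, w_in = input_shape
    return (n, c, h_in + top + bottom, w_in + left + right)


# H_out = 32 + 3 + 4 and W_out = 32 + 1 + 2
out_shape = replication_pad2d_output_shape((8, 3, 32, 32), (1, 2, 3, 4))
```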

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class Reshape(*shape)[source]#

Bases: torch.nn.modules.module.Module

A layer for reshaping a tensor.

The result of this layer is a tensor reshaped to (B, *shape) where B is x.shape[0]

Parameters

shape (tuple of ints|int...) – desired shape not including the batch dimension.

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_parallel(n)[source]#
training: bool#
class ResidueBlock(in_channels, channels, kernel_size, stride, transpose=False, activation=ReLU(inplace=True), with_batch_normalization=True, weight_opt_args=None, bn_ctor=torch.nn.BatchNorm2d)[source]#

Bases: torch.nn.modules.module.Module

The ResidueBlock for ResNet.

This is the residual block used in ResNet-18 and ResNet-34 of the original ResNet paper Deep residual learning for image recognition.

Compared to BottleneckBlock, it has one less conv layer.

Parameters
  • in_channels (int) – the number of channels of input

  • kernel_size (Union[int, Tuple[int, int]]) – the kernel size of middle layer at main path

  • channels (int) – the number of filters of the two conv layers at main path

  • stride (Union[int, Tuple[int, int]]) – stride for this block.

  • transpose (bool) – whether to use Conv2DTranspose instead of Conv2D. If two ResidueBlock layers L and LT are constructed with the same arguments except transpose, it is guaranteed that LT(L(x)).shape == x.shape if x.shape[-2:] is divisible by stride.

  • activation (Module) – activation function.

  • with_batch_normalization (bool) – whether to include batch normalization. Note that standard ResNet uses batch normalization.

  • weight_opt_args (Optional[Dict]) – optimizer arguments for weights (not for bias)

  • bn_ctor (Callable[[int], Module]) – will be called as bn_ctor(num_features) to create the BN layer.

forward(inputs)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type

Tensor

training: bool#
class Scale(scale)[source]#

Bases: alf.layers.ElementwiseLayerBase

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class ScaleGradient(scale)[source]#

Bases: alf.layers.ElementwiseLayerBase

Scales the gradient of input for the backward pass.

Parameters

scale (float) – a scalar factor to be multiplied to the gradient of tensor.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class Sequential(*modules, output='', **named_modules)[source]#

Bases: torch.nn.modules.module.Module

A more flexible Sequential than torch.nn.Sequential.

alf.layers.Sequential is similar to alf.nn.Sequential, but does not accept stateful alf.nn.Network as its elements.

All the modules provided through modules and named_modules are calculated sequentially in the same order as they appear in the call to Sequential. Typically, each module takes the result of the previous module as its input (or the input to the Sequential if it is the first module), and the result of the last module is the output of the Sequential. But we also allow more flexibility, as shown in example 2.

Example 1:

net = Sequential(module1, module2)
y = net(x)

is equivalent to the following:

z = module1(x)
y = module2(z)

Example 2:

net = Sequential(
    module1, a=module2, b=(('input', 'a'), module3), output=('a', 'b'))
output = net(input)

is equivalent to the following:

_ = module1(input)
a = module2(_)
b = module3((input, a))
output = (a, b)
Parameters
  • modules (Callable | (nested str, Callable)) – The Callable can be a torch.nn.Module, stateless alf.nn.Network or plain Callable. Optionally, their inputs can be specified by the first element of the tuple. If input is not provided, it is assumed to be the result of the previous module (or input to this Sequential for the first module). If input is provided, it should be a nested str. It will be used to retrieve results from the dictionary of the current named_results. For modules specified by modules, because no named_modules has been invoked, named_results is {'input': input}.

  • named_modules (Callable | (nested str, Callable)) – The Callable can be a torch.nn.Module, stateless alf.nn.Network or plain Callable. Optionally, their inputs can be specified by the first element of the tuple. If input is not provided, it is assumed to be the result of the previous module (or input to this Sequential for the first module). If input is provided, it should be a nested str. It will be used to retrieve results from the dictionary of the current named_results. named_results is updated once the result of a named module is calculated.

  • output (nested str) – if not provided, the result from the last module will be used as output. Otherwise, it will be used to retrieve results from named_results after the results of all modules have been calculated.
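The data flow described by modules, named_modules and output can be mimicked with plain callables. The following is a simplified, hypothetical re-implementation for illustration only, not the ALF code: run_sequential and lookup are made-up names, and nested input specs are limited to tuples of names.

```python
def run_sequential(input, modules, named_modules, output=''):
    """Evaluate modules in order while tracking named results."""

    def lookup(spec, named_results):
        # Resolve a (possibly nested) tuple of names to stored results.
        if isinstance(spec, tuple):
            return tuple(lookup(s, named_results) for s in spec)
        return named_results[spec]

    named_results = {'input': input}
    x = input
    entries = [(None, m) for m in modules] + list(named_modules.items())
    for name, entry in entries:
        if isinstance(entry, tuple):
            # (input spec, callable): fetch the inputs by name.
            input_spec, fn = entry
            x = fn(lookup(input_spec, named_results))
        else:
            # Plain callable: consume the previous result.
            x = entry(x)
        if name is not None:
            named_results[name] = x
    return lookup(output, named_results) if output else x


module1 = lambda x: x + 1
module2 = lambda x: x * 10
module3 = lambda pair: pair[0] + pair[1]

# Mirrors Example 2: a = module2(module1(input)), b = module3((input, a)).
result = run_sequential(
    2, [module1],
    dict(a=module2, b=(('input', 'a'), module3)),
    output=('a', 'b'))
```

Keeping a single dictionary of named results is what lets a later module reach back to 'input' or to any earlier named module instead of only the immediately preceding one.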

make_parallel(n)[source]#

Create a parallelized version of this network.

Parameters

n (int) – the number of copies

Returns

the parallelized version of this network

reset_parameters()[source]#
training: bool#
class SimpleAttention[source]#

Bases: torch.nn.modules.module.Module

Simple Attention Module.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(query, key, value)[source]#

Simple attention computation based on the inputs.

Parameters
  • query (Q) – shape [B, head, M, d]

  • key (K) – shape [B, head, N, d]

  • value (V) – shape [B, head, N, d]

where B denotes the batch size, head denotes the number of heads, N the number of entities, and d the feature dimension.

Returns
  • the attended results computed as softmax(QK^T/sqrt(d))V, with the shape [B, head, M, d]

  • the attention weight, with the shape [B, head, M, N]
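The formula softmax(QK^T/sqrt(d))V can be written directly in PyTorch. This is a minimal sketch of the computation as documented here, not the ALF implementation; the function name simple_attention is chosen for illustration.

```python
import math

import torch


def simple_attention(query, key, value):
    """Compute softmax(QK^T / sqrt(d)) V over the last two dimensions."""
    d = query.shape[-1]
    # [B, head, M, d] @ [B, head, d, N] -> [B, head, M, N]
    scores = query @ key.transpose(-2, -1) / math.sqrt(d)
    attn_weight = torch.softmax(scores, dim=-1)
    # [B, head, M, N] @ [B, head, N, d] -> [B, head, M, d]
    return attn_weight @ value, attn_weight


q = torch.randn(2, 4, 3, 8)  # B=2, head=4, M=3, d=8
k = torch.randn(2, 4, 5, 8)  # N=5
v = torch.randn(2, 4, 5, 8)
attended, weights = simple_attention(q, k, v)
```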

training: bool#
class Sum(dim)[source]#

Bases: torch.nn.modules.module.Module

Sum over given dimension(s).

Note that the batch dimension is not counted for dim, so dim=0 refers to the first dimension after the batch dimension.

Parameters

dim (int|tuple[int]) – the dimension(s) to be summed.

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_parallel(n)[source]#

Create a Sum layer to handle parallel batch.

It is assumed that a parallel batch has shape [B, n, …] and both the batch dimension and replica dimension are not counted for dim

Parameters

n (int) – the number of replicas.

Returns

a Sum layer to handle parallel batch.
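One way to read the dimension convention is as a fixed offset between the dim given to the layer and the dimension of the actual tensor. The helper below is a hypothetical illustration that assumes non-negative dims; to_tensor_dim is not an ALF function.

```python
def to_tensor_dim(dim, parallel=False):
    """Map a layer-level dim (batch excluded) to an actual tensor dim.

    Assumes non-negative dims. For a parallel batch [B, n, ...] both the
    batch dimension and the replica dimension are skipped.
    """
    offset = 2 if parallel else 1
    if isinstance(dim, int):
        return dim + offset
    return tuple(d + offset for d in dim)


plain_dim = to_tensor_dim(0)                          # tensor shape [B, ...]
parallel_dims = to_tensor_dim((0, 1), parallel=True)  # tensor shape [B, n, ...]
```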

training: bool#
class SummarizeGradient(name)[source]#

Bases: alf.layers.ElementwiseLayerBase

A layer for summarizing the gradient of the input tensor.

Summarize the gradient of the input tensor. The input tensor is always cloned first, and requires_grad=True is then set on the cloned tensor to enable gradient calculation for summarization.

Parameters

name (str) – used to describe the name of the summary, after the tag ‘tensor_gradient’.

Returns

cloned tensor with requires_grad set to True and gradient summarization hook registered.

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class TransformerBlock(d_model, num_heads, memory_size, d_k=None, d_v=None, d_ff=None, dropout=0.0, activation=torch.relu_, positional_encoding='abs', add_positional_encoding=True, scale_attention_score=True)[source]#

Bases: torch.nn.modules.module.Module

Transformer residue block.

The transformer residue block includes two residue blocks with layer normalization (LN):

  1. Multi-head attention (MHA) block

  2. Position-wise MLP

The overall computation is:

y = x + MHA(LN(x))
z = y + MLP(LN(y))

The original transformer is described in: [1]. Ashish Vaswani et al. Attention Is All You Need

This implementation is a variation which places layer norm at a different location, which is described in: [2]. Ruibin Xiong et al. On Layer Normalization in the Transformer Architecture

We also support the relative positional encoding proposed in [3] Zihang Dai et al. Transformer-XL: Attentive language models beyond a fixed-length context.

In this implementation, the positional encodings are learnable parameters instead of the sinusoidal matrix proposed in [1].

Parameters
  • d_model (int) – dimension of the model, same as d_model in [1]

  • num_heads (int) – the number of attention heads

  • memory_size (int) – maximal allowed sequence length

  • d_k (int) – Dimension of key, same as d_k in [1]. If None, use d_model // num_heads

  • d_v (int) – Dimension of value, same as d_v in [1]. If None, use d_model // num_heads

  • d_ff (int) – Dimension of the MLP, same as d_ff in [1]. If None, use 4 * d_model

  • dropout (float) – the dropout ratio. Note that [1] uses 0.1 for dropout.

  • activation (Callable) – the activation for the hidden layer of the MLP. relu and gelu are two popular choices.

  • positional_encoding (str) – One of [‘none’, ‘abs’, ‘rel’]. If ‘none’, no position encoding will be used. If ‘abs’, use absolute positional encoding depending on the absolute position in the memory sequence, same as that described in [1]. If ‘rel’, use the relative positional encoding proposed in [3].

  • add_positional_encoding (bool) – If True, in addition to using positional encoding for calculating the attention weights, the positional encoding is also concatenated to the attention result so that the attention result can keep the location information better. Note that using this option will increase the number of parameters by about 25%. This option is ignored if positional_encoding is ‘none’.

  • scale_attention_score (bool) – If True, scale the attention score by d_k ** -0.5 as suggested in [1]. However, this may not always be better since it slows down the unit test in layers_test.py.

forward(memory, query=None, mask=None)[source]#

Forward computation.

Notation: B: batch_size, N: length of memory, M: length of query

Parameters
  • memory (Tensor) – The shape is [B, N, d_model]

  • query (Tensor) – The shape [B, d_model] or [B, M, d_model]. If None, will use memory as query

  • mask (Tensor|None) – A tensor for indicating which slot in memory will NOT be used. Its shape can be [B, N] or [B, M, N]. If the shape is [B, N], mask[b, n] = True indicates NOT using memory[b, n] for calculating the attention result for query[b], while mask[b, n] = False means using it. If the shape is [B, M, N], mask[b, m, n] = True indicates NOT using memory[b, n] for calculating the attention result for query[b, m], while mask[b, m, n] = False indicates using memory[b, n] to attend query[b, m].

Returns

the shape is the same as query.

Return type

Tensor
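The mask convention (True means the slot is NOT used) is easy to get backwards, so here is a dependency-free sketch for a single query over N memory slots. It only illustrates the convention and is not how the block is implemented; masked_attention_weights is a hypothetical name, and at least one slot is assumed to be kept.

```python
import math


def masked_attention_weights(scores, mask):
    """Softmax over memory slots, skipping slots where mask is True.

    Assumes at least one slot has mask == False.
    """
    kept = [s for s, m in zip(scores, mask) if not m]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [0.0 if m else math.exp(s - peak) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Slot 1 is masked out, so it receives zero attention weight.
weights = masked_attention_weights([1.0, 2.0, 3.0], [False, True, False])
```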

reset_parameters()[source]#

Initialize the parameters.

training: bool#
class Transpose(dim0=0, dim1=1)[source]#

Bases: torch.nn.modules.module.Module

A layer that performs the transpose of channels.

Note that the batch dimension is not considered for transpose, so dim0=0 refers to the first dimension after the batch dimension.

Parameters
  • dim0 (int) – the first dimension to be transposed.

  • dim1 (int) – the second dimension to be transposed

forward(x)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_parallel(n)[source]#

Create a Transpose layer to handle parallel batch.

It is assumed that a parallel batch has shape [B, n, …] and both the batch dimension and replica dimension are not considered for transpose.

Parameters

n (int) – the number of replicas.

Returns

a Transpose layer to handle parallel batch.

training: bool#
make_parallel_input(inputs, n)[source]#

Replicate inputs over dim 1 for n times so it can be processed by parallel networks.

Parameters
  • inputs (nested Tensor) – a nest of Tensor

  • n (int) – inputs will be replicated n times.

Returns

inputs replicated over dim 1
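For a single tensor, replication over dim 1 can be sketched in PyTorch as follows. This is an illustration under the assumption of one plain tensor input; the documented function also accepts nests of tensors, which the sketch does not handle, and replicate_over_dim1 is a hypothetical name.

```python
import torch


def replicate_over_dim1(x, n):
    """Turn [B, ...] into [B, n, ...] by repeating x along a new dim 1."""
    # expand() creates a view without copying the data.
    return x.unsqueeze(1).expand(-1, n, *x.shape[1:])


x = torch.arange(6.0).reshape(2, 3)  # [B=2, 3]
px = replicate_over_dim1(x, 4)       # [B=2, n=4, 3]
```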

make_parallel_net(module, n)[source]#

Make a parallelized version of module.

A parallel network has n copies of network with the same structure but different independently initialized parameters. The parallel network can process a batch of the data with shape [batch_size, n, …] using n networks with same structure.

If module has a member function make_parallel, it will be called to make the parallel network. Otherwise, a NaiveParallelLayer is created, which simply makes n copies of module and uses a loop to call them in forward().

Examples:

Applying parallel net on same input:

pnet = make_parallel_net(net, n)
# replicate input.
# pinput will have shape [batch_size, n, ...], if input has shape [batch_size, ...]
pinput = make_parallel_input(input, n)
poutput = pnet(pinput)

If you already have parallel input with shape [batch_size, n, …], you can omit the call to make_parallel_input in the above code.

Parameters
  • module (Network | nn.Module | Callable) – the network to be parallelized.

  • n (int) – the number of copies

Returns

the parallelized network.

make_parallel_spec(specs, n)[source]#

Make the spec for parallel network.

Parameters
  • specs (nested TensorSpec) – the input spec for the non-parallelized network

  • n (int) – the number of copies of the parallelized network

Returns

input tensor spec for the parallelized network

normalize_along_batch_dims(x, mean, variance, variance_epsilon)[source]#

Normalizes a tensor by mean and variance, which are expected to have the same tensor spec with the inner dims of x.

Parameters
  • x (Tensor) – a tensor of ([D1, D2, ..] + shape), where D1, D2, .. are arbitrary leading batch dims (can be empty).

  • mean (Tensor) – a tensor whose shape is shape (the inner dims of x)

  • variance (Tensor) – a tensor whose shape is shape (the inner dims of x)

  • variance_epsilon (float) – A small float number to avoid dividing by 0.

Returns

Normalized tensor.
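A minimal sketch of this kind of normalization, assuming the common formulation (x - mean) / sqrt(variance + variance_epsilon) with mean and variance broadcast over the leading batch dims. The exact formula used by ALF is not stated here, so treat this as an assumption; normalize_sketch is a hypothetical name.

```python
import torch


def normalize_sketch(x, mean, variance, variance_epsilon):
    """Normalize x with statistics that share the inner dims of x."""
    # mean and variance have the inner shape and broadcast over [D1, D2, ..].
    return (x - mean) * torch.rsqrt(variance + variance_epsilon)


x = torch.arange(60, dtype=torch.float32).reshape(4, 5, 3)  # batch dims [4, 5]
mean = x.mean(dim=(0, 1))                                   # shape [3]
variance = ((x - mean) ** 2).mean(dim=(0, 1))               # shape [3]
y = normalize_sketch(x, mean, variance, variance_epsilon=1e-5)
```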

reset_parameters(module)[source]#

Reset the parameters for module.

Parameters

module (nn.Module) –

Returns

None

Raises

ValueError – fail to reset the parameters for module

alf.module#

Patch torch.nn.Module for better performance.

torch.nn.Module.__getattr__ is frequently used by all classes derived from nn.Module. It can introduce too much unnecessary overhead, so we patch the nn.Module class to remove it.

alf.norm_layers#

class BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, fixed_weight_norm=False, use_bias=True, track_running_stats=True)[source]#

Bases: alf.norm_layers._NormBase

Batch Normalization over a 2D or 3D input.

For detail about Batch Normalization, see https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html

The main difference is that this implementation supports using BN for RNN. For an RNN, the normalization statistics can be dramatically different at different steps, so we need to maintain separate running statistics for each step.

The following example shows how to use it, assuming rnn is a Network which contains some alf.layers.BatchNorm layers.

prepare_rnn_batch_norm(rnn)
rnn.set_batch_norm_max_steps(5)

for i in range(t):
    rnn.set_batch_norm_current_step(i)
    y, state = rnn(input[i], state)

Note that set_batch_norm_current_step() also accepts a Tensor as its argument. In that case, it specifies the current step for each sample in a batch.

Parameters
  • num_features (int) – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\)

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5

  • momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1

  • affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True

  • fixed_weight_norm – whether to fix the norm of the affine weight parameter. The norm will be fixed at sqrt(num_features).

  • use_bias (bool) – whether to use bias. Note that if affine is True, this argument is ignored and bias will be used.

  • track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True

Shape:
  • Input: \((N, C)\) or \((N, C, L)\)

  • Output: \((N, C)\) or \((N, C, L)\) (same shape as input)

training: bool#
class BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, fixed_weight_norm=False, use_bias=True, track_running_stats=True)[source]#

Bases: alf.norm_layers._NormBase

Applies Batch Normalization over a 4D input.

For detail about Batch Normalization, see https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html

The main difference is that this implementation supports using BN for RNN. For an RNN, the normalization statistics can be dramatically different at different steps, so we need to maintain separate running statistics for each step.

The following example shows how to use it, assuming rnn is a Network which contains some alf.layers.BatchNorm layers.

prepare_rnn_batch_norm(rnn)     # Only need to call once in the lifetime of rnn
rnn.set_batch_norm_max_steps(5) # Only need to call once in the lifetime of rnn

for i in range(t):
    rnn.set_batch_norm_current_step(i)
    y, state = rnn(input[i], state)
Parameters
  • num_features (int) – \(C\) from an expected input of size \((N, C, H, W)\)

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5

  • momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1

  • affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True

  • fixed_weight_norm – whether to fix the norm of the affine weight parameter. The norm will be fixed at sqrt(num_features).

  • use_bias (bool) – whether to use bias. Note that if affine is True, this argument is ignored and bias will be used.

  • track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True

Shape:
  • Input: \((N, C, H, W)\)

  • Output: \((N, C, H, W)\) (same shape as input)

training: bool#
class ParamLayerNorm(n_groups, output_channels, eps=1e-05)[source]#

Bases: torch.nn.modules.module.Module

ParamLayerNorm, adapted from torch.nn.modules.LayerNorm

A general Layer Normalization layer that does not maintain learnable affine parameters (weight and bias), but accepts both from users. If n_groups is greater than 1, it performs parallel Layer Normalization operation.

Parameters
  • n_groups (int) – number of parallel groups

  • output_channels (int) – output size for FC layers, output channel size for conv layers.

  • eps (float) – refer to nn.GroupNorm

property bias#

Get stored bias tensor or batch of bias tensors.

property bias_length#

Get the n_element of a single bias tensor.

forward(inputs, keep_group_dim=True)[source]#

Forward

Parameters
  • inputs (Tensor) – refer to _preprocess_input of subclass for detailed description.

  • keep_group_dim (bool) – whether to keep group dimension or not.

Returns

for ParamLayerNorm1d, with shape [B, n, D] or [B, n*D],

for ParamLayerNorm2d, with shape [B, n, C, H, W] or [B, n*C, H, W].

Return type

torch.Tensor

property output_channels#

Get the n_element of a single weight tensor.

property param_length#

Get total number of parameters for all layers.

set_parameters(theta, reinitialize=False)[source]#

Distribute the given parameters to the corresponding weight and bias.

Parameters
  • theta (Tensor) –

    with shape [D] (groups=1) or [B, D] (groups=B),

    where the meaning of the symbols is:

      • B: batch size

      • D: length of parameters, should be self.param_length

    When the shape of theta is [D], it will be unsqueezed to [1, D].

  • reinitialize (bool) – whether to reinitialize parameters of each layer.

training: bool#
property weight#

Get stored weight tensor or batch of weight tensors.

property weight_length#

Get the n_element of a single weight tensor.

class ParamLayerNorm1d(n_groups, output_channels, eps=1e-05)[source]#

Bases: alf.norm_layers.ParamLayerNorm

A general Layer Normalization layer that does not maintain learnable affine parameters (weight and bias), but accepts both from users. If n_groups is greater than 1, it performs parallel Layer Normalization operation.

Parameters
  • n_groups (int) – number of parallel groups

  • output_channels (int) – output size for FC layers, output channel size for conv layers.

  • eps (float) – refer to nn.GroupNorm

training: bool#
class ParamLayerNorm2d(n_groups, output_channels, eps=1e-05)[source]#

Bases: alf.norm_layers.ParamLayerNorm

A general Layer Normalization layer that does not maintain learnable affine parameters (weight and bias), but accepts both from users. If n_groups is greater than 1, it performs parallel Layer Normalization operation.

Parameters
  • n_groups (int) – number of parallel groups

  • output_channels (int) – output size for FC layers, output channel size for conv layers.

  • eps (float) – refer to nn.GroupNorm

training: bool#
prepare_rnn_batch_norm(module)[source]#

Prepare an RNN network module to use alf.layers.BatchNorm layers.

It will report an error if any nn.BatchNorm layer is found within module.

Return type

bool

Returns

True if alf.layers.BatchNorm layers have been found

False otherwise.

set_batch_norm_current_step(module, current_step)[source]#

Set current_step for all batch norm layers in module.

Parameters

current_step (Union[Tensor, int]) – the current step for RNN. If it is a Tensor, it specifies the current step for each sample in a batch.

set_batch_norm_max_steps(module, max_steps)[source]#

Set max_steps for all batch norm layers in module.

Parameters

max_steps (int) – the maximum steps for which the batch norm running statistics are maintained.
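The idea of keeping separate running statistics per RNN step can be sketched without any framework. The class below is purely illustrative and hypothetical: it tracks only a scalar running mean, uses the usual momentum update, and assumes that steps beyond max_steps - 1 share the last slot, which is an assumption rather than documented ALF behavior.

```python
class PerStepRunningMean:
    """Keep one running mean per RNN step, up to max_steps slots."""

    def __init__(self, max_steps, momentum=0.1):
        self.momentum = momentum
        self.running_mean = [0.0] * max_steps

    def update(self, step, batch_mean):
        # Assumption: steps past the last slot reuse the last slot.
        i = min(step, len(self.running_mean) - 1)
        self.running_mean[i] = ((1 - self.momentum) * self.running_mean[i]
                                + self.momentum * batch_mean)
        return self.running_mean[i]


stats = PerStepRunningMean(max_steps=2)
first = stats.update(0, 10.0)    # slot 0
second = stats.update(1, 20.0)   # slot 1
late = stats.update(5, 20.0)     # clamped to slot 1 under the assumption above
```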

alf.tensor_specs#

TensorSpec with PyTorch types; adapted from TensorFlow’s tensor_spec.py:

https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/framework/tensor_spec.py

class BoundedTensorSpec(shape, dtype=torch.float32, minimum=0, maximum=1)[source]#

Bases: alf.tensor_specs.TensorSpec

A TensorSpec that specifies minimum and maximum values. Example usage:

spec = BoundedTensorSpec((1, 2, 3), torch.float32, 0, (5, 5, 5))
torch_minimum = torch.as_tensor(spec.minimum, dtype=spec.dtype)
torch_maximum = torch.as_tensor(spec.maximum, dtype=spec.dtype)

Bounds are meant to be inclusive. This is especially important for integer types. The following spec will be satisfied by tensors with values in the set {0, 1, 2}:

spec = BoundedTensorSpec((3, 5), torch.int32, 0, 2)

Parameters
  • shape (tuple[int]) – The shape of the tensor.

  • dtype (str or torch.dtype) – The type of the tensor values, e.g., “int32” or torch.int32

  • minimum – numpy number or sequence specifying the minimum element bounds (inclusive). Must be broadcastable to shape.

  • maximum – numpy number or sequence specifying the maximum element bounds (inclusive). Must be broadcastable to shape.
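
To illustrate the inclusive-bounds rule and the requirement that minimum and maximum broadcast to shape, here is a minimal NumPy sketch. The helper within_bounds is hypothetical and is not part of alf.tensor_specs.

```python
import numpy as np


def within_bounds(values, minimum, maximum):
    """Return True if every element lies in [minimum, maximum], both ends inclusive."""
    values = np.asarray(values)
    # Scalars and shorter sequences are broadcast to the full value shape.
    low = np.broadcast_to(minimum, values.shape)
    high = np.broadcast_to(maximum, values.shape)
    return bool(np.all((values >= low) & (values <= high)))


# Integer bounds 0..2 accept exactly the values {0, 1, 2}.
ok = within_bounds(np.array([[0, 1, 2], [2, 2, 0]]), 0, 2)
too_big = within_bounds(np.array([[0, 1, 3]]), 0, 2)
# A per-element maximum of length 3 broadcasts against shape (1, 2, 3).
per_element = within_bounds(np.zeros((1, 2, 3)), 0, (5, 5, 5))
```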

classmethod from_spec(spec)[source]#
classmethod is_bounded()[source]#
property maximum#

Returns a NumPy array specifying the maximum bounds (inclusive).

property minimum#

Returns a NumPy array specifying the minimum bounds (inclusive).

numpy_sample(outer_dims=None, rng=numpy.random)[source]#

Sample numpy arrays uniformly given the min/max bounds.

Parameters
  • outer_dims (list[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.

  • rng (numpy.random.RandomState) – random number generator

Returns

an array of self._dtype

Return type

np.ndarray
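
A rough NumPy sketch of uniform sampling under inclusive bounds is shown below. The function sample_uniform and its scalar-bounds restriction are assumptions for illustration; ALF's own sampling code may differ.

```python
import numpy as np


def sample_uniform(shape, dtype, minimum, maximum, outer_dims=None, rng=np.random):
    """Sample uniformly between scalar bounds (hypothetical helper)."""
    # Outer dimensions (e.g. a batch size) go in front of the spec shape.
    full_shape = tuple(outer_dims or ()) + tuple(shape)
    if np.issubdtype(dtype, np.integer):
        # randint excludes its upper end, so add 1 to keep maximum reachable.
        return rng.randint(minimum, maximum + 1, size=full_shape).astype(dtype)
    return rng.uniform(minimum, maximum, size=full_shape).astype(dtype)


rng = np.random.RandomState(0)
ints = sample_uniform((3, 5), np.int32, 0, 2, outer_dims=[4], rng=rng)
floats = sample_uniform((2,), np.float32, -1.0, 1.0, rng=rng)
```

Passing an explicit RandomState, as in the usage lines, makes the samples reproducible. The default np.random module uses the global seed.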

replace(shape=None, dtype=None, minimum=None, maximum=None)[source]#

Create a new BoundedTensorSpec with part of the properties replaced.

For example, if we have a BoundedTensorSpec like

spec = BoundedTensorSpec((3, 5), torch.int32, 0, 2)

You can explicitly create a similar spec with a different shape and minimum by

new_spec = spec.replace(shape=(4, 8), minimum=-1)

Return type

BoundedTensorSpec
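
The replace semantics (arguments left as None keep their current value, and a new object is returned) can be sketched with a frozen dataclass. BoundedSpecSketch is a hypothetical stand-in, not the ALF class, and stores dtype as a plain string for simplicity.

```python
from dataclasses import dataclass, replace as dc_replace
from typing import Tuple


@dataclass(frozen=True)
class BoundedSpecSketch:
    """Hypothetical, simplified stand-in for a bounded spec."""
    shape: Tuple[int, ...]
    dtype: str
    minimum: float
    maximum: float

    def replace(self, shape=None, dtype=None, minimum=None, maximum=None):
        overrides = dict(shape=shape, dtype=dtype, minimum=minimum, maximum=maximum)
        # Only fields that were explicitly given are changed.
        changed = {k: v for k, v in overrides.items() if v is not None}
        return dc_replace(self, **changed)


spec = BoundedSpecSketch((3, 5), "int32", 0, 2)
new_spec = spec.replace(shape=(4, 8), minimum=-1)
```

The original spec is left untouched, so specs can be shared safely between components.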

sample(outer_dims=None)[source]#

Sample uniformly given the min/max bounds.

Parameters

outer_dims (list[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.

Returns

a tensor of self._dtype

Return type

tensor (torch.Tensor)

class TensorSpec(shape, dtype=torch.float32)[source]#

Bases: object

Describes a torch.Tensor.

A TensorSpec allows an API to describe the Tensors that it accepts or returns, before that Tensor exists. This allows dynamic and flexible graph construction and configuration.

Parameters
  • shape (tuple[int]) – The shape of the tensor.

  • dtype (str or torch.dtype) – The type of the tensor values, e.g., “int32” or torch.int32

constant(value, outer_dims=None)[source]#

Create a constant tensor from the spec.

Parameters
  • value – a scalar

  • outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before creating the tensor.

Returns

a tensor of self._dtype.

Return type

tensor (torch.Tensor)
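
The effect of outer_dims can be sketched with NumPy arrays standing in for tensors. The helper constant_from_spec is hypothetical, and placing the outer dimensions in front of the spec shape is an assumption of this sketch.

```python
import numpy as np


def constant_from_spec(shape, dtype, value, outer_dims=None):
    """Build a constant array whose shape is outer_dims followed by shape."""
    full_shape = tuple(outer_dims or ()) + tuple(shape)
    return np.full(full_shape, value, dtype=dtype)


# A spec of shape (3, 5) expanded with outer dimensions (2, 4),
# e.g. a [time, batch] prefix.
batch = constant_from_spec((3, 5), np.float32, 7.0, outer_dims=(2, 4))
single = constant_from_spec((3, 5), np.float32, 7.0)
```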

property dtype#

Returns the dtype of elements in the tensor.

property dtype_str#

The str representation of dtype.

It can be used to construct a numpy array.

classmethod from_array(array, from_dim=0)[source]#

Create TensorSpec from numpy array.

Parameters
  • array (np.ndarray|np.number) – array from which the spec is extracted

  • from_dim (int) – use array.shape[from_dim:] as shape

Returns

TensorSpec
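
The role of from_dim can be sketched as follows: leading dimensions such as a batch axis are dropped so that the spec describes a single element. The helper spec_fields_from_array is hypothetical and returns a plain (shape, dtype name) pair rather than a TensorSpec.

```python
import numpy as np


def spec_fields_from_array(array, from_dim=0):
    """Extract the shape (from from_dim onward) and dtype name of an array."""
    array = np.asarray(array)  # also accepts NumPy scalars
    return array.shape[from_dim:], array.dtype.name


batch = np.zeros((32, 3, 5), dtype=np.float32)
# Skip the batch dimension so the result describes one sample.
shape, dtype_name = spec_fields_from_array(batch, from_dim=1)
scalar_shape, _ = spec_fields_from_array(np.float64(1.0))
```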

classmethod from_spec(spec)[source]#
classmethod from_tensor(tensor, from_dim=0)[source]#

Create TensorSpec from tensor.

Parameters
  • tensor (Tensor) – tensor from which the spec is extracted

  • from_dim (int) – use tensor.shape[from_dim:] as shape

Returns

TensorSpec

classmethod is_bounded()[source]#
property is_continuous#

Whether spec is continuous.

property is_discrete#

Whether spec is discrete.

property ndim#

Return the rank of the tensor.

property numel#

Returns the number of elements.
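
As a small worked example of ndim and numel, the rank is the length of the shape and the element count is the product of its entries. The helper below is a hypothetical illustration that works on plain tuples, not on a TensorSpec.

```python
import math


def ndim_and_numel(shape):
    """Return (rank, number of elements) for a shape tuple."""
    return len(shape), math.prod(shape)


rank, count = ndim_and_numel((3, 5))
# The empty product is 1, so this helper counts a scalar shape as one element.
scalar_rank, scalar_count = ndim_and_numel(())
```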

numpy_constant(value, outer_dims=None)[source]#

Create a constant np.ndarray from the spec.

Parameters
  • value (Number) – a scalar

  • outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before creating the array.

Returns

an array of self._dtype.

Return type

np.ndarray

numpy_zeros(outer_dims=None)[source]#

Create a zero numpy.ndarray from the spec.

Parameters

outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before creating the array.

Returns

an array of self._dtype.

Return type

np.ndarray

ones(outer_dims=None)[source]#

Create an all-one tensor from the spec.

Parameters

outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before creating the tensor.

Returns

a tensor of self._dtype.

Return type

tensor (torch.Tensor)

rand(outer_dims=None)[source]#

Create a tensor filled with random numbers in \([0,1]\).

Parameters

outer_dims (Optional[Tuple[int]]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.

Returns

a tensor of self._dtype.

Return type

torch.Tensor

randn(outer_dims=None)[source]#

Create a tensor filled with random numbers from a standard normal distribution.

Parameters

outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.

Returns

a tensor of self._dtype.

Return type

tensor (torch.Tensor)

replace(shape=None, dtype=None)[source]#

Create a new TensorSpec with part of the properties replaced.

For example, if we have a TensorSpec like

spec = TensorSpec((3, 5), torch.int32)

You can explicitly create a similar spec with a different dtype by

new_spec = spec.replace(dtype=torch.float32)

Return type

TensorSpec

property shape#

Returns the shape of the tensor as a tuple of ints.

zeros(outer_dims=None)[source]#

Create a zero tensor from the spec.

Parameters

outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before creating the tensor.

Returns

a tensor of self._dtype.

Return type

tensor (torch.Tensor)

torch_dtype_to_str(dtype)[source]#