alf#
- alf.algorithms
- alf.algorithms.ppg
- alf.algorithms.actor_critic_algorithm
- alf.algorithms.actor_critic_loss
- alf.algorithms.agent
- alf.algorithms.agent_helpers
- alf.algorithms.algorithm
- alf.algorithms.algorithm_interface
- alf.algorithms.async_unroller
- alf.algorithms.bc_algorithm
- alf.algorithms.causal_bc_algorithm
- alf.algorithms.config
- alf.algorithms.containers
- alf.algorithms.data_transformer
- alf.algorithms.ddpg_algorithm
- alf.algorithms.decoding_algorithm
- alf.algorithms.diayn_algorithm
- alf.algorithms.dqn_algorithm
- alf.algorithms.dynamic_action_repeat_agent
- alf.algorithms.dynamics_learning_algorithm
- alf.algorithms.encoding_algorithm
- alf.algorithms.entropy_target_algorithm
- alf.algorithms.functional_particle_vi_algorithm
- alf.algorithms.generator
- alf.algorithms.goal_generator
- alf.algorithms.handcrafted_algorithm
- alf.algorithms.hypernetwork_algorithm
- alf.algorithms.icm_algorithm
- alf.algorithms.iql_algorithm
- alf.algorithms.lagrangian_reward_weight_algorithm
- alf.algorithms.mbrl_algorithm
- alf.algorithms.mcts_algorithm
- alf.algorithms.mcts_models
- alf.algorithms.mdq_algorithm
- alf.algorithms.merlin_algorithm
- alf.algorithms.mi_estimator
- alf.algorithms.monet_algorithm
- alf.algorithms.muzero_algorithm
- alf.algorithms.muzero_representation_learner
- alf.algorithms.oac_algorithm
- alf.algorithms.off_policy_algorithm
- alf.algorithms.on_policy_algorithm
- alf.algorithms.one_step_loss
- alf.algorithms.particle_vi_algorithm
- alf.algorithms.planning_algorithm
- alf.algorithms.ppg_algorithm
- alf.algorithms.ppo_algorithm
- alf.algorithms.ppo_loss
- alf.algorithms.predictive_representation_learner
- alf.algorithms.prior_actor
- alf.algorithms.qrsac_algorithm
- alf.algorithms.reward_learning_algorithm
- alf.algorithms.rl_algorithm
- alf.algorithms.rnd_algorithm
- alf.algorithms.sac_algorithm
- alf.algorithms.sarsa_algorithm
- alf.algorithms.taac_algorithm
- alf.algorithms.td_loss
- alf.algorithms.trac_algorithm
- alf.algorithms.vae
- alf.algorithms.vq_vae
- alf.bin
- alf.environments
- alf.environments.adv_fetch_envs
- alf.environments.metadrive
- alf.environments.simple
- alf.environments.alf_environment
- alf.environments.alf_gym3_wrapper
- alf.environments.alf_gym_wrapper
- alf.environments.alf_wrappers
- alf.environments.carla_controller
- alf.environments.carla_sensors
- alf.environments.carla_spectator
- alf.environments.dmc_gym_wrapper
- alf.environments.fast_parallel_environment
- alf.environments.gym_wrappers
- alf.environments.make_penv
- alf.environments.mario_wrappers
- alf.environments.parallel_environment
- alf.environments.process_environment
- alf.environments.random_alf_environment
- alf.environments.suite_babyai
- alf.environments.suite_bsuite
- alf.environments.suite_carla
- alf.environments.suite_dmc
- alf.environments.suite_dmlab
- alf.environments.suite_go
- alf.environments.suite_gym
- alf.environments.suite_highway
- alf.environments.suite_mario
- alf.environments.suite_metadrive
- alf.environments.suite_procgen
- alf.environments.suite_robotics
- alf.environments.suite_safety_gym
- alf.environments.suite_simple
- alf.environments.suite_socialbot
- alf.environments.suite_tic_tac_toe
- alf.environments.suite_unittest
- alf.environments.thread_environment
- alf.environments.utils
- alf.experience_replayers
- alf.metrics
- alf.nest
- alf.networks
- alf.networks.action_encoder
- alf.networks.actor_distribution_networks
- alf.networks.actor_networks
- alf.networks.containers
- alf.networks.critic_networks
- alf.networks.dynamics_networks
- alf.networks.encoding_networks
- alf.networks.mdq_critic_networks
- alf.networks.memory
- alf.networks.network
- alf.networks.networks
- alf.networks.normalizing_flow_networks
- alf.networks.ou_process
- alf.networks.param_networks
- alf.networks.preprocessor_networks
- alf.networks.preprocessors
- alf.networks.projection_networks
- alf.networks.q_networks
- alf.networks.relu_mlp
- alf.networks.transformer_networks
- alf.networks.value_networks
- alf.optimizers
- alf.summary
- alf.test
- alf.trainers
- alf.utils
- alf.utils.action_quantizer
- alf.utils.action_samplers
- alf.utils.averager
- alf.utils.checkpoint_utils
- alf.utils.common
- alf.utils.conditional_ops
- alf.utils.data_buffer
- alf.utils.datagen
- alf.utils.dist_utils
- alf.utils.distributed
- alf.utils.distributions
- alf.utils.external_configurables
- alf.utils.gin_utils
- alf.utils.git_utils
- alf.utils.lean_function
- alf.utils.losses
- alf.utils.math_ops
- alf.utils.normalizers
- alf.utils.per_process_context
- alf.utils.plot_tb_curves
- alf.utils.pretty_print
- alf.utils.process_coordinator
- alf.utils.schedulers
- alf.utils.sl_utils
- alf.utils.spec_utils
- alf.utils.summary_utils
- alf.utils.tensor_utils
- alf.utils.value_ops
- alf.utils.video_recorder
- alf.utils.visualizer
alf.config_helpers#
Helper functions for configuring ALF training.
The main motivation is to provide access to observation_spec and action_spec, which are needed to configure some models. Both specs are only available after the environment is created, so this module creates an environment based on TrainerConfig.
- close_env()[source]#
Close the global environment.
This function is called automatically by RLTrainer.
- get_action_spec()[source]#
Get the specs of the tensors expected by step(action) of the environment.
Note: you need to finish all the config for environments and TrainerConfig.random_seed before using this function.
- Returns
a spec that describes the shape and dtype of each tensor expected by step().
- Return type
nested TensorSpec
- get_env()[source]#
Get the global training environment.
Note: you need to finish all the config for environments and TrainerConfig.random_seed before using this function.
Note: random seed will be initialized in this function.
- Returns
AlfEnvironment
- get_observation_spec(field=None)[source]#
Get the spec of observation transformed by data transformers.
The data transformers are specified by TrainerConfig.data_transformer_ctor.
Note
You need to finish all the config for environments and TrainerConfig.data_transformer_ctor before using this function.
- Parameters
field (str) – a multi-step path denoted by “A.B.C”.
- Returns
a spec that describes the observation.
- Return type
nested TensorSpec
- get_raw_observation_spec(field=None)[source]#
Get the TensorSpec of observations provided by the global environment.
Note
This function can only be called after all gym wrappers and TrainerConfig.random_seed have been configured. Otherwise the created environment might behave unexpectedly.
- Parameters
field (str) – a multi-step path denoted by “A.B.C”.
- Returns
a spec that describes the observation.
- Return type
nested TensorSpec
- get_reward_spec()[source]#
Get the spec of the reward returned by the environment.
Note: you need to finish all the config for environments and TrainerConfig.random_seed before using this function.
- Returns
a spec that describes the shape and dtype of reward.
- Return type
- parse_config(conf_file, conf_params)[source]#
Parse config file and config parameters
Note: a global environment will be created (which can be obtained by alf.get_env()) and random seed will be initialized by this function using common.set_random_seed().
- Parameters
conf_file (str) – The full path of the config file.
conf_params (list[str]) – the list of config parameters. Each one has a format of CONFIG_NAME=VALUE.
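The CONFIG_NAME=VALUE format of conf_params can be illustrated with a minimal sketch. The helper name parse_conf_params is hypothetical, and reading each value as a Python literal via ast.literal_eval is an assumption made for this example, not a statement of how parse_config() evaluates values.

```python
import ast


def parse_conf_params(conf_params):
    """Split each 'CONFIG_NAME=VALUE' string into a (name, value) pair.

    Hypothetical helper for illustration only.
    """
    parsed = {}
    for param in conf_params:
        # Split on the first '=' so that values may themselves contain '='.
        name, sep, raw_value = param.partition("=")
        if not sep or not name:
            raise ValueError(f"expected CONFIG_NAME=VALUE, got {param!r}")
        try:
            # Assumption: values are written as Python literals.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Fall back to the raw string when it is not a literal.
            value = raw_value
        parsed[name.strip()] = value
    return parsed


params = parse_conf_params(
    ["TrainerConfig.random_seed=1", "Worker.job1='fast'"])
```

A dictionary of this shape is the kind of input that pre_config(configs) accepts, which is why command-line parameters can be preset before any module is imported.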
alf.config_util#
Alf configuration utilities.
- config(prefix_or_dict, mutable=True, raise_if_used=True, **kwargs)[source]#
Set the values for the configs with given name as suffix.
Example:
Assume we have the following decorated functions and classes:
    @alf.configurable
    def cool_func(param1, cool_arg1='a default value', cool_arg2=3):
        ...

    @alf.configurable
    def dumb_func(param1, a=1, b=2):
        ...

    @alf.configurable
    class Worker(object):
        def __init__(self, job1=1, job2=2):
            ...

        @alf.configurable
        def func(self, a, b):
            ...
We can config in the following ways:
    alf.config('cool_func', cool_arg1='new_value', cool_arg2='another_value')
    alf.config('Worker.func', b=3)
    alf.config('func', b=3)  # 'Worker.func' can be uniquely identified by 'func'
    alf.config({
        'dumb_func.b': 3,
        'Worker.job1': 2  # now the default value of job1 for Worker() becomes 2.
    })
- Parameters
prefix_or_dict (str|dict) – if a dict, each (key, value) pair in it specifies the value for a config with name key. If a str, it is used as prefix so that each (key, value) pair in kwargs specifies the value for config with name prefix + '.' + key
mutable (bool) – whether the config can be changed later. If the user tries to change an existing immutable config, the change will be ignored and a warning will be generated. You can always change a mutable config. ValueError will be raised if trying to set a new immutable value to an existing immutable value.
raise_if_used (bool) – If True, ValueError will be raised if trying to config a value which has already been used.
**kwargs – only used if prefix_or_dict is a str.
- config1(config_name, value, mutable=True, raise_if_used=True)[source]#
Set one configurable value.
- Parameters
config_name (str) – name of the config
value (any) – value of the config
mutable (bool) – whether the config can be changed later. If the user tries to change an existing immutable config, the change will be ignored and a warning will be generated. You can always change a mutable config. ValueError will be raised if trying to set a new immutable value to an existing immutable value.
raise_if_used (bool) – If True, ValueError will be raised if trying to config a value which has already been used.
- configurable(fn_or_name=None, whitelist=[], blacklist=[])[source]#
Decorator to make a function or class configurable.
This decorator registers the decorated function/class as configurable, which allows its parameters to be supplied from the global configuration (i.e., set through alf.config()). The decorated function is associated with a name in the global configuration, which by default is simply the name of the function or class, but can be specified explicitly to avoid naming collisions or improve clarity.
If some parameters should not be configurable, they can be specified in blacklist. If only a restricted set of parameters should be configurable, they can be specified in whitelist.
The decorator can be used without any parameters as follows:
    @alf.configurable
    def my_function(param1, param2='a default value'):
        ...
In this case, the function is associated with the name ‘my_function’ in the global configuration, and both param1 and param2 are configurable.
The decorator can be supplied with parameters to specify the configurable name or supply a whitelist/blacklist:
    @alf.configurable('my_func', whitelist=['param2'])
    def my_function(param1, param2='a default value'):
        ...
In this case, the configurable is associated with the name ‘my_func’ in the global configuration, and only param2 is configurable.
Classes can be decorated as well, in which case parameters of their constructors are made configurable:
    @alf.configurable
    class MyClass(object):
        def __init__(self, param1, param2='a default value'):
            ...
In this case, the name of the configurable is ‘MyClass’, and both param1 and param2 are configurable.
The full name of a configurable value is MODULE_PATH.FUNC_NAME.ARG_NAME. It can be referred to by any suffix as long as there is no ambiguity. For example, assuming there are two configurable values “abc.def.func.a” and “xyz.uvw.func.a”, you can use “abc.def.func.a”, “def.func.a”, “xyz.uvw.func.a” or “uvw.func.a” to refer to these two configurable values. You cannot use “func.a” because it is ambiguous. For the same reason, you cannot have a config name which is a strict suffix of another config name. For example, “A.Test.arg” and “Test.arg” cannot both be defined. You can supply a different name for the function to avoid the conflict:
    @alf.configurable("NewTest")
    def Test(arg):
        ...
or
    @alf.configurable("B.Test")
    def Test(arg):
        ...
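The suffix rule described above can be sketched in a few lines of plain Python. The function resolve_config_name is a hypothetical illustration of the matching rule only; it is not ALF's lookup code, and it matches on whole dotted components as an assumption.

```python
def resolve_config_name(query, full_names):
    """Return the unique full name that `query` is a dotted suffix of.

    Hypothetical helper: matches only on whole dotted components.
    """
    matches = [
        name for name in full_names
        if name == query or name.endswith("." + query)
    ]
    if not matches:
        raise ValueError(f"no config matches {query!r}")
    if len(matches) > 1:
        raise ValueError(f"{query!r} is ambiguous: {sorted(matches)}")
    return matches[0]


names = ["abc.def.func.a", "xyz.uvw.func.a"]
resolved = resolve_config_name("def.func.a", names)
```

With these two names, "def.func.a" and "uvw.func.a" each resolve to one full name, while "func.a" matches both and is rejected.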
Note: currently, to maintain compatibility with gin-config, all the functions decorated using alf.configurable are automatically configurable using gin. The values specified using alf.config() will override values specified through gin. The gin wrapper is quite convoluted and can make debugging more challenging. It can be disabled by setting the environment variable ALF_USE_GIN to 0 if you are not using gin.
- Parameters
fn_or_name (Callable|str) – A name for this configurable, or a function to decorate (in which case the name will be taken from that function). If not set, defaults to the name of the function/class that is being made configurable. If a name is provided, it may also include module components to be used for disambiguation. If module components are provided, the original module name of the function will not be used to compose the full name.
whitelist (list[str]) – A whitelisted set of kwargs that should be configurable. All other kwargs will not be configurable. Only one of whitelist or blacklist should be specified.
blacklist (list[str]) – A blacklisted set of kwargs that should not be configurable. All other kwargs will be configurable. Only one of whitelist or blacklist should be specified.
- Returns
the decorated function if fn_or_name is Callable; a decorator if fn_or_name is not Callable.
- Raises
ValueError – If a configurable with name (or the name of fn_or_cls) already exists, or if both a whitelist and blacklist are specified.
- define_config(name, default_value)[source]#
Define a configurable value with given default_value.
Its value can be retrieved by get_config_value().
- Parameters
name (str) – name of the configurable value
default_value (Any) – default value
- get_config_value(config_name)[source]#
Get the value of the config with the name config_name.
- Parameters
config_name (str) – name of the config or its suffix which can uniquely identify the config.
- Returns
value of the config
- Return type
Any
- Raises
ValueError – if the value of the config has not been configured and it does not have a default value.
- get_inoperative_configs()[source]#
Get all the configs that have not been used.
A config is inoperative if its value has been set through alf.config() but its set value has never been used by any function calls.
- Returns
list[tuple[config_name, Any]]
- get_operative_configs()[source]#
Get all the configs that have been used.
A config is operative if a function call does not explicitly specify the value of that config and hence its default value or the value provided through alf.config() needs to be used.
- Returns
list[tuple[config_name, Any]]
- import_config(conf_file)[source]#
Import the config from another file.
Different from load_config(), import_config() should only be used in config files. It can be used multiple times inside your config files.
If conf_file is a relative path, import_config() will first try to find it in the directory of the config file calling this function. If it cannot be found there, directories in the environment variable ALF_CONFIG_PATH will be searched in order.
Examples:
1. Suppose you have a config file ~/code/my_conf.py. You want to import another config file ~/code/my_conf2.py. You can use import_config("my_conf2.py") to import my_conf2.py.
2. Suppose you have a config file ~/code/my_conf.py. You want to import another config file ~/code/base/my_conf2.py. You can use import_config("base/my_conf2.py") to import my_conf2.py.
3. Suppose you have a config file ~/code/my_conf.py. You want to import another config file ~/packages/my_conf2.py. You need to set the environment variable as ALF_CONFIG_PATH=~/packages. Then you can use import_config("my_conf2.py") to import my_conf2.py.
- Parameters
conf_file –
- Returns
the config module object, which can be used in a similar way as python imported module.
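The search order for a relative conf_file can be sketched as follows. This is an illustration under stated assumptions, not ALF's implementation: find_config_file and its arguments are hypothetical names, and splitting ALF_CONFIG_PATH on os.pathsep is an assumption about the variable's format.

```python
import os


def find_config_file(conf_file, calling_dir, config_path=None):
    """Resolve conf_file the way the text above describes (sketch).

    calling_dir: directory of the config file that calls import_config().
    config_path: value to use instead of the ALF_CONFIG_PATH environment
        variable (hypothetical argument, handy for testing).
    """
    if os.path.isabs(conf_file):
        return conf_file
    if config_path is None:
        config_path = os.environ.get("ALF_CONFIG_PATH", "")
    # First the calling config's directory, then each listed directory
    # in order. Assumption: entries are separated by os.pathsep.
    search_dirs = [calling_dir]
    search_dirs += [d for d in config_path.split(os.pathsep) if d]
    for directory in search_dirs:
        candidate = os.path.join(os.path.expanduser(directory), conf_file)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(f"cannot find config file {conf_file!r}")
```

For load_config(), the same sketch would apply with the current working directory passed as calling_dir.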
- load_config(conf_file)[source]#
Load config from a file.
Different from import_config(), load_config() should only be used by your main code to load the config. It should be called only once unless reset_configs() is called to reset the configuration to its default state.
If conf_file is a relative path, load_config() will first try to find it in the current working directory. If it cannot be found there, directories in the environment variable ALF_CONFIG_PATH will be searched in order.
- Parameters
conf_file –
- Returns
the config module object, which can be used in a similar way as python imported module.
- pre_config(configs)[source]#
Preset the values for configs before the module defining it is imported.
This function is useful for handling the config params from commandline, where there are no module imports and hence no config has been defined.
The value is bound to the config when the module defining the config is imported later. validate_pre_configs() should be called after the config file has been loaded to ensure that all the pre_configs have been correctly bound.
- Parameters
configs (dict) – dictionary of config name to value
- repr_wrapper(cls)[source]#
A wrapper for automatically generating readable repr for an object.
The representation shows the arguments used to construct the object. It does not include the default arguments, nor the class members.
To use it, simply use it to decorate a class.
Example:
    @repr_wrapper
    class MyClass(object):
        def __init__(self, a, b, c=100, d=200):
            pass

    a = MyClass(1, 2)
    assert repr(a) == "MyClass(1, 2)"
    a = MyClass(3, 5, d=300)
    assert repr(a) == "MyClass(3, 5, d=300)"
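One way such a decorator could work is to record the arguments a caller actually passes and print them back in repr. The sketch below is an assumption-laden illustration named repr_wrapper_sketch; it is not ALF's repr_wrapper and ignores subtleties such as wrapped subclasses.

```python
import functools


def repr_wrapper_sketch(cls):
    """Give cls a repr that shows only explicitly passed arguments."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def init(self, *args, **kwargs):
        # Remember only what the caller passed; defaults are never recorded.
        self._repr_args = args
        self._repr_kwargs = kwargs
        original_init(self, *args, **kwargs)

    def _repr(self):
        parts = [repr(a) for a in self._repr_args]
        parts += [f"{k}={v!r}" for k, v in self._repr_kwargs.items()]
        return f"{cls.__name__}({', '.join(parts)})"

    cls.__init__ = init
    cls.__repr__ = _repr
    return cls


@repr_wrapper_sketch
class MyClass(object):
    def __init__(self, a, b, c=100, d=200):
        pass


text = repr(MyClass(3, 5, d=300))
```

Because only the passed arguments are stored, the defaults c=100 and d=200 never appear unless the caller overrides them.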
- save_config(alf_config_file)[source]#
Save config files.
This will save the config set using pre_config(), the file loaded using load_config() and the files imported using import_config() if they are in the config root directory or its sub-directory, where the config root directory is the directory of the conf file loaded by load_config().
alf.data_structures#
Various data structures. Converted to PyTorch from the TF version.
- class AlgStep(output, state, info)#
Bases: tuple
Create new instance of AlgStep(output, state, info)
- info#
Alias for field number 2
- output#
Alias for field number 0
- state#
Alias for field number 1
- class BasicRLInfo(action)#
Bases: tuple
Create new instance of BasicRLInfo(action,)
- action#
Alias for field number 0
- class BasicRolloutInfo(rl, rewards, repr)#
Bases: tuple
Create new instance of BasicRolloutInfo(rl, rewards, repr)
- repr#
Alias for field number 2
- rewards#
Alias for field number 1
- rl#
Alias for field number 0
- class Experience(time_step=(), action=(), rollout_info=(), state=(), batch_info=(), replay_buffer=(), rollout_info_field=())[source]#
Bases: alf.data_structures.Experience
An Experience is a TimeStep in the context of training an RL algorithm. For training purposes, it contains the following attributes:
- time_step (TimeStep): A TimeStep structure contains the data emitted by an environment at each step of interaction.
- action: A (nested) Tensor for action taken for the current time step.
- rollout_info: AlgStep.info from rollout_step().
- state: State passed to rollout_step() to generate action.
- batch_info: Its type is alf.experience_replayers.replay_buffer.BatchInfo. This is only used when experience is passed as an argument for Algorithm.calc_loss(). Different from other members, the shape of the tensors in batch_info is [B], where B is the batch size.
- replay_buffer: The replay buffer from which the batch_info was generated. Currently, this field is available when experience is passed to Algorithm.calc_loss(), Algorithm.preprocess_experience() or DataTransformer.transform_experience().
- rollout_info_field: The name of the rollout_info field in replay buffer. This is useful when an algorithm needs to access its rollout_info in the replay buffer.
Create new instance of Experience(time_step, action, rollout_info, state, batch_info, replay_buffer, rollout_info_field)
- property discount#
- property env_id#
- get_time_step_field(field)[source]#
Get the value of the experience.time_step specified by field. Since we have exposed the common time_step fields as properties of Experience, this function can be used when the field is not covered by the exposed properties.
- Parameters
field (str) – indicate the field to be retrieved in time_step.
- Returns
The value of the field in time_step corresponding to field.
- property observation#
- property prev_action#
- property reward#
- property step_type#
- update_time_step_field(field, new_value)[source]#
Update the value of the experience.time_step specified by field.
- Parameters
field (str) – indicate the field to be updated
new_value (any) – the new value for the field
- Returns
a structure the same as the original experience except that the field field in the time_step is replaced by new_value.
- Return type
- class LossInfo(loss, scalar_loss, extra, priority, gns, batch_label)#
Bases: tuple
Create new instance of LossInfo(loss, scalar_loss, extra, priority, gns, batch_label)
- batch_label#
Alias for field number 5
- extra#
Alias for field number 2
- gns#
Alias for field number 4
- loss#
Alias for field number 0
- priority#
Alias for field number 3
- scalar_loss#
Alias for field number 1
- class StepType(value)[source]#
Bases: object
Defines the status of a TimeStep within a sequence.
Add ability to create StepType constants from a value.
- FIRST = 0#
- LAST = 2#
- MID = 1#
- class TimeStep(step_type=(), reward=(), discount=(), observation=(), prev_action=(), env_id=(), untransformed=(), env_info=())[source]#
Bases: alf.data_structures.TimeStep
A TimeStep contains the data emitted by an environment at each step of interaction. A TimeStep holds a step_type, an observation (typically a NumPy array or a dict or list of arrays), and an associated reward and discount.
The first TimeStep in a sequence has step_type equal to StepType.FIRST. The final TimeStep has step_type equal to StepType.LAST. All other TimeSteps in a sequence have step_type equal to StepType.MID.
It has eight attributes:
- step_type: a Tensor or numpy int of StepType enum values.
- reward: a Tensor of reward values from executing ‘prev_action’.
- discount: A discount value in the range \([0, 1]\).
- observation: A (nested) Tensor for observation.
- prev_action: A (nested) Tensor for action from previous time step.
- env_id: A scalar Tensor of the environment ID of the time step.
- untransformed: a nest that represents the entire time step itself before any transformation (e.g., observation or reward transformation); used for experience replay observing by subalgorithms.
- env_info: A dictionary containing information returned by Gym environments’ info.
Create new instance of TimeStep(step_type, reward, discount, observation, prev_action, env_id, untransformed, env_info)
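The FIRST/MID/LAST convention can be shown with a tiny sketch that labels the steps of one episode. The constants mirror the StepType values listed in this module; the function label_step_types is a hypothetical helper, and requiring at least two steps per episode is an assumption made for simplicity.

```python
# Values as documented for StepType: FIRST = 0, MID = 1, LAST = 2.
FIRST, MID, LAST = 0, 1, 2


def label_step_types(episode_length):
    """Return the step_type of every TimeStep in one episode (sketch)."""
    if episode_length < 2:
        # Assumption: an episode has at least a first and a last step.
        raise ValueError("an episode needs at least two time steps")
    return [FIRST] + [MID] * (episode_length - 2) + [LAST]


step_types = label_step_types(5)
```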
- add_batch_info(experience, batch_info, buffer=())[source]#
Add batch_info and rollout_info_field string to experience.
- clear_batch_info(experience)[source]#
Clear batch_info and rollout_info_field string from experience.
Useful as certain nest functions like convert_device do not skip non-tensor objects in nests.
- elastic_namedtuple(name, args)[source]#
An elastic namedtuple returns () for a non-existing attribute instead of raising an AttributeError.
- Parameters
name (str) – type name of this elastic namedtuple.
args – other arguments for constructing the namedtuple
- Returns
the type for the elastic namedtuple
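The behavior can be approximated in plain Python by subclassing a regular namedtuple and overriding __getattr__, which Python only calls when normal attribute lookup fails. This is a minimal sketch with a hypothetical name, not ALF's implementation.

```python
import collections


def elastic_namedtuple_sketch(name, field_names):
    """Build a namedtuple type whose unknown attributes evaluate to ()."""
    base = collections.namedtuple(name, field_names)

    class Elastic(base):
        __slots__ = ()

        def __getattr__(self, attr):
            # Reached only when the attribute is not a real field or method.
            if attr.startswith("__"):
                # Keep protocol lookups (copy, pickle, ...) behaving normally.
                raise AttributeError(attr)
            return ()

    Elastic.__name__ = name
    Elastic.__qualname__ = name
    return Elastic


Info = elastic_namedtuple_sketch("Info", ["action", "reward"])
info = Info(action=1, reward=0.5)
```

Returning () matches the empty-tuple defaults used by TimeStep and Experience in this module, so code can test a field for emptiness without first checking that it exists.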
- make_experience(time_step, alg_step, state)[source]#
Make an instance of Experience from TimeStep and AlgStep.
- Parameters
- Returns
- Return type
- namedtuple(typename, field_names, default_value=None, default_values=())[source]#
namedtuple with default value.
- Parameters
typename (str) – type name of this namedtuple.
field_names (list[str]) – name of each field.
default_value (Any) – the default value for all fields.
default_values (list|dict) – default value for each field.
- Returns
the type for the namedtuple
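A minimal sketch of how the two default arguments could combine, built on the standard collections.namedtuple. The name namedtuple_with_default is hypothetical, and treating a list of default_values as positional (first value for the first field) is an assumption of this example.

```python
import collections


def namedtuple_with_default(typename,
                            field_names,
                            default_value=None,
                            default_values=()):
    """namedtuple whose fields all have defaults (illustrative sketch)."""
    fields = list(field_names)
    # Start with the shared default, then apply per-field overrides.
    merged = {f: default_value for f in fields}
    if isinstance(default_values, dict):
        merged.update(default_values)
    else:
        # Assumption: a list supplies defaults for the leading fields.
        merged.update(zip(fields, default_values))
    return collections.namedtuple(
        typename, fields, defaults=[merged[f] for f in fields])


Point = namedtuple_with_default(
    "Point", ["x", "y", "z"], default_value=0, default_values={"z": 5})
p = Point(x=1)
```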
- restart(observation, action_spec, reward_spec=TensorSpec(shape=(), dtype=torch.float32), env_id=None, env_info={}, batched=False)[source]#
Returns a TimeStep with step_type set equal to StepType.FIRST.
Called by env.reset().
- Parameters
observation (nested tensors) – observations of the env.
action_spec (nested TensorSpec) – tensor spec of actions.
reward_spec (TensorSpec) – a rank-1 or rank-0 (default) tensor spec
env_id (batched or scalar torch.int32) – (optional) ID of the env.
env_info (dict) – extra info returned by the environment.
batched (bool) – (optional) whether batched envs or not.
- Returns
- Return type
- termination(observation, prev_action, reward, reward_spec=TensorSpec(shape=(), dtype=torch.float32), env_id=None, env_info={})[source]#
Returns a
TimeStepwithstep_typeset toStepType.LAST.Called by
env.step()if ‘Done’.discountshould not be sent in and will be set as 0.- Parameters
observation (nested tensors) – current observations of the env.
prev_action (nested tensors) – previous actions to the env.
reward (float) – A scalar, or 1D NumPy array, or tensor.
reward_spec (TensorSpec) – a rank-1 or rank-0 (default) tensor spec. Used to tell if the termination is batched or not.
env_id (torch.int32) – (optional) A scalar or 1D tensor of the environment ID(s).
env_info (dict) – extra info returned by the environment.
- Returns
- Return type
- Raises
ValueError – If observations are tensors but reward’s statically known rank is not 0 or 1.
- time_step_spec(observation_spec, action_spec, reward_spec)[source]#
Returns a TimeStep spec given the observation_spec and the action_spec.
- transition(observation, prev_action, reward, reward_spec=TensorSpec(shape=(), dtype=torch.float32), discount=1.0, env_id=None, env_info={})[source]#
Returns a TimeStep with step_type set equal to StepType.MID.
Called by env.step() if not ‘Done’.
The batch size is inferred from the shape of reward.
If discount is a scalar, and observation contains tensors, then discount will be broadcasted to match reward.shape.
- Parameters
observation (nested tensors) – current observations of the env.
prev_action (nested tensors) – previous actions to the env.
reward (float) – A scalar, or 1D NumPy array, or tensor.
reward_spec (TensorSpec) – a rank-1 or rank-0 (default) tensor spec. Used to tell if the transition is batched or not.
discount (float) – (optional) A scalar, or 1D NumPy array, or tensor.
env_id (torch.int32) – (optional) A scalar or 1D tensor of the environment ID(s).
env_info (dict) – extra info returned by the environment.
- Returns
- Return type
- Raises
ValueError – If observations are tensors but reward’s rank is not 0 or 1.
alf.device_ctx#
alf.initializers#
- variance_scaling_init(tensor, gain=1.0, mode='fan_in', distribution='truncated_normal', calc_gain_after_activation=True, nonlinearity=<function identity>, transposed=False)[source]#
Implements TensorFlow’s VarianceScaling initializer.
A potential benefit of this initializer is that we can sample from a truncated normal distribution: scipy.stats.truncnorm(a=-2, b=2, loc=0., scale=1.).
Also incorporates PyTorch’s calculation of the recommended gains that take nonlinear activations into account, so that after N layers, the final output std (in linear space) will be a constant regardless of N’s value (when N is large). This auto gain probably won’t make much of a difference if the network is shallow, as in most RL cases.
Example usage:
    import torch.nn as nn
    from alf.initializers import variance_scaling_init

    layer = nn.Linear(2, 2)
    variance_scaling_init(layer.weight.data,
                          nonlinearity=nn.functional.leaky_relu)
    nn.init.zeros_(layer.bias.data)
- Parameters
tensor (torch.Tensor) – the weights to be initialized
gain (float) – a positive scaling factor for weight std. Different from tf’s implementation, this number is applied outside of math.sqrt. Note that if calc_gain_after_activation=True, this number will be an additional gain factor on top of that.
mode (str) – one of “fan_in”, “fan_out”, and “fan_avg”
distribution (str) – one of “uniform”, “untruncated_normal” and “truncated_normal”. If the latter, the weights will be sampled from a normal distribution truncated at (-2, 2).
calc_gain_after_activation (bool) – whether to automatically calculate the std gain of applying nonlinearity after this layer. A nonlinear activation (e.g., relu) might change std after the transformation, so we need to compensate for that. Only used when mode==”fan_in”.
nonlinearity (Callable) – any callable activation function
transposed (bool) – a flag indicating if the weight tensor has been transposed (e.g., nn.ConvTranspose2d). In that case, fan_in and fan_out should be swapped.
- Returns
a randomly initialized weight tensor
- Return type
torch.Tensor
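The role of gain and mode can be made concrete with a small calculation of the target standard deviation. This sketch follows only what the parameter descriptions state (the gain sits outside the square root, and mode selects the fan). The function variance_scaling_std is hypothetical, and it deliberately omits the activation-dependent gain and any correction for truncation.

```python
import math


def variance_scaling_std(fan_in, fan_out, gain=1.0, mode="fan_in"):
    """Target weight std for variance scaling (simplified sketch)."""
    if mode == "fan_in":
        size = fan_in
    elif mode == "fan_out":
        size = fan_out
    elif mode == "fan_avg":
        size = (fan_in + fan_out) / 2.0
    else:
        raise ValueError(f"unknown mode {mode!r}")
    # The gain multiplies the std directly, outside of the square root.
    return gain / math.sqrt(max(1.0, size))


std = variance_scaling_std(fan_in=4, fan_out=16)
```

For a layer with 4 inputs and 16 outputs, "fan_in" gives a larger std than "fan_out", and doubling gain doubles the std in either mode.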
alf.layers#
Some basic layers.
- class AMPWrapper(enabled, net)[source]#
Bases: torch.nn.modules.module.Module
Wrap a layer to run in a given AMP context.
- Parameters
enabled (bool) – whether to enable AMP autocast
net (Module) – the wrapped network
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool#
- class AddN[source]#
Bases: alf.layers.ElementwiseLayerBase
Add several tensors
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- Parameters
input (Iterable[Tensor]) – a sequence of tensors to be summed
- Returns
the sum of all the tensors
- Return type
Tensor
- training: bool#
- class BottleneckBlock(in_channels, kernel_size, filters, stride, transpose=False, v1_5=True, with_batch_normalization=True, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]#
Bases: torch.nn.modules.module.Module
Bottleneck block for ResNet.
We allow two slightly different architectures:
v1: Placing the stride at the first 1x1 convolution as described in the original ResNet paper Deep residual learning for image recognition.
v1.5: Placing the stride for downsampling at 3x3 convolution. This variant is also known as ResNet V1.5 and improves accuracy according to https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.
TODO:
ResNet-D in Bag of Tricks for Image Classification with Convolutional Neural Networks Note: v1_5 is the ResNet-B in the above paper.
Squeeze-and-Excitation (SE) in Squeeze-and-Excitation Networks SE is also shown to be useful in Revisiting ResNets: Improved Training and Scaling Strategies
- Parameters
kernel_size (int) – the kernel size of middle layer at main path
filters (int) – the filters of 3 layer at main path
stride (int) – stride for this block.
transpose (bool) – a bool indicating whether to use Conv2D or Conv2DTranspose. If two BottleneckBlock layers L and LT are constructed with the same arguments except transpose, it is guaranteed that LT(L(x)).shape == x.shape if x.shape[-2:] can be divided by stride.
v1_5 (bool) – whether to use the ResNet V1.5 structure
with_batch_normalization (bool) – whether to include batch normalization. Note that standard ResNet uses batch normalization.
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
- forward(inputs)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool#
- class Branch(*modules, **named_modules)[source]#
Bases: torch.nn.modules.module.Module
Apply multiple modules on the same input.
Example:
    net = Branch((module1, module2))
    y = net(x)
is equivalent to the following:
y = module1(x), module2(x)
- Parameters
modules (nested nn.Module) – a nest of torch.nn.Module. Note that Branch(module_a, module_b) is equivalent to Branch((module_a, module_b))
named_modules (nn.Module | Callable) – a simpler way of specifying a dict of modules. Branch(a=module_a, b=module_b) is equivalent to Branch(dict(a=module_a, b=module_b))
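The equivalences between the calling styles can be illustrated with plain Python callables standing in for modules. BranchSketch is a hypothetical stand-in that only handles flat tuples and dicts; it is not the alf.layers.Branch implementation, which accepts arbitrary nests of torch.nn.Module.

```python
class BranchSketch:
    """Apply several callables to the same input (flat nests only)."""

    def __init__(self, *modules, **named_modules):
        if named_modules:
            # Keyword form: Branch(a=f, b=g) behaves like Branch(dict(a=f, b=g)).
            assert not modules, "use positional or named modules, not both"
            self._modules = dict(named_modules)
        elif len(modules) == 1 and isinstance(modules[0], (tuple, list, dict)):
            # A single nest was passed, e.g. Branch((f, g)).
            self._modules = modules[0]
        else:
            # Positional form: Branch(f, g) behaves like Branch((f, g)).
            self._modules = tuple(modules)

    def __call__(self, x):
        if isinstance(self._modules, dict):
            return {k: m(x) for k, m in self._modules.items()}
        return tuple(m(x) for m in self._modules)


def double(x):
    return 2 * x


def square(x):
    return x * x


y = BranchSketch(double, square)(3)
```

The output mirrors the structure of the modules: a tuple of modules yields a tuple of results, and a dict of modules yields a dict with the same keys.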
- forward(inputs)[source]#
- make_parallel(n)[source]#
Create a parallelized version of this network.
- Parameters
n (int) – the number of copies
- Returns
the parallelized version of this network
- training: bool#
- class Cast(dtype=torch.float32)[source]#
Bases:
alf.layers.ElementwiseLayerBase
A layer that casts the dtype of the elements of the input tensor.
- Parameters
dtype (torch.dtype) – desired type of the new tensor.
- forward(x)[source]#
- training: bool#
- class CausalConv1D(in_channels, out_channels, kernel_size, dilation=1, hide_current=False, activation=<built-in method relu_ of type object>, use_bias=None, use_bn=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
1D (Dilated) Causal Convolution layer. 1D Dilated Causal Convolution is proposed in van den Oord et al. “WaveNet: A Generative Model for Raw Audio”.
A layer implementing the 1D (Dilated) Causal Convolution. It is also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the causal conv1d layer.
Note that the main difference between causal conv and standard conv is that each temporal element in the convolutional output is causal w.r.t. the temporal elements of the input. For example, for a length L sequence x with the shape of [B, C, L], and y = causal_conv(x), where the shape of y is [B, C', L], by causal we mean y[..., l] only depends on x[..., :l+1] (i.e. the current and past elements, or only x[..., :l] when hide_current is True), and there is no dependency on x[..., l+1:] (i.e. the future) as in the standard non-causal convolution.
This can be implemented by using asymmetric padding, which in effect shifts the input to the right (future) according to the kernel size.
- Parameters
in_channels (int) – channels of the input
out_channels (int) – channels of the output
kernel_size (int) – size of the kernel
dilation (int) – controls the spacing between the kernel points. Please refer to here for a visual illustration: https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
hide_current (bool) – whether to hide the current by shifting the input to the right (future) by one. This is typically needed in the first layer of a causal conv net.
activation (torch.nn.functional) – activation to be applied to output
use_bias (bool|None) – whether to use bias. If None, will use not use_bn
use_bn (bool) – whether to use batch normalization
kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant
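The asymmetric padding idea can be sketched for a single channel in plain Python. This is an illustration of the technique, not the ALF implementation: causal_conv1d is a hypothetical helper, the input is a flat list, and the left padding of (kernel_size - 1) * dilation (plus one when hide_current is set) is the usual choice for causal convolutions and is assumed here.

```python
def causal_conv1d(x, kernel, dilation=1, hide_current=False):
    """Single-channel causal convolution over a list of numbers.

    The input is padded with zeros on the left only, so output[l] never
    depends on x[l + 1:]. With hide_current it does not depend on x[l] either.
    """
    k = len(kernel)
    pad = (k - 1) * dilation + (1 if hide_current else 0)
    padded = [0] * pad + list(x)
    out = []
    for l in range(len(x)):
        # kernel[k - 1] is aligned with the most recent visible element.
        out.append(sum(kernel[j] * padded[l + j * dilation] for j in range(k)))
    return out


x = [1, 2, 3, 4]
y = causal_conv1d(x, kernel=[1, 1])
# Changing only the last input must leave all earlier outputs untouched.
y_changed = causal_conv1d([1, 2, 3, 100], kernel=[1, 1])
print(y[:3] == y_changed[:3])
```

Because the padding is applied on the left only, the output has the same length as the input and each position sees only the current and earlier positions.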
- property bias#
- forward(x)[source]#
- Parameters
x (tensor) – input of the shape [B, C, L] where B is the batch size, C denotes the number of input channels, and L is the length of the signal.
- Returns
A tensor of the shape [B, C’, L], where C’ denotes the number of output channels.
- training: bool#
- property weight#
- class CompositionalFC(input_size, output_size, n, activation=<function identity>, output_comp_weight=True, use_bias=True, use_bn=False, use_ln=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
Compositional FC layer.
It maintains a set of n FC parameters for learning. During forward computation, it composes the set of parameters using weighted average with the compositional weight provided as input and then performs the FC computation, which is equivalent to combining the pre-activation output from each of the n FC layers using the compositional weight, and then applying normalization and activation.
- Parameters
input_size (int) – input size
output_size (int) – output size
n (int) – the size of the parameter set
activation (torch.nn.functional) –
output_comp_weight (bool) – If True, the forward() function will return a tuple of (result, comp_weight) for easy chaining of multiple layers in the case when the same compositional weight is used. If False, the forward() function will return result only.
use_bias (bool) – whether to use bias
use_bn (bool) – whether to use batch normalization.
use_ln (bool) – whether to use layer normalization
kernel_initializer (Callable) – initializer for the FC layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant
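The equivalence stated in the class description (composing the parameters first versus combining the pre-activation outputs) follows from linearity and can be checked numerically. The sketch below is framework-free and uses hypothetical helper names; bias, normalization and activation are omitted, and the compositional weight is used as given, which is an assumption about how the weighting is applied.

```python
def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def compose_then_apply(weights, comp, x):
    """Blend the n weight matrices with comp, then apply the blended matrix."""
    n = len(weights)
    rows, cols = len(weights[0]), len(weights[0][0])
    blended = [[sum(comp[k] * weights[k][i][j] for k in range(n))
                for j in range(cols)] for i in range(rows)]
    return matvec(blended, x)


def apply_then_combine(weights, comp, x):
    """Apply each of the n matrices, then blend the n outputs with comp."""
    outs = [matvec(w, x) for w in weights]
    return [sum(c * o[i] for c, o in zip(comp, outs)) for i in range(len(outs[0]))]


weights = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]  # n = 2 matrices of shape [2, 2]
comp = [2, 3]                                   # one compositional weight per matrix
x = [1, -1]
print(compose_then_apply(weights, comp, x) == apply_then_combine(weights, comp, x))
```

Composing the parameters first needs a single matrix-vector product per sample, while combining outputs needs n of them, which is one reason to prefer the first form.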
- property bias#
Get the bias Tensor.
- Returns
with shape (n, output_size). bias[i] is the bias for the i-th FC layer. bias[i] can be used for an FC layer with the same input_size and output_size
- Return type
Tensor
- forward(inputs)[source]#
Forward
- Parameters
inputs (torch.Tensor|tuple) – If a Tensor, its shape should be [B, input_size]. If a tuple, it should contain two elements. The first is a Tensor with the shape of [B, input_size], the second is a compositional weight Tensor with the shape of [B, n] or None. If the compositional weight is not specified (i.e. when inputs is not a tuple) or None, a uniform weight of one will be used.
- Returns
torch.Tensor representing the final activation with shape [B, output_size] if output_comp_weight is False. Otherwise, return a tuple consisting of the final activation and the compositional weight used.
- training: bool#
- property weight#
Get the weight Tensor.
- Returns
with shape (n, output_size, input_size). weight[i] is the weight for the i-th FC layer. weight[i] can be used for an FC layer with the same input_size and output_size
- Return type
Tensor
- class Conv2D(in_channels, out_channels, kernel_size, activation=<built-in method relu_ of type object>, strides=1, padding=0, use_bias=None, use_bn=False, use_ln=False, weight_opt_args=None, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
2D Convolution Layer.
A 2D Conv layer that’s also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the conv layer. Suggest using this wrapper module instead of nn.Conv2d if you really care about weight std after init.
- Parameters
in_channels (int) – channels of the input image
out_channels (int) – channels of the output image
kernel_size (int or tuple) –
activation (torch.nn.functional) –
strides (int or tuple) –
padding (int or tuple) –
use_bias (bool|None) – whether to use bias. If None, will use not use_bn
use_bn (bool) – whether to use batch normalization
use_ln (bool) – whether to use layer normalization
weight_opt_args (Optional[Dict]) – optimizer arguments for weight (not for bias)
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant
- property bias#
- forward(img)[source]#
- training: bool#
- property weight#
- class Conv2DBatchEnsemble(in_channels, out_channels, kernel_size, ensemble_size, output_ensemble_ids=True, activation=<built-in method relu_ of type object>, strides=1, padding=0, use_bias=None, use_bn=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_range=0.0, ensemble_group=0)[source]#
Bases:
alf.layers.Conv2D
The BatchEnsemble for 2D Conv layer.
BatchEnsemble is proposed in Wen et al. “BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning”.
In a nutshell, a tuple of vectors \((r_k, s_k)\) is maintained for ensemble member k in addition to the conv2d kernel W of shape [C_out, C_in, K_h, K_w]. For input x of shape [B, C, H, W], the result for ensemble member k is calculated as \((W \circ (s_k r_k^T).unsqueeze(-1).unsqueeze(-1)) * x\). This can be more efficiently calculated as \((W*(x \circ r_k.unsqueeze(-1).unsqueeze(-1))) \circ s_k.unsqueeze(-1).unsqueeze(-1)\).
Note that for each sample in a batch, a random ensemble member will be used for it if ensemble_ids is not provided to forward().
- Parameters
in_channels (int) – channels of the input image
out_channels (int) – channels of the output image
kernel_size (int or tuple) –
ensemble_size (int) – ensemble size
output_ensemble_ids (bool) – If True, the forward() function will return a tuple of (result, ensemble_ids). If False, the forward() function will return result only.
activation (torch.nn.functional) –
strides (int or tuple) –
padding (int or tuple) –
use_bias (bool|None) – whether to use bias. If None, will use not use_bn
use_bn (bool) – whether to use batch normalization
kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_range (float) – biases are initialized uniformly in [-bias_init_range, bias_init_range]
ensemble_group (int) – the extra attribute ensemble_group added to self._r, self._s, and self._ensemble_bias, default value is 0. For alf.optimizers whose parvi is not None, all parameters with the same ensemble_group will be updated by the particle-based VI algorithm specified by parvi. Options are [svgd, gfsf]:
Stein Variational Gradient Descent (SVGD): Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” NIPS. 2016.
Wasserstein Gradient Flow with Smoothed Functions (GFSF): Liu, Chang, et al. “Understanding and accelerating particle-based variational inference.” ICML, 2019.
- forward(inputs)[source]#
Forward computation.
- Parameters
inputs (Tensor|tuple) – if a Tensor, its shape should be [B, C, H, W], and a random ensemble id will be generated for each sample in the batch. If a tuple, it should contain two tensors. The first one is the data tensor with shape [B, C, H, W]. The second one is ensemble_ids indicating which ensemble member each sample should use. Its shape should be [batch_size], and all elements should be in [0, ensemble_size).
- Returns
tuple if output_ensemble_ids is True:
Tensor: with shape [B, C_out, H_out, W_out]
LongTensor: if ensemble_ids is provided, this is the same as ensemble_ids, otherwise a randomly generated ensemble_ids is returned
Tensor if output_ensemble_ids is False. The result of Conv2D.
- training: bool#
- class ConvTranspose2D(in_channels, out_channels, kernel_size, activation=<built-in method relu_ of type object>, strides=1, padding=0, output_padding=0, use_bias=None, use_bn=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
A 2D ConvTranspose layer that’s also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the conv layer. Suggest using this wrapper module instead of nn.ConvTranspose2d if you really care about weight std after init.
- Parameters
in_channels (int) – channels of the input image
out_channels (int) – channels of the output image
kernel_size (int or tuple) –
activation (torch.nn.functional) –
strides (int or tuple) –
padding (int or tuple) –
output_padding (int or tuple) – Additional size added to one side of each dimension in the output shape. Default: 0. See pytorch documentation for more detail.
use_bias (bool|None) – If None, will use not use_bn
use_bn (bool) – whether to use batch normalization
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
kernel_initializer (Callable) – initializer for the conv_trans layer. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant
- property bias#
- forward(img)[source]#
- training: bool#
- property weight#
- class Detach[source]#
Bases:
alf.layers.ElementwiseLayerBase
Detach nested Tensors.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- training: bool#
- class ElementwiseLayerBase[source]#
Bases:
torch.nn.modules.module.Module
Base class for the layers of parameterless elementwise operations.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- make_parallel(n)[source]#
Create a layer with same operation to handle parallel batch.
It is assumed that a parallel batch has shape [B, n, …].
- Parameters
n (int) – the number of replicas.
- Returns
a layer with same operation to handle parallel batch.
- training: bool#
- class FC(input_size, output_size, activation=<function identity>, use_bias=True, use_bn=False, use_ln=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm1d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0, bias_initializer=None, weight_opt_args=None, bias_opt_args=None)[source]#
Bases:
torch.nn.modules.module.Module
Fully connected layer.
A fully connected layer that’s also responsible for activation and customized weights initialization. An auto gain calculation might depend on the activation following the linear layer. Suggest using this wrapper module instead of nn.Linear if you really care about weight std after init.
- Parameters
input_size (int) – input size
output_size (int) – output size
activation (torch.nn.functional) –
use_bias (bool) – whether to use bias
use_bn (bool) – whether to use batch normalization.
use_ln (bool) – whether to use layer normalization
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
kernel_initializer (Callable) – initializer for the FC layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant for the initial bias value. This is ignored if bias_initializer is provided.
bias_initializer (Callable) – initializer for the bias parameter.
weight_opt_args (Optional[Dict]) – optimizer arguments for weight
bias_opt_args (Optional[Dict]) – optimizer arguments for bias
- property bias#
- forward(inputs)[source]#
Forward computation.
- Parameters
inputs (Tensor) – its shape should be [batch_size, input_size] or [batch_size, ..., input_size]
- Returns
with shape as inputs.shape[:-1] + (output_size,)
- Return type
Tensor
- property input_size#
- make_parallel(n)[source]#
Create a ParallelFC using n replicas of self. The initialized layer parameters will be different.
- property output_size#
- training: bool#
- property weight#
- class FCBatchEnsemble(input_size, output_size, ensemble_size, output_ensemble_ids=True, activation=<function identity>, use_bias=True, use_bn=False, use_ln=False, kernel_initializer=None, kernel_init_gain=1.0, bias_init_range=0.0, ensemble_group=0)[source]#
Bases:
alf.layers.FC
The BatchEnsemble for FC layer.
BatchEnsemble is proposed in Wen et al. “BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning”.
In a nutshell, a tuple of vectors \((r_k, s_k)\) is maintained for ensemble member k in addition to the original FC weight matrix W. For input x, the result for ensemble member k is calculated as \((W \circ (s_k r_k^T)) x\). This can be more efficiently calculated as \((W (x \circ r_k)) \circ s_k\). Note that for each sample in a batch, a random ensemble member will be used for it if ensemble_ids is not provided to forward().
- Parameters
input_size (int) – input size
output_size (int) – output size
ensemble_size (int) – ensemble size
output_ensemble_ids (bool) – If True, the forward() function will return a tuple of (result, ensemble_ids). If False, the forward() function will return result only.
activation (Callable) – activation function
use_bias (bool) – whether to use bias
use_bn (bool) – whether to use batch normalization.
use_ln (bool) – whether to use layer normalization
kernel_initializer (Callable) – initializer for the FC layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_range (float) – biases are initialized uniformly in [-bias_init_range, bias_init_range]
ensemble_group (int) – the extra attribute ensemble_group added to self._r, self._s, and self._ensemble_bias, default value is 0. For alf.optimizers whose parvi is not None, all parameters with the same ensemble_group will be updated by the particle-based VI algorithm specified by parvi. Options are [svgd, gfsf]:
Stein Variational Gradient Descent (SVGD): Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.” NIPS. 2016.
Wasserstein Gradient Flow with Smoothed Functions (GFSF): Liu, Chang, et al. “Understanding and accelerating particle-based variational inference.” ICML, 2019.
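The identity behind the efficient BatchEnsemble form, (W ∘ (s_k r_k^T)) x = (W (x ∘ r_k)) ∘ s_k, can be verified with a few lines of plain Python. This is a numerical check of the formula only, with hypothetical helper names; it does not reproduce the layer's bias, normalization, activation or ensemble id handling.

```python
def batch_ensemble_naive(w, r, s, x):
    """Build the member-specific weight W * (s r^T) elementwise, then multiply by x."""
    wk = [[w[i][j] * s[i] * r[j] for j in range(len(r))] for i in range(len(s))]
    return [sum(wk[i][j] * x[j] for j in range(len(x))) for i in range(len(s))]


def batch_ensemble_fast(w, r, s, x):
    """Scale the input by r, multiply by the shared W, then scale the output by s."""
    xr = [xj * rj for xj, rj in zip(x, r)]
    return [s[i] * sum(w[i][j] * xr[j] for j in range(len(xr))) for i in range(len(s))]


w = [[1, 2, 3], [4, 5, 6]]  # shared weight, shape [output_size, input_size]
r = [1, -1, 2]              # per-member input scaling, length input_size
s = [3, -2]                 # per-member output scaling, length output_size
x = [2, 0, 1]
print(batch_ensemble_naive(w, r, s, x) == batch_ensemble_fast(w, r, s, x))
```

The fast form never materializes a per-member weight matrix, so every member in a batch can share one matrix multiplication with W.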
- forward(inputs)[source]#
Forward computation.
- Parameters
inputs (Tensor|tuple) – if a Tensor, its shape should be [batch_size, input_size] or [batch_size, ..., input_size], and a random ensemble id will be generated for each sample in the batch. If a tuple, it should contain two tensors. The first one is the data tensor with shape [batch_size, input_size] or [batch_size, ..., input_size]. The second one is ensemble_ids indicating which ensemble member each sample should use. Its shape should be [batch_size], and all elements should be in [0, ensemble_size).
- Returns
tuple if output_ensemble_ids is True:
Tensor: with shape as inputs.shape[:-1] + (output_size,)
LongTensor: if ensemble_ids is provided, this is the same as ensemble_ids, otherwise a randomly generated ensemble_ids is returned
Tensor if output_ensemble_ids is False. The result of FC.
- training: bool#
- class FixedDecodingLayer(input_size, output_size, basis_type='rbf', sigma=1.0, tau=0.5)[source]#
Bases:
torch.nn.modules.module.Module
A layer that uses a set of fixed basis for decoding the inputs.
- Parameters
input_size (int) – the size of input to be decoded, representing the number of representation coefficients
output_size (int) – the size of the decoded output
basis_type (str) – the type of basis to be used for decoding:
“poly”: polynomial basis using Vandermonde matrix
“cheb”: polynomial basis using Chebyshev polynomials
“rbf”: radial basis functions
“haar”: Haar wavelet basis
sigma (float) – the bandwidth parameter used for RBF basis. If None, a default value of 1. will be used.
tau (float) – a factor for weighting the basis exponentially according to the order (n) of the basis, i.e., tau**n
- forward(inputs)[source]#
- training: bool#
- property weight#
- class GFT(num_transformations, image_channels, language_dim)[source]#
Bases:
torch.nn.modules.module.Module
Guided Feature Transformation.
This class implements the GFT model proposed in the following paper:
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- Parameters
input (tuple) – the tuple of image features and sentence embedding.
- Returns
same shape as input[0]
- Return type
Tensor
- training: bool#
- class GetFields(field_nest=None, **fields)[source]#
Bases:
alf.layers.ElementwiseLayerBase
Get the fields from a nested input.
- Parameters
field_nest (nested str) – the path of the fields to be retrieved. Each str in fields represents a path to the field with ‘.’ separating the field name at different level.
fields (str) – A simpler way of specifying field_nest when it is a dict. GetFields(a="field_a", b="field_b") is equivalent to GetFields(dict(a="field_a", b="field_b")).
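The '.'-separated path convention can be illustrated with a small framework-free sketch. The helpers get_field and get_fields are hypothetical stand-ins, and the traversal rule (dict lookup for dicts, attribute access otherwise) is an assumption made for this example rather than a description of the ALF implementation.

```python
def get_field(nested, path):
    """Follow a '.'-separated path through nested dicts or attribute holders."""
    value = nested
    for name in path.split("."):
        value = value[name] if isinstance(value, dict) else getattr(value, name)
    return value


def get_fields(nested, **fields):
    """Return a dict with one retrieved field per keyword argument."""
    return {key: get_field(nested, path) for key, path in fields.items()}


data = {"observation": {"image": "img", "state": "st"}, "reward": 1.0}
print(get_fields(data, a="observation.image", b="reward"))
```

The keyword form mirrors the fields argument: each keyword names an output entry and each value is the path to fetch.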
- forward(input)[source]#
- training: bool#
- class Identity[source]#
Bases:
alf.layers.ElementwiseLayerBase
A layer that simply returns its argument as result.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(x)[source]#
- training: bool#
- class Lambda(func)[source]#
Bases:
torch.nn.modules.module.Module
Wrap a function as an nn.Module.
- Parameters
func (Callable) – a function that calculates the output given the input. It should be parameterless.
- forward(x)[source]#
- training: bool#
- class NaiveParallelLayer(module, n)[source]#
Bases:
torch.nn.modules.module.Module
A parallel network has n copies of network with the same structure but different independently initialized parameters.
NaiveParallelLayer creates n independent networks with the same structure as network and evaluates them separately in a loop during forward().
- Parameters
module (nn.Module | Callable) – the parallel network will have n copies of module.
n (int) – n copies of module
- forward(inputs)[source]#
Compute the output.
- Parameters
inputs (nested torch.Tensor) – its shape is [B, n, ...]
- Returns
its shape is [B, n, ...]
- Return type
output (nested torch.Tensor)
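The loop-based evaluation can be sketched with nested lists standing in for tensors. This is only an illustration of the data flow under stated assumptions: naive_parallel is a hypothetical helper, each replica is a callable that maps a batch (a list of length B) to a batch, and stacking along the replica axis is done by zipping the per-replica results.

```python
def naive_parallel(modules, inputs):
    """Evaluate n replicas in a loop on inputs shaped [B, n, ...].

    Replica i receives the slice inputs[:, i] and the n results are
    stacked back along axis 1, giving an output shaped [B, n, ...].
    """
    columns = []
    for i, module in enumerate(modules):
        batch_i = [row[i] for row in inputs]  # slice for replica i, shape [B, ...]
        columns.append(module(batch_i))
    return [list(items) for items in zip(*columns)]


replicas = [
    lambda batch: [v * 2 for v in batch],  # replica 0
    lambda batch: [v + 1 for v in batch],  # replica 1
]
print(naive_parallel(replicas, [[1, 10], [2, 20]]))
```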
- training: bool#
- class OneHot(num_classes)[source]#
Bases:
torch.nn.modules.module.Module
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- training: bool#
- class ParallelConv2D(in_channels, out_channels, kernel_size, n, activation=<built-in method relu_ of type object>, strides=1, padding=0, use_bias=None, use_bn=False, use_ln=False, weight_opt_args=None, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
A parallel 2D Conv layer that can be used to perform n independent 2D convolutions in parallel.
It is equivalent to n separate Conv2D layers with the same in_channels and out_channels.
- Parameters
in_channels (int) – channels of the input image
out_channels (int) – channels of the output image
kernel_size (int or tuple) –
n (int) – n independent Conv2D layers
activation (torch.nn.functional) –
strides (int or tuple) –
padding (int or tuple) –
use_bias (bool|None) – whether to use bias. If None, will use not use_bn
use_bn (bool) – whether to use batch normalization
use_ln (bool) – whether to use layer normalization
weight_opt_args (Optional[Dict]) – optimizer arguments for weight (not for bias)
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant
- property bias#
- forward(img)[source]#
Forward
- Parameters
img (torch.Tensor) – with shape [B, C, H, W] or [B, n, C, H, W], where the symbols denote:
B: batch size
n: number of replicas
C: number of channels
H: image height
W: image width
When the shape of img is [B, C, H, W], all the n 2D Conv operations will take img as the same shared input. When the shape of img is [B, n, C, H, W], each 2D Conv operator will have its own input data by slicing img.
- Returns
torch.Tensor with shape [B, n, C', H', W'], where the symbols denote:
B: batch
n: number of replicas
C': number of output channels
H': output height
W': output width
- training: bool#
- property weight#
- class ParallelConvTranspose2D(in_channels, out_channels, kernel_size, n, activation=<built-in method relu_ of type object>, strides=1, padding=0, output_padding=0, use_bias=None, use_bn=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
A parallel ConvTranspose2D layer that can be used to perform n independent 2D transposed convolutions in parallel.
- Parameters
in_channels (int) – channels of the input image
out_channels (int) – channels of the output image
kernel_size (int or tuple) –
n (int) – n independent ConvTranspose2D layers
activation (torch.nn.functional) –
strides (int or tuple) –
padding (int or tuple) –
output_padding (int or tuple) – Additional size added to one side of each dimension in the output shape. Default: 0. See pytorch documentation for more detail.
use_bias (bool|None) – If None, will use not use_bn
use_bn (bool) –
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
kernel_initializer (Callable) – initializer for the conv_trans layer. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant
- property bias#
- forward(img)[source]#
Forward
- Parameters
img (torch.Tensor) – with shape [B, C, H, W] or [B, n, C, H, W], where the symbols denote:
B: batch size
n: number of replicas
C: number of channels
H: image height
W: image width
When the shape of img is [B, C, H, W], all the n transposed 2D Conv operations will take img as the same shared input. When the shape of img is [B, n, C, H, W], each transposed 2D Conv operator will have its own input data by slicing img.
- Returns
torch.Tensor with shape [B, n, C', H', W'], where the symbols denote:
B: batch
n: number of replicas
C': number of output channels
H': output height
W': output width
- training: bool#
- property weight#
- class ParallelFC(input_size, output_size, n, activation=<function identity>, use_bias=True, use_bn=False, use_ln=False, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm1d'>, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0, bias_initializer=None, weight_opt_args=None, bias_opt_args=None)[source]#
Bases:
torch.nn.modules.module.Module
Parallel FC layer.
It is equivalent to n separate FC layers with the same input_size and output_size.
- Parameters
input_size (int) – input size
output_size (int) – output size
n (int) – n independent FC layers
activation (torch.nn.functional) –
use_bn (bool) – whether to use batch normalization.
use_ln (bool) – whether to use layer normalization
bn_ctor (Callable) – will be called as bn_ctor(num_features) to create the BN layer.
use_bias (bool) – whether to use bias
kernel_initializer (Callable) – initializer for the FC layer kernel. If None is provided a variance_scaling_initializer with gain as kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant for the initial bias value. This is ignored if bias_initializer is provided.
bias_initializer (Callable) – initializer for the bias parameter.
weight_opt_args (Optional[Dict]) – optimizer arguments for weight
bias_opt_args (Optional[Dict]) – optimizer arguments for bias
- property bias#
Get the bias Tensor.
- Returns
with shape (n, output_size). bias[i] is the bias for the i-th FC layer. bias[i] can be used for an FC layer with the same input_size and output_size
- Return type
Tensor
- forward(inputs)[source]#
Forward
- Parameters
inputs (torch.Tensor) – with shape [B, n, input_size] or [B, input_size]
- Returns
torch.Tensor with shape [B, n, output_size]
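The two accepted input shapes can be illustrated with nested lists in place of tensors. This is a minimal sketch of the semantics, not the ALF implementation: parallel_fc is a hypothetical helper, weight and bias follow the (n, output_size, input_size) and (n, output_size) layouts documented for this class, and activation and normalization are omitted.

```python
def parallel_fc(weight, bias, inputs):
    """Apply n independent affine maps to inputs shaped [B, input_size] or [B, n, input_size]."""
    n = len(weight)
    outputs = []
    for row in inputs:
        # A flat row [input_size] is shared by all replicas;
        # otherwise the row is [n, input_size] and replica i reads row[i].
        shared = not isinstance(row[0], (list, tuple))
        out_row = []
        for i in range(n):
            x = row if shared else row[i]
            out_row.append([sum(w * v for w, v in zip(w_row, x)) + b
                            for w_row, b in zip(weight[i], bias[i])])
        outputs.append(out_row)
    return outputs  # shape [B, n, output_size]


weight = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]  # n = 2, output_size = 2, input_size = 2
bias = [[0, 0], [1, 1]]
print(parallel_fc(weight, bias, [[1, 2]]))            # shared input, shape [B, input_size]
print(parallel_fc(weight, bias, [[[1, 2], [3, 4]]]))  # per-replica input, shape [B, n, input_size]
```

In both cases the result carries the replica axis, so downstream parallel layers can consume it directly.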
- training: bool#
- property weight#
Get the weight Tensor.
- Returns
with shape (n, output_size, input_size). weight[i] is the weight for the i-th FC layer. weight[i] can be used for an FC layer with the same input_size and output_size
- Return type
Tensor
- class ParamConv2D(in_channels, out_channels, kernel_size, activation=<built-in method relu_ of type object>, strides=1, pooling_kernel=None, padding=0, use_bias=False, use_ln=False, n_groups=None, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
A 2D conv layer that does not maintain its own weight and bias, but accepts both from users. If the given parameter (weight and bias) tensor has an extra batch dimension (first dimension), it performs parallel FC operation.
- Parameters
in_channels (int) – channels of the input image
out_channels (int) – channels of the output image
kernel_size (int or tuple) –
activation (torch.nn.functional) –
strides (int or tuple) –
pooling_kernel (int or tuple) –
padding (int or tuple) –
use_bias (bool) – whether to use bias.
use_ln (bool) – whether to use layer normalization
n_groups (int) – number of parallel groups. It is determined by the first dimension of the input parameters when calling set_parameters if use_ln is False. If use_ln is True, n_groups must be specified at initialization and will be fixed; all input parameters will have to be consistent with it.
kernel_initializer (Callable) – initializer for the conv layer kernel. If None is provided, a variance_scaling_initializer with gain kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of the kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant for the initial bias value.
- property bias#
Get stored bias tensor or batch of bias tensors.
- property bias_length#
Get the n_element of a single bias tensor.
- forward(img, keep_group_dim=True)[source]#
Forward
- Parameters
img (torch.Tensor) – with shape [B, C, H, W] (groups=1) or [B, n, C, H, W] (groups=n), where the meaning of the symbols is:
B: batch size
n: number of replicas
C: number of channels
H: image height
W: image width
When the shape of img is [B, C, H, W], all the n 2D conv operations will take img as the same shared input. When the shape of img is [B, n, C, H, W], each 2D conv operator will have its own input data by slicing img.
- Returns
torch.Tensor with shape [B, n, C', H', W'] if keep_group_dim is True, otherwise with shape [B, n*C', H', W'], where the meaning of the symbols is:
B: batch size
n: number of replicas
C': number of output channels
H': output height
W': output width
- property param_length#
Get total number of parameters for all layers.
- set_parameters(theta, reinitialize=False)[source]#
Distribute parameters to corresponding parameters.
- Parameters
theta (torch.Tensor) – with shape [D] (groups=1) or [B, D] (groups=B), where the meaning of the symbols is:
B: batch size
D: length of parameters, should be self.param_length
When the shape of theta is [D], it will be unsqueezed to [1, D].
reinitialize (bool) – whether to reinitialize parameters of each layer.
- training: bool#
- property weight#
Get stored weight tensor or batch of weight tensors.
- property weight_length#
Get the n_element of a single weight tensor.
- class ParamFC(input_size, output_size, activation=<built-in method relu_ of type object>, use_bias=True, use_ln=False, n_groups=None, kernel_initializer=None, kernel_init_gain=1.0, bias_init_value=0.0)[source]#
Bases:
torch.nn.modules.module.Module
A fully connected layer that does not maintain its own weight and bias, but accepts both from users. If the given parameter (weight and bias) tensor has an extra batch dimension (first dimension), it performs parallel FC operation.
- Parameters
input_size (int) – input size
output_size (int) – output size
activation (torch.nn.functional) –
use_bias (bool) – whether to use bias
use_ln (bool) – whether to use layer normalization
n_groups (int) – number of parallel groups. It is determined by the first dimension of the input parameters when calling set_parameters if use_ln is False. If use_ln is True, n_groups must be specified at initialization and will be fixed; all input parameters will have to be consistent with it.
kernel_initializer (Callable) – initializer for the FC layer kernel. If None is provided, a variance_scaling_initializer with gain kernel_init_gain will be used.
kernel_init_gain (float) – a scaling factor (gain) applied to the std of the kernel init distribution. It will be ignored if kernel_initializer is not None.
bias_init_value (float) – a constant for the initial bias value.
- property bias#
Get stored bias tensor or batch of bias tensors.
- property bias_length#
Get the n_element of a single bias tensor.
- forward(inputs)[source]#
Forward
- Parameters
inputs (torch.Tensor) – with shape [B, D] (groups=1) or [B, n, D] (groups=n), where the meaning of the symbols is:
B: batch size
n: number of replicas
D: input dimension
When the shape of inputs is [B, D], all the n linear operations will take inputs as the same shared input. When the shape of inputs is [B, n, D], each linear operator will have its own input data by slicing inputs.
- Returns
with shape [B, n, D] or [B, D], where the meaning of the symbols is:
B: batch size
n: number of replicas
D: output dimension
- Return type
torch.Tensor
- property param_length#
Get total number of parameters for all layers.
- set_parameters(theta, reinitialize=False)[source]#
Distribute parameters to corresponding parameters.
- Parameters
theta (torch.Tensor) – with shape [D] (groups=1) or [B, D] (groups=B), where the meaning of the symbols is:
B: batch size
D: length of parameters, should be self.param_length
When the shape of theta is [D], it will be unsqueezed to [1, D].
reinitialize (bool) – whether to reinitialize parameters of each layer.
- training: bool#
- property weight#
Get stored weight tensor or batch of weight tensors.
- property weight_length#
Get the n_element of a single weight tensor.
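The way ParamFC consumes an externally supplied parameter vector can be illustrated with a small pure-Python sketch. It does not use ALF or PyTorch. The helper names split_theta and parallel_fc are hypothetical, and the layout of theta (weight first in row-major [output_size, input_size] order, then the bias) is an assumption made for illustration, not something the documentation above states. Activation and layer normalization are omitted.

```python
def split_theta(theta_row, input_size, output_size):
    """Split one flat parameter vector into a weight matrix and a bias vector.

    Assumed (hypothetical) layout: output_size * input_size weight values in
    row-major order, followed by output_size bias values.
    """
    n_weight = output_size * input_size
    assert len(theta_row) == n_weight + output_size
    weight = [
        theta_row[r * input_size:(r + 1) * input_size]
        for r in range(output_size)
    ]
    bias = theta_row[n_weight:]
    return weight, bias


def parallel_fc(inputs, theta, input_size, output_size):
    """Apply n linear maps to a shared input.

    inputs: nested list with shape [B][input_size], shared by all groups.
    theta: nested list with shape [n][param_length], one row per group.
    Returns a nested list with shape [B][n][output_size].
    """
    params = [split_theta(row, input_size, output_size) for row in theta]
    return [[[sum(w * x for w, x in zip(w_row, sample)) + b
              for w_row, b in zip(weight, bias)]
             for weight, bias in params]
            for sample in inputs]


# Two groups (n=2), each mapping 2 inputs to 1 output.
out = parallel_fc([[1.0, 1.0]], [[1.0, 2.0, 0.5], [0.0, 0.0, 1.0]], 2, 1)
```

Each row of theta plays the role of one group, so a [B, D] input shared across n groups produces a [B, n, output_size] result, matching the shapes described for forward.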
- class Permute(*dims)[source]#
Bases:
torch.nn.modules.module.Module
A layer that performs a permutation of dimensions.
- Parameters
*dims – The desired ordering of dimensions (not including batch dimension)
- forward(x)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- make_parallel(n)[source]#
Create a Permute layer to handle parallel batch.
It is assumed that a parallel batch has shape [B, n, …] and both the batch dimension and replica dimension are not considered for permute.
- Parameters
n (int) – the number of replicas.
- Returns
a Permute layer to handle parallel batch.
- training: bool#
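Because dims excludes the batch dimension (and, for the parallel version, also the replica dimension), it can help to see how such indices would map onto a full tensor permutation. The following is a minimal sketch under that reading of the docstring. full_permutation and permuted_shape are hypothetical helpers, not ALF functions, and the actual implementation may differ.

```python
def full_permutation(dims, parallel=False):
    """Map dims (which skip leading dims) to a permutation over all dims.

    The batch dimension stays in place. With parallel=True the replica
    dimension (dim 1) also stays in place.
    """
    offset = 2 if parallel else 1
    return tuple(range(offset)) + tuple(d + offset for d in dims)


def permuted_shape(shape, perm):
    """Shape obtained by permuting a tensor of the given shape with perm."""
    return tuple(shape[p] for p in perm)


perm = full_permutation((2, 0, 1))                   # e.g. Permute(2, 0, 1)
ppm = full_permutation((2, 0, 1), parallel=True)     # after make_parallel(n)
```

For an input of shape [8, 3, 4, 5], the first permutation gives shape [8, 5, 3, 4]. For a parallel batch of shape [8, 2, 3, 4, 5], the second gives [8, 2, 5, 3, 4].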
- class RandomCrop(size, padding=0)[source]#
Bases:
torch.nn.modules.module.Module
Perform random crop independently for each image in the batch.
Note that torchvision.transforms.RandomCrop is different in that it applies the same random crop to all the images in the batch.
Each result image is a random crop of the padded input image. The padded pixels are copied from the nearest pixel on the boundary.
- Parameters
size (Union[int, Tuple[int]]) – a tuple of desired height and width. If int, uses the same height and width.
padding (Union[int, Tuple[int]]) – the size of the padding. If int, uses the same padding on all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\)).
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- Parameters
input (Tensor) – shape is [B, C, H, W]
- Return type
Tensor
- Returns
a tensor of shape [B, C, h, w], where h, w = size
- training: bool#
- class ReplicationPad2d(padding)[source]#
Bases:
torch.nn.modules.module.Module
Pad the input tensor using replication of the input boundary.
For N-dimensional padding, use torch.nn.functional.pad().
This is the same as torch.nn.ReplicationPad2d except that this implementation can handle input of any dtype, while the torch.nn one can only handle float dtype.
- Parameters
padding (int, tuple) – the size of the padding. If is int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\) where
\(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\)
\(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\)
- forward(input)[source]#
- training: bool#
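The replication padding rule and the output-shape formulas above can be checked with a small pure-Python sketch for a single 2D image stored as nested lists. replication_pad_2d is a hypothetical helper written for illustration. It is not the ALF implementation, which operates on batched tensors.

```python
def replication_pad_2d(img, left, right, top, bottom):
    """Pad a 2D nested list by repeating its boundary pixels.

    Each output pixel copies the nearest input pixel, which is the same as
    clamping the (possibly out-of-range) source index into the valid range.
    """
    h, w = len(img), len(img[0])

    def clamp(i, hi):
        return min(max(i, 0), hi)

    return [[img[clamp(r, h - 1)][clamp(c, w - 1)]
             for c in range(-left, w + right)]
            for r in range(-top, h + bottom)]


padded = replication_pad_2d([[1, 2], [3, 4]], left=1, right=0, top=0, bottom=1)
```

Here H_in = W_in = 2, so the result has H_out = 2 + 0 + 1 = 3 rows and W_out = 2 + 1 + 0 = 3 columns. Since the values are only copied, the same rule works for integer data, which is the case the docstring singles out.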
- class Reshape(*shape)[source]#
Bases:
torch.nn.modules.module.Module
A layer for reshaping the tensor.
The result of this layer is a tensor reshaped to (B, *shape), where B is x.shape[0].
- Parameters
shape (tuple of ints|int...) – desired shape not including the batch dimension.
- forward(x)[source]#
- training: bool#
- class ResidueBlock(in_channels, channels, kernel_size, stride, transpose=False, activation=ReLU(inplace=True), with_batch_normalization=True, weight_opt_args=None, bn_ctor=<class 'torch.nn.modules.batchnorm.BatchNorm2d'>)[source]#
Bases:
torch.nn.modules.module.Module
The ResidueBlock for ResNet.
This is the residual block used in ResNet-18 and ResNet-34 of the original ResNet paper Deep residual learning for image recognition.
Compared to BottleneckBlock, it has one less conv layer.
- Parameters
in_channels (int) – the number of channels of input
kernel_size (Union[int, Tuple[int, int]]) – the kernel size of middle layer at main path
channels – the number of filters of the two conv layers at main path
stride (Union[int, Tuple[int, int]]) – stride for this block.
transpose (bool) – whether to use Conv2D or Conv2DTranspose. If two ResidueBlock layers L and LT are constructed with the same arguments except transpose, it is guaranteed that LT(L(x)).shape == x.shape if x.shape[-2:] can be divided by stride.
activation (Module) – activation function.
with_batch_normalization (bool) – whether to include batch normalization. Note that standard ResNet uses batch normalization.
weight_opt_args (Optional[Dict]) – optimizer arguments for weights (not for bias)
bn_ctor (Callable[[int], Module]) – will be called as bn_ctor(num_features) to create the BN layer.
- forward(inputs)[source]#
- Return type
Tensor
- training: bool#
- class Scale(scale)[source]#
Bases:
alf.layers.ElementwiseLayerBase
- forward(input)[source]#
- training: bool#
- class ScaleGradient(scale)[source]#
Bases:
alf.layers.ElementwiseLayerBase
Scales the gradient of input for the backward pass.
- Parameters
scale (float) – a scalar factor to be multiplied to the gradient of tensor.
- forward(input)[source]#
- training: bool#
- class Sequential(*modules, output='', **named_modules)[source]#
Bases:
torch.nn.modules.module.Module
A more flexible Sequential than torch.nn.Sequential.
alf.layers.Sequential is similar to alf.nn.Sequential, but does not accept stateful alf.nn.Network as its elements.
All the modules provided through modules and named_modules are calculated sequentially in the same order as they appear in the call to Sequential. Typically, each module takes the result of the previous module as its input (or the input to the Sequential if it is the first module), and the result of the last module is the output of the Sequential. But we also allow more flexibility, as shown in example 2.
Example 1:
    net = Sequential(module1, module2)
    y = net(x)
is equivalent to the following:
    z = module1(x)
    y = module2(z)
Example 2:
    net = Sequential(
        module1, a=module2, b=(('input', 'a'), module3), output=('a', 'b'))
    output = net(input, state)
is equivalent to the following:
    _ = module1(input)
    a = module2(_)
    b = module3((input, a))
    output = (a, b)
- Parameters
modules (Callable | (nested str, Callable)) – The Callable can be a torch.nn.Module, stateless alf.nn.Network or plain Callable. Optionally, their inputs can be specified by the first element of the tuple. If input is not provided, it is assumed to be the result of the previous module (or input to this Sequential for the first module). If input is provided, it should be a nested str. It will be used to retrieve results from the dictionary of the current named_results. For modules specified by modules, because no named_modules has been invoked, named_results is {'input': input}.
named_modules (Callable | (nested str, Callable)) – The Callable can be a torch.nn.Module, stateless alf.nn.Network or plain Callable. Optionally, their inputs can be specified by the first element of the tuple. If input is not provided, it is assumed to be the result of the previous module (or input to this Sequential for the first module). If input is provided, it should be a nested str. It will be used to retrieve results from the dictionary of the current named_results. named_results is updated once the result of a named module is calculated.
output (nested str) – if not provided, the result from the last module will be used as output. Otherwise, it will be used to retrieve results from named_results after the results of all modules have been calculated.
- make_parallel(n)[source]#
Create a parallelized version of this network.
- Parameters
n (int) – the number of copies
- Returns
the parallelized version of this network
- training: bool#
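The named_results bookkeeping described for Sequential can be mimicked with plain callables. The sketch below is a simplified illustration of the documented behavior, not the ALF implementation. run_sequential and get_nested are hypothetical names, and nested str specs are limited here to strings and tuples.

```python
def get_nested(named_results, spec):
    """Look up results for a nested str spec (a str or a tuple of specs)."""
    if isinstance(spec, str):
        return named_results[spec]
    return tuple(get_nested(named_results, s) for s in spec)


def run_sequential(x, modules=(), named_modules=None, output=''):
    """Evaluate modules in order, tracking results of named modules."""
    named_results = {'input': x}
    result = x
    items = [(None, m) for m in modules]
    items += list((named_modules or {}).items())
    for name, module in items:
        if isinstance(module, tuple):
            # (input spec, callable): fetch inputs from named_results
            input_spec, fn = module
            result = fn(get_nested(named_results, input_spec))
        else:
            # default: consume the result of the previous module
            result = module(result)
        if name is not None:
            named_results[name] = result
    if output == '':
        return result
    return get_nested(named_results, output)


module1 = lambda v: v + 1
module2 = lambda v: v * 2
module3 = lambda pair: pair[0] + pair[1]

y = run_sequential(3, modules=(module1, module2))
out = run_sequential(
    3,
    modules=(module1,),
    named_modules={'a': module2, 'b': (('input', 'a'), module3)},
    output=('a', 'b'))
```

The second call mirrors example 2: module1 produces 4, a = module2(4) = 8, b = module3((3, 8)) = 11, and the output is the tuple (a, b).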
- class SimpleAttention[source]#
Bases:
torch.nn.modules.module.Module
Simple Attention Module.
- forward(query, key, value)[source]#
Simple attention computation based on the inputs.
- Parameters
query (Q) – shape [B, head, M, d]
key (K) – shape [B, head, N, d]
value (V) – shape [B, head, N, d]
where B denotes the batch size, head denotes the number of heads, N the number of entities, and d the feature dimension.
- Returns
the attended results computed as softmax(QK^T/sqrt(d))V, with the shape [B, head, M, d]
the attention weight, with the shape [B, head, M, N]
- training: bool#
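For reference, the formula softmax(QK^T/sqrt(d))V can be written out for a single batch element and a single head using nested lists. This is a pure-Python sketch for illustration only. simple_attention here is a stand-alone function, not the ALF module, and it drops the B and head dimensions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def simple_attention(query, key, value):
    """query: [M][d], key: [N][d], value: [N][d].

    Returns (attended results [M][d], attention weights [M][N]).
    """
    d = len(query[0])
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
               for k in key]
              for q in query]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * v[j] for w, v in zip(w_row, value))
            for j in range(len(value[0]))]
           for w_row in weights]
    return out, weights


# One query (M=1) attending to two identical keys (N=2), d=2.
out, weights = simple_attention(
    [[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
```

With identical keys the two scores are equal, so the weights are uniform and the result is the average of the two value rows.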
- class Sum(dim)[source]#
Bases:
torch.nn.modules.module.Module
Sum over given dimension(s).
Note that batch dimension is not counted for dim. This means that dim=0 means the dimension after batch dimension.
- Parameters
dim (int|tuple[int]) – the dimension(s) to be summed.
- forward(input)[source]#
- make_parallel(n)[source]#
Create a Sum layer to handle parallel batch.
It is assumed that a parallel batch has shape [B, n, …] and both the batch dimension and replica dimension are not counted for dim.
- Parameters
n (int) – the number of replicas.
- Returns
a Sum layer to handle parallel batch.
- training: bool#
- class SummarizeGradient(name)[source]#
Bases:
alf.layers.ElementwiseLayerBase
A layer for summarizing the gradient of the input tensor.
The layer always first clones the input tensor and then sets requires_grad=True for the cloned tensor to enable gradient calculation for summarization.
- Parameters
name (str) – used to describe the name of the summary, after the tag ‘tensor_gradient’.
- Returns
the cloned tensor, with requires_grad set to True and a gradient summarization hook registered.
- forward(x)[source]#
- training: bool#
- class TransformerBlock(d_model, num_heads, memory_size, d_k=None, d_v=None, d_ff=None, dropout=0.0, activation=<built-in method relu_ of type object>, positional_encoding='abs', add_positional_encoding=True, scale_attention_score=True)[source]#
Bases:
torch.nn.modules.module.Module
Transformer residue block.
The transformer residue block includes two residue blocks with layer normalization (LN):
Multi-head attention (MHA) block
Position-wise MLP
The overall computation is:
    y = x + MHA(LN(x))
    z = y + MLP(LN(y))
The original transformer is described in: [1]. Ashish Vaswani et al. Attention Is All You Need
This implementation is a variation which places layer norm at a different location, which is described in: [2]. Ruibin Xiong et al. On Layer Normalization in the Transformer Architecture
We also support the relative positional encoding proposed in [3] Zihang Dai et al. Transformer-XL: Attentive language models beyond a fixed-length context.
In this implementation, the positional encodings are learnable parameters instead of the sinusoidal matrix proposed in [1].
- Parameters
d_model (int) – dimension of the model, same as d_model in [1]
num_heads (int) – the number of attention heads
memory_size (int) – maximal allowed sequence length
d_k (int) – Dimension of key, same as d_k in [1]. If None, use d_model // num_heads
d_v (int) – Dimension of value, same as d_v in [1]. If None, use d_model // num_heads
d_ff (int) – Dimension of the MLP, same as d_ff in [1]. If None, use 4 * d_model
dropout (float) – the dropout ratio. Note that [1] uses 0.1 for dropout.
activation (Callable) – the activation for the hidden layer of the MLP. relu and gelu are two popular choices.
positional_encoding (str) – One of [‘none’, ‘abs’, ‘rel’]. If ‘none’, no position encoding will be used. If ‘abs’, use absolute positional encoding depending on the absolute position in the memory sequence, same as that described in [1]. If ‘rel’, use the relative positional encoding proposed in [3].
add_positional_encoding (bool) – If True, in addition to using positional encoding for calculating the attention weights, the positional encoding is also concatenated to the attention result so that the attention result can keep the location information better. Note that using this option will increase the number of parameters by about 25%. This option is ignored if positional_encoding is ‘none’.
scale_attention_score (bool) – If True, scale the attention score by d_k ** -0.5 as suggested in [1]. However, this may not always be better since it slows the unittest in layers_test.py
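The default sizes and the pre-LN residual structure described for TransformerBlock can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not the ALF code: resolve_dims, layer_norm and pre_ln_block are hypothetical helpers, the layer norm has no learnable affine parameters, and the attention and MLP sub-layers are passed in as arbitrary callables on a single vector.

```python
import math


def resolve_dims(d_model, num_heads, d_k=None, d_v=None, d_ff=None):
    """Fill in the documented defaults for d_k, d_v and d_ff."""
    d_k = d_model // num_heads if d_k is None else d_k
    d_v = d_model // num_heads if d_v is None else d_v
    d_ff = 4 * d_model if d_ff is None else d_ff
    return d_k, d_v, d_ff


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and unit variance (no affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_ln_block(x, mha, mlp):
    """y = x + MHA(LN(x)); z = y + MLP(LN(y))."""
    y = [a + b for a, b in zip(x, mha(layer_norm(x)))]
    z = [a + b for a, b in zip(y, mlp(layer_norm(y)))]
    return z


dims = resolve_dims(512, 8)
zero = lambda v: [0.0] * len(v)   # stand-in sub-layer that outputs zeros
z = pre_ln_block([1.0, 2.0, 3.0], zero, zero)
```

Because layer normalization is applied to the sub-layer input rather than to the residual sum, a sub-layer that outputs zeros leaves x unchanged, which is the property the pre-LN placement in [2] relies on.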
- forward(memory, query=None, mask=None)[source]#
Forward computation.
Notation: B: batch_size, N: length of memory, M: length of query
- Parameters
memory (Tensor) – The shape is [B, N, d_model]
query (Tensor) – The shape is [B, d_model] or [B, M, d_model]. If None, will use memory as query
mask (Tensor|None) – A tensor indicating which slots in memory will NOT be used. Its shape can be [B, N] or [B, M, N]. If the shape is [B, N], mask[b, n] = True indicates NOT using memory[b, n] for calculating the attention result for query[b], while mask[b, n] = False means using it. If the shape is [B, M, N], mask[b, m, n] = True indicates NOT using memory[b, n] for calculating the attention result for query[b, m], while mask[b, m, n] = False indicates using memory[b, n] to attend query[b, m].
- Returns
the shape is same as query.
- Return type
Tensor
- training: bool#
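The mask convention of TransformerBlock.forward (True means the memory slot is NOT used) is easy to get backwards, so here is a small sketch of how such a mask would affect the attention weights for one query. masked_attention_weights is a hypothetical stand-alone helper, and it assumes that at least one slot is left unmasked.

```python
import math


def masked_attention_weights(scores, mask):
    """Softmax over scores, giving zero weight where mask is True.

    scores: list of N floats for one query.
    mask: list of N bools, True means the memory slot is NOT used.
    """
    kept = [s for s, m in zip(scores, mask) if not m]
    mx = max(kept)  # assumes at least one unmasked slot
    exps = [0.0 if m else math.exp(s - mx) for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_attention_weights([0.0, 0.0, 5.0], [False, False, True])
```

The third slot has the largest score but is masked out, so all the weight is shared between the first two slots.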
- class Transpose(dim0=0, dim1=1)[source]#
Bases:
torch.nn.modules.module.Module
A layer that performs a transpose of two dimensions.
Note that batch dimension is not considered for transpose. This means that dim0=0 means the dimension after batch dimension.
- Parameters
dim0 (int) – the first dimension to be transposed.
dim1 (int) – the second dimension to be transposed
- forward(x)[source]#
- make_parallel(n)[source]#
Create a Transpose layer to handle parallel batch.
It is assumed that a parallel batch has shape [B, n, …] and both the batch dimension and replica dimension are not considered for transpose.
- Parameters
n (int) – the number of replicas.
- Returns
a Transpose layer to handle parallel batch.
- training: bool#
- make_parallel_input(inputs, n)[source]#
Replicate inputs over dim 1 n times so it can be processed by parallel networks.
- Parameters
inputs (nested Tensor) – a nest of Tensor
n (int) – inputs will be replicated n times.
- Returns
inputs replicated over dim 1
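The shape change performed by make_parallel_input, from [B, ...] to [B, n, ...], can be shown with nested lists standing in for a single tensor. replicate_over_dim1 is a hypothetical helper for illustration. The real function works on nests of tensors.

```python
import copy


def replicate_over_dim1(inputs, n):
    """Turn a nested list of shape [B, ...] into one of shape [B, n, ...]."""
    return [[copy.deepcopy(sample) for _ in range(n)] for sample in inputs]


batch = [[1, 2], [3, 4]]          # shape [B=2, 2]
pbatch = replicate_over_dim1(batch, 3)   # shape [B=2, n=3, 2]
```

Every sample is repeated n times along a new second dimension, so each of the n parallel networks sees the same data.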
- make_parallel_net(module, n)[source]#
Make a parallelized version of module.
A parallel network has n copies of a network with the same structure but different, independently initialized parameters. The parallel network can process a batch of data with shape [batch_size, n, …] using n networks with the same structure.
If module has a member function make_parallel, it will be called to make the parallel network. Otherwise, it will create a NaiveParallelLayer, which simply makes n copies of module and uses a loop to call them in forward().
Examples:
Applying parallel net on same input:
    pnet = make_parallel_net(net, n)
    # replicate input.
    # pinput will have shape [batch_size, n, ...], if input has shape [batch_size, ...]
    pinput = make_parallel_input(input, n)
    poutput = pnet(pinput)
If you already have parallel input with shape [batch_size, n, …], you can omit the call to make_parallel_input in the above code.
- Parameters
module (Network | nn.Module | Callable) – the network to be parallelized.
n (int) – the number of copies
- Returns
the parallelized network.
- make_parallel_spec(specs, n)[source]#
Make the spec for parallel network.
- Parameters
specs (nested TensorSpec) – the input spec for the non-parallelized network
n (int) – the number of copies of the parallelized network
- Returns
input tensor spec for the parallelized network
- normalize_along_batch_dims(x, mean, variance, variance_epsilon)[source]#
Normalizes a tensor by mean and variance, which are expected to have the same tensor spec as the inner dims of x.
- Parameters
x (Tensor) – a tensor of ([D1, D2, ..] + shape), where D1, D2, .. are arbitrary leading batch dims (can be empty).
mean (Tensor) – a tensor whose shape is shape
variance (Tensor) – a tensor whose shape is shape
variance_epsilon (float) – A small float number to avoid dividing by 0.
- Returns
Normalized tensor.
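As a concrete illustration, the sketch below normalizes a batch with one leading batch dim and a 1D inner shape. It assumes the usual formula (x - mean) / sqrt(variance + variance_epsilon), which the documentation above does not spell out, and normalize_rows is a hypothetical pure-Python stand-in rather than the ALF function.

```python
import math


def normalize_rows(x, mean, variance, variance_epsilon):
    """x: [D1][K] nested list; mean and variance: lists of length K.

    Assumed formula: (x - mean) / sqrt(variance + variance_epsilon),
    broadcast over the leading batch dimension.
    """
    inv_std = [1.0 / math.sqrt(v + variance_epsilon) for v in variance]
    return [[(xi - m) * s for xi, m, s in zip(row, mean, inv_std)]
            for row in x]


normalized = normalize_rows(
    [[1.0, 3.0], [5.0, 7.0]], mean=[3.0, 5.0], variance=[4.0, 4.0],
    variance_epsilon=0.0)
```

The same mean and variance are applied to every row, which is what "along batch dims" refers to: the statistics match the inner shape and are broadcast over the leading dims.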
alf.module#
Patch torch.nn.Module for better performance.
torch.nn.Module.__getattr__ is frequently used by all classes derived from
nn.Module. It can introduce too much unnecessary overhead. So we patch
the nn.Module class to remove it.
alf.norm_layers#
- class BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, fixed_weight_norm=False, use_bias=True, track_running_stats=True)[source]#
Bases:
alf.norm_layers._NormBase
Batch Normalization over a 2D or 3D input.
For detail about Batch Normalization, see https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm1d.html
The main difference is that this implementation supports using BN for RNN. The reason is that for RNN, the normalization statistics can be dramatically different for different steps of the RNN. Hence we need to maintain different running statistics for different steps of the RNN.
The following example shows how to use it, assuming rnn is a Network which contains some alf.layers.BatchNorm layers.
    prepare_rnn_batch_norm(rnn)
    rnn.set_batch_norm_max_steps(5)
    for i in range(t):
        rnn.set_batch_norm_current_step(i)
        y, state = rnn(input[i], state)
Note that set_batch_norm_current_step() also accepts a Tensor as its argument. In that case, it specifies the current step for each sample in a batch.
- Parameters
num_features (int) – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\)
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
fixed_weight_norm – whether to fix the norm of the affine weight parameter. The norm will be fixed at sqrt(num_features).
use_bias (bool) – whether to use bias. Note that if affine is True, this argument is ignored and bias will be used.
track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C)\) or \((N, C, L)\)
Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
- training: bool#
- class BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, fixed_weight_norm=False, use_bias=True, track_running_stats=True)[source]#
Bases:
alf.norm_layers._NormBase
Applies Batch Normalization over a 4D input.
For detail about Batch Normalization, see https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html
The main difference is that this implementation supports using BN for RNN. The reason is that for RNN, the normalization statistics can be dramatically different for different steps of the RNN. Hence we need to maintain different running statistics for different steps of the RNN.
The following example shows how to use it, assuming rnn is a Network which contains some alf.layers.BatchNorm layers.
    prepare_rnn_batch_norm(rnn)  # Only need to call once in the lifetime of rnn
    rnn.set_batch_norm_max_steps(5)  # Only need to call once in the lifetime of rnn
    for i in range(t):
        rnn.set_batch_norm_current_step(i)
        y, state = rnn(input[i], state)
- Parameters
num_features (int) – \(C\) from an expected input of size \((N, C, H, W)\)
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
fixed_weight_norm – whether to fix the norm of the affine weight parameter. The norm will be fixed at sqrt(num_features).
use_bias (bool) – whether to use bias. Note that if affine is True, this argument is ignored and bias will be used.
track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
- training: bool#
- class ParamLayerNorm(n_groups, output_channels, eps=1e-05)[source]#
Bases:
torch.nn.modules.module.Module
ParamLayerNorm, adapted from torch.nn.modules.LayerNorm
A general Layer Normalization layer that does not maintain learnable affine parameters (weight and bias), but accepts both from users. If n_groups is greater than 1, it performs parallel Layer Normalization operation.
- Parameters
n_groups (int) – number of parallel groups
output_channels (int) – output size for FC layers, output channel size for conv layers.
eps (float) – refer to nn.GroupNorm
- property bias#
Get stored bias tensor or batch of bias tensors.
- property bias_length#
Get the n_element of a single bias tensor.
- forward(inputs, keep_group_dim=True)[source]#
Forward
- Parameters
inputs (Tensor) – refer to _preprocess_input of subclass for detailed description.
keep_group_dim (bool) – whether to keep group dimension or not.
- Returns
for BatchNorm1d, with shape [B, n, D] or [B, n*D]; for BatchNorm2d, with shape [B, n, C, H, W] or [B, n*C, H, W].
- Return type
torch.Tensor
- property output_channels#
Get the n_element of a single weight tensor.
- property param_length#
Get total number of parameters for all layers.
- set_parameters(theta, reinitialize=False)[source]#
Distribute parameters to corresponding parameters.
- Parameters
theta (Tensor) – with shape [D] (groups=1) or [B, D] (groups=B), where the meaning of the symbols is:
B: batch size
D: length of parameters, should be self.param_length
When the shape of theta is [D], it will be unsqueezed to [1, D].
reinitialize (bool) – whether to reinitialize parameters of each layer.
- training: bool#
- property weight#
Get stored weight tensor or batch of weight tensors.
- property weight_length#
Get the n_element of a single weight tensor.
- class ParamLayerNorm1d(n_groups, output_channels, eps=1e-05)[source]#
Bases:
alf.norm_layers.ParamLayerNorm
A general Layer Normalization layer that does not maintain learnable affine parameters (weight and bias), but accepts both from users. If n_groups is greater than 1, it performs parallel Layer Normalization operation.
- Parameters
n_groups (int) – number of parallel groups
output_channels (int) – output size for FC layers, output channel size for conv layers.
eps (float) – refer to nn.GroupNorm
- training: bool#
- class ParamLayerNorm2d(n_groups, output_channels, eps=1e-05)[source]#
Bases:
alf.norm_layers.ParamLayerNorm
A general Layer Normalization layer that does not maintain learnable affine parameters (weight and bias), but accepts both from users. If n_groups is greater than 1, it performs parallel Layer Normalization operation.
- Parameters
n_groups (int) – number of parallel groups
output_channels (int) – output size for FC layers, output channel size for conv layers.
eps (float) – refer to nn.GroupNorm
- training: bool#
- prepare_rnn_batch_norm(module)[source]#
Prepare an RNN network module to use alf.layers.BatchNorm layers.
It will report an error if any nn.BatchNorm layer is found within module.
- Return type
bool
- Returns
True if alf.layers.BatchNorm layers have been found, False otherwise.
alf.tensor_specs#
TensorSpec with PyTorch types; adapted from Tensorflow’s tensor_spec.py:
https://github.com/tensorflow/tensorflow/blob/r1.8/tensorflow/python/framework/tensor_spec.py
- class BoundedTensorSpec(shape, dtype=torch.float32, minimum=0, maximum=1)[source]#
Bases:
alf.tensor_specs.TensorSpec
A TensorSpec that specifies minimum and maximum values. Example usage:
    spec = BoundedTensorSpec((1, 2, 3), torch.float32, 0, (5, 5, 5))
    torch_minimum = torch.as_tensor(spec.minimum, dtype=spec.dtype)
    torch_maximum = torch.as_tensor(spec.maximum, dtype=spec.dtype)
Bounds are meant to be inclusive. This is especially important for integer types. The following spec will be satisfied by tensors with values in the set {0, 1, 2}:
spec = BoundedTensorSpec((3, 5), torch.int32, 0, 2)
- Parameters
shape (tuple[int]) – The shape of the tensor.
dtype (str or torch.dtype) – The type of the tensor values, e.g., “int32” or torch.int32
minimum – numpy number or sequence specifying the minimum element bounds (inclusive). Must be broadcastable to shape.
maximum – numpy number or sequence specifying the maximum element bounds (inclusive). Must be broadcastable to shape.
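To make the inclusive-bounds rule concrete, here is a small plain-Python sketch. The helper name is hypothetical and the check is not part of ALF; it only handles a scalar bound or a bound with one entry per value, whereas the real spec follows NumPy broadcasting.

```python
def within_inclusive_bounds(values, minimum, maximum):
    """Return True if every value lies in [minimum, maximum], both ends
    included. Bounds may be scalars or sequences matching `values`."""
    n = len(values)
    lows = list(minimum) if isinstance(minimum, (list, tuple)) else [minimum] * n
    highs = list(maximum) if isinstance(maximum, (list, tuple)) else [maximum] * n
    return all(lo <= v <= hi for v, lo, hi in zip(values, lows, highs))


# Integer bounds 0 and 2 admit exactly the values {0, 1, 2}.
all_allowed = within_inclusive_bounds([0, 1, 2, 2, 0], 0, 2)
one_too_large = within_inclusive_bounds([0, 1, 3], 0, 2)
# A scalar minimum combined with a per-element maximum.
mixed_ok = within_inclusive_bounds([1, 5, 5], 0, (5, 5, 5))
mixed_bad = within_inclusive_bounds([1, 6, 5], 0, (5, 5, 5))
```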
- property maximum#
Returns a NumPy array specifying the maximum bounds (inclusive).
- property minimum#
Returns a NumPy array specifying the minimum bounds (inclusive).
- numpy_sample(outer_dims=None, rng=numpy.random)[source]#
Sample numpy arrays uniformly given the min/max bounds.
- Parameters
outer_dims (list[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
rng (numpy.random.RandomState) – random number generator
- Returns
an array of self._dtype
- Return type
np.ndarray
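The effect of outer_dims can be sketched without NumPy: the outer dimensions are prepended to the spec shape, and values are drawn within the inclusive bounds. The helper names below are hypothetical, the result is a flat Python list rather than an array, and the standard-library random module stands in for the rng argument; treat it as an illustration of the idea rather than ALF's sampling code.

```python
import random


def full_shape(spec_shape, outer_dims=None):
    """Prepend the optional outer dimensions to the spec shape."""
    outer = tuple(outer_dims) if outer_dims is not None else ()
    return outer + tuple(spec_shape)


def sample_flat_ints(spec_shape, minimum, maximum, outer_dims=None, rng=random):
    """Draw integers uniformly from [minimum, maximum] (both inclusive),
    one per element of the full shape, returned as a flat list."""
    shape = full_shape(spec_shape, outer_dims)
    count = 1
    for dim in shape:
        count *= dim
    # randint includes both endpoints, matching inclusive bounds.
    values = [rng.randint(minimum, maximum) for _ in range(count)]
    return shape, values


# A (3, 5) integer spec with bounds 0 and 2, sampled with one outer dimension of 4.
shape, values = sample_flat_ints((3, 5), 0, 2, outer_dims=[4], rng=random.Random(0))
```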
- replace(shape=None, dtype=None, minimum=None, maximum=None)[source]#
Create a new BoundedTensorSpec with part of the properties replaced.
For example, if we have a BoundedTensorSpec like
spec = BoundedTensorSpec((3, 5), torch.int32, 0, 2)
You can explicitly create a similar spec with a different shape and minimum by
new_spec = spec.replace(shape=(4, 8), minimum=-1)
- Return type
BoundedTensorSpec
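The replace pattern can be illustrated with a small immutable stand-in class. BoundedSpecSketch is hypothetical and only mirrors the documented signature; treating None as "keep the current value" is an assumption inferred from the default arguments, not a statement about ALF's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedSpecSketch:
    """Hypothetical stand-in for a bounded spec (not the ALF class)."""
    shape: tuple
    dtype: str
    minimum: int = 0
    maximum: int = 1

    def replace(self, shape=None, dtype=None, minimum=None, maximum=None):
        # Assumption: an argument left as None keeps the existing value.
        return BoundedSpecSketch(
            shape=self.shape if shape is None else shape,
            dtype=self.dtype if dtype is None else dtype,
            minimum=self.minimum if minimum is None else minimum,
            maximum=self.maximum if maximum is None else maximum,
        )


spec = BoundedSpecSketch((3, 5), "int32", 0, 2)
new_spec = spec.replace(shape=(4, 8), minimum=-1)
```

The original spec is left untouched, so specs can be shared safely and varied per use.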
- class TensorSpec(shape, dtype=torch.float32)[source]#
Bases:
object
Describes a torch.Tensor.
A TensorSpec allows an API to describe the Tensors that it accepts or returns, before that Tensor exists. This allows dynamic and flexible graph construction and configuration.
- Parameters
shape (tuple[int]) – The shape of the tensor.
dtype (str or torch.dtype) – The type of the tensor values, e.g., “int32” or torch.int32
- constant(value, outer_dims=None)[source]#
Create a constant tensor from the spec.
- Parameters
value – a scalar
outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
- Returns
a tensor of self._dtype.
- Return type
tensor (torch.Tensor)
- property dtype#
Returns the dtype of elements in the tensor.
- property dtype_str#
The str representation of dtype.
It can be used to construct a numpy array.
- classmethod from_array(array, from_dim=0)[source]#
Create TensorSpec from numpy array.
- Parameters
array (np.ndarray|np.number) – array from which the spec is extracted
from_dim (int) – use array.shape[from_dim:] as shape
- Returns
TensorSpec
- classmethod from_tensor(tensor, from_dim=0)[source]#
Create TensorSpec from tensor.
- Parameters
tensor (Tensor) – tensor from which the spec is extracted
from_dim (int) – use tensor.shape[from_dim:] as shape
- Returns
TensorSpec
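The from_dim argument of from_array and from_tensor amounts to slicing the shape, for example to drop a leading batch dimension. The helper below is a hypothetical plain-Python sketch that operates on shape tuples only, not on real arrays or tensors.

```python
def spec_shape(full_shape, from_dim=0):
    """Return full_shape[from_dim:], the shape a spec would record."""
    return tuple(full_shape)[from_dim:]


batched = (32, 3, 64, 64)                    # e.g. a batch of 32 images
per_sample = spec_shape(batched, from_dim=1)  # drop the batch dimension
whole = spec_shape(batched)                   # from_dim=0 keeps every dimension
```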
- property is_continuous#
Whether spec is continuous.
- property is_discrete#
Whether spec is discrete.
- property ndim#
Return the rank of the tensor.
- property numel#
Returns the number of elements.
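For a given shape, the rank and the element count relate as sketched below. The function names are illustrative stand-ins for the ndim and numel properties and work on plain tuples.

```python
def ndim(shape):
    """Rank: the number of dimensions in the shape."""
    return len(shape)


def numel(shape):
    """Element count: the product of all dimension sizes (1 for a scalar shape)."""
    total = 1
    for dim in shape:
        total *= dim
    return total


rank = ndim((3, 5))
count = numel((3, 5))
scalar_count = numel(())
```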
- numpy_constant(value, outer_dims=None)[source]#
Create a constant np.ndarray from the spec.
- Parameters
value (Number) – a scalar
outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
- Returns
an array of self._dtype.
- Return type
np.ndarray
- numpy_zeros(outer_dims=None)[source]#
Create a zero numpy.ndarray from the spec.
- Parameters
outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
- Returns
an array of self._dtype.
- Return type
np.ndarray
- ones(outer_dims=None)[source]#
Create an all-one tensor from the spec.
- Parameters
outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
- Returns
a tensor of self._dtype.
- Return type
tensor (torch.Tensor)
- rand(outer_dims=None)[source]#
Create a tensor filled with random numbers in \([0,1]\).
- Parameters
outer_dims (Optional[Tuple[int]]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
- Returns
a tensor of self._dtype.
- Return type
torch.Tensor
- randn(outer_dims=None)[source]#
Create a tensor filled with random numbers from a standard normal distribution.
- Parameters
outer_dims (tuple[int]) – an optional list of integers specifying outer dimensions to add to the spec shape before sampling.
- Returns
a tensor of self._dtype.
- Return type
tensor (torch.Tensor)
- replace(shape=None, dtype=None)[source]#
Create a new TensorSpec with part of the properties replaced.
For example, if we have a TensorSpec like
spec = TensorSpec((3, 5), torch.int32)
You can explicitly create a similar spec with a different dtype by
new_spec = spec.replace(dtype=torch.float32)
- Return type
TensorSpec
- property shape#
Returns the TensorShape that represents the shape of the tensor.