alf.utils#

alf.utils.action_quantizer#

Action Quantizer.

class ActionQuantizer(action_spec, sampling_method='uniform', action_bins=7, rep_mode='center')[source]#

Bases: object

Quantize actions in a specified way.

Parameters

action_spec (BoundedTensorSpec) – action spec
sampling_method (str) –
sampling space, uniform or log space:
- “uniform”: the original space
- “log”: the logarithm space
action_bins (int) – number of bins used for discretization
rep_mode (str) –
the mode of representation for quantization:
- “center”: linspace(lb + bin_size/2, ub - bin_size/2, bin_num)
- “boundary”: linspace(lower_bound, upper_bound, bin_num)
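The two representation modes can be illustrated with a minimal sketch for the “uniform” sampling method. The helper names (linspace, bin_values, nearest_bin) are hypothetical and not part of ALF, and bin_size is assumed to be (upper_bound - lower_bound) / action_bins; the actual class operates on tensors described by action_spec.

```python
def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop (inclusive)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def bin_values(lower_bound, upper_bound, action_bins, rep_mode="center"):
    """Representative value of each bin in the original ("uniform") space."""
    if rep_mode == "center":
        # Assumption: all bins have equal width.
        bin_size = (upper_bound - lower_bound) / action_bins
        return linspace(lower_bound + bin_size / 2,
                        upper_bound - bin_size / 2, action_bins)
    if rep_mode == "boundary":
        return linspace(lower_bound, upper_bound, action_bins)
    raise ValueError(f"unknown rep_mode: {rep_mode}")


def nearest_bin(action, values):
    """Index of the representative value closest to `action`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - action))


centers = bin_values(-1.0, 1.0, 4, "center")       # [-0.75, -0.25, 0.25, 0.75]
boundaries = bin_values(-1.0, 1.0, 5, "boundary")  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

With “center”, no representative value sits on the bounds; with “boundary”, the first and last values coincide with the lower and upper bounds.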

property action_bins#

action_to_ind(action)[source]#

ind_to_action(action_ind)[source]#

alf.utils.action_samplers#

class CategoricalSeedSampler(num_classes, new_noise_prob=0.01, concentration=1)[source]#

Bases: alf.utils.action_samplers._CategoricalSeedSamplerBase

Sample actions with temporal consistency.

In order to do so, we maintain an internal stateful noise vector $\epsilon$ and use it to modify the original categorical distribution $\pi$ to a new distribution $\tilde{\pi}=f(\pi, \epsilon)$. The evolution of $\epsilon$ and the function $f$ are chosen so that $E(\tilde{\pi})=\pi$. More specifically, $f$ is chosen so that $\tilde{\pi}$ follows a Dirichlet distribution $Dir(c \pi)$.

Parameters

num_classes (int) – number of classes for the categorical distribution
new_noise_prob (float) – the probability of generating a new $\epsilon$
concentration (float) – the concentration scaling factor c. Larger concentration tends to generate $\tilde{\pi}$ closer to $\pi$.

Args:

input_tensor_spec (nested TensorSpec) – the (nested) tensor spec of the input.
state_spec (nested TensorSpec) – the (nested) tensor spec of the state of the network.
name (str) –

forward(input, state)[source]#

Parameters

input (Tensor) – the parameter of the categorical distribution with the shape of [batch_size, num_classes]
state (Tensor) – noise state (i.e. $\epsilon$)

training: bool#

class EpsilonGreedySampler(epsilon_greedy=0.1)[source]#

Bases: torch.nn.modules.module.Module

Epsilon greedy sampler.

With probability 1 - epsilon_greedy, sample actions with the largest probability. With probability epsilon_greedy, sample actions according to the given categorical distribution.

Parameters: epsilon_greedy – see above.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Parameters: input – categorical probabilities with the shape of [batch_size, num_classes]
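The sampling rule described above can be sketched for a single row of probabilities as follows. This is a plain-Python illustration, not the ALF implementation (which works on batched tensors); the function name and the rng argument are hypothetical.

```python
import random


def epsilon_greedy_sample(probs, epsilon_greedy=0.1, rng=random):
    """Sample one action index from a single row of categorical probabilities."""
    if rng.random() < epsilon_greedy:
        # With probability epsilon_greedy: draw from the categorical distribution.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # With probability 1 - epsilon_greedy: take the most probable action.
    return max(range(len(probs)), key=lambda i: probs[i])


rng = random.Random(0)
probs = [0.1, 0.7, 0.2]
# epsilon_greedy=0.0 never explores, so every sample is the argmax.
greedy_only = [epsilon_greedy_sample(probs, 0.0, rng) for _ in range(5)]
# epsilon_greedy=1.0 always samples from the distribution.
always_sampled = [epsilon_greedy_sample([0.0, 0.0, 1.0], 1.0, rng) for _ in range(5)]
```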

training: bool#

class MultinomialSampler[source]#

Bases: torch.nn.modules.module.Module

Sample actions according to the given multinomial distribution.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(input)[source]#

Parameters: input – categorical probabilities with the shape of [batch_size, num_classes]

training: bool#

alf.utils.averager#

Classes for doing moving average.

class AdaptiveAverager(tensor_spec, speed=10.0, name='AdaptiveAverager')[source]#

Bases: alf.utils.averager.EMAverager

Averager with adaptive update_rate.

This averager gives higher weight to more recent samples when calculating the average. Roughly speaking, the weight for each sample at time $t$ is proportional to $(t/T)^{speed-1}$, where $T$ is the current time step. See notes/streaming_averaging_amd_sampling.py for detail.

Parameters

tensor_spec (nested TensorSpec) – the TensorSpec for the value to be averaged
speed (float) – speed of updating mean and variance.
name (str) – name of this averager

training: bool#

update(tensor)[source]#

Update the average.

Parameters: tensor (nested Tensor) – a value for updating the average; outer dims will be first averaged before being added to the average

class EMAverager(tensor_spec, update_rate, name='EMAverager')[source]#

Bases: torch.nn.modules.module.Module

Class for exponential moving average. Suppose the update rate is $\alpha$, and the quantity to be averaged is denoted as $x$, then

\[x_t = (1-\alpha)x_{t-1} + \alpha x\]

The average is corrected by a mass $w_t$ as $\frac{x_t}{w_t}$, and the mass is calculated as:

\[w_t = (1-\alpha) * w_{t-1} + \alpha\]

Note that update rate can be a fixed floating number or a variable. If it is a variable, the update rate can be changed by the user.
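The two recurrences above can be sketched for a scalar as follows. The class name is hypothetical, and starting both $x_0$ and $w_0$ at zero is an assumption made for this illustration; the ALF class operates on (nested) tensors.

```python
class ScalarEMASketch:
    """Exponential moving average with mass correction, for a single float."""

    def __init__(self, update_rate):
        self.update_rate = update_rate
        self.x = 0.0  # uncorrected running value x_t (assumed to start at 0)
        self.w = 0.0  # mass w_t (assumed to start at 0)

    def update(self, value):
        a = self.update_rate
        self.x = (1 - a) * self.x + a * value
        self.w = (1 - a) * self.w + a

    def get(self):
        # The corrected average x_t / w_t.
        return self.x / self.w


avg = ScalarEMASketch(update_rate=0.1)
for _ in range(3):
    avg.update(5.0)
corrected = avg.get()   # stays at the constant input, up to rounding
uncorrected = avg.x     # still biased toward the zero initial value
```

Under these assumptions the mass correction removes the bias toward the zero initial value: for a constant input the corrected average equals that input from the first update on, while the uncorrected value only approaches it gradually.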

Parameters

tensor_spec (nested TensorSpec) – the TensorSpec for the value to be averaged
update_rate (float|Variable) – the update rate
name (str) – name of this averager

average(tensor)[source]#

Combines self.update and self.get in one step. Can be handy in practice.

Parameters: tensor (nested Tensor) – a value for updating the average; outer dims will be first averaged before being added to the average
Returns: the current average
Return type: Tensor

get()[source]#

Get the current average.

Returns: the current average
Return type: Tensor

training: bool#

update(tensor)[source]#

Update the average.

Parameters: tensor (nested Tensor) – value for updating the average; outer dims will be first averaged before being added to the average
Returns: None

class ScalarAdaptiveAverager(speed=10, dtype=torch.float32, name='ScalarAdaptiveAverager')[source]#

Bases: alf.utils.averager.AdaptiveAverager

AdaptiveAverager for scalar value.

Parameters

speed (float) – speed of updating mean and variance.
dtype (torch.dtype) – dtype of the scalar
name (str) – name of this averager

training: bool#

class ScalarEMAverager(update_rate, dtype=torch.float32, name='ScalarEMAverager')[source]#

Bases: alf.utils.averager.EMAverager

EMAverager for scalar value

Parameters

update_rate (float|Variable) – update rate
dtype (torch.dtype) – dtype of the scalar
name (str) – name of this averager

training: bool#

class ScalarWindowAverager(window_size, dtype=torch.float32, name='ScalarWindowAverager')[source]#

Bases: alf.utils.averager.WindowAverager

WindowAverager for scalar value

Parameters

window_size (int) – the size of the window
dtype (torch.dtype) – dtype of the scalar
name (str) – name of this averager

training: bool#

class WindowAverager(tensor_spec, window_size, name='WindowAverager')[source]#

Bases: torch.nn.modules.module.Module

WindowAverager calculates the average of the past window_size samples.

Parameters

tensor_spec (TensorSpec) – the TensorSpec for the value to be averaged
window_size (int) – the size of the window
name (str) – name of this averager
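A sliding-window average over scalars can be sketched with a bounded deque. The class name is hypothetical, and averaging over however many samples are available before the window fills up is an assumption of this sketch, not a statement about the ALF implementation.

```python
from collections import deque


class ScalarWindowSketch:
    """Average of the most recent `window_size` scalar samples."""

    def __init__(self, window_size):
        # A deque with maxlen drops the oldest sample automatically.
        self._buf = deque(maxlen=window_size)

    def update(self, value):
        self._buf.append(value)

    def get(self):
        return sum(self._buf) / len(self._buf)

    def average(self, value):
        # update() followed by get(), in one step.
        self.update(value)
        return self.get()


window = ScalarWindowSketch(window_size=3)
running = [window.average(v) for v in [1, 2, 3, 4]]  # [1.0, 1.5, 2.0, 3.0]
```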

average(tensor)[source]#

Combines self.update and self.get in one step. Can be handy in practice.

Parameters: tensor (nested Tensor) – a value for updating the average; outer dims will be averaged first before being added
Returns: the current average
Return type: Tensor

get()[source]#

Get the current average.

Returns: the current average
Return type: Tensor

training: bool#

update(tensor)[source]#

Update the average.

Parameters: tensor (nested Tensor) – value for updating the average; outer dims will be averaged first before being added.
Returns: None

average_outer_dims(tensor, spec)[source]#

Parameters

tensor (Tensor) – a single Tensor
spec (TensorSpec) –

Returns

the average tensor across outer dims

alf.utils.checkpoint_utils#

class Checkpointer(ckpt_dir, **kwargs)[source]#

Bases: object

A checkpoint manager for saving and loading checkpoints.

A class for saving checkpoints. It also saves a json file containing the structure of the model state checkpoint, which facilitates inspecting the structure of the checkpoint without having to load it first. This is useful for cases such as extracting a sub-dictionary from the whole.

Example usage:

alg_root = MyAlg(params=[p1, p2], sub_algs=[a1, a2], optimizer=opt)
ckpt_mngr = ckpt_utils.Checkpointer(ckpt_dir,
                    alg=alg_root)

Parameters

ckpt_dir – The directory to save checkpoints. Create ckpt_dir if it doesn’t exist.
kwargs – Items to be included in the checkpoint. Each item needs to have state_dict and load_state_dict implemented. For an instance of Algorithm, only the root needs to be passed in; all the children modules and optimizers are automatically extracted and checkpointed. If a child module is also passed in, it will be treated as the root to be recursively processed.

has_checkpoint(global_step='latest')[source]#

Whether there is a checkpoint in the checkpoint directory.

Parameters: global_step (int|str) – If an int, return True if file “ckpt-{global_step}” is in the checkpoint directory. If “latest”, return True if “latest” is in the checkpoint directory.

load(global_step='latest', ignored_parameter_prefixes=[], including_optimizer=True, including_replay_buffer=True, including_data_transformers=True, strict=True)[source]#

Load checkpoint.

Parameters

global_step (int|str) – the number of training steps which is used to specify the checkpoint to be loaded. If global_step is ‘latest’, the most recent checkpoint named ‘latest’ will be loaded.
ignored_parameter_prefixes (list[str]) – ignore the parameters whose name has one of these prefixes in the checkpoint.
including_optimizer (bool) – whether to load the optimizer checkpoint.
including_replay_buffer (bool) – whether to load the replay buffer checkpoint.
including_data_transformers (bool) – whether to load the data transformer checkpoint.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s torch.nn.Module.state_dict function. If strict=True, will keep lists of missing and unexpected keys and raise error when any of the lists is non-empty; if strict=False, missing/unexpected keys will be omitted and no error will be raised. (Default: True)

Returns

the current step number for the loaded checkpoint. current_step_num is set to -1 if the specified checkpoint does not exist.

Return type

int

save(global_step)[source]#

Save states of all modules to checkpoint

Parameters: global_step (int) – the number of training steps corresponding to the current state to be saved. It will be appended to the name of the checkpoint as a suffix. This function will also save a copy of the latest checkpoint in a file named ‘latest’.

enable_checkpoint(module, flag=True)[source]#

Enable/disable checkpoint for module.

Parameters

module (torch.nn.Module) –
flag (bool) – True to enable checkpointing, False to disable.

extract_sub_state_dict_from_checkpoint(checkpoint_prefix, checkpoint_path)[source]#

Extract a (sub-)state-dictionary from a checkpoint file. The state dictionary can be a sub-dictionary specified by the checkpoint_prefix.

Parameters

checkpoint_prefix (str) – the prefix to the sub-dictionary in the checkpoint to be loaded. It can be a multi-step path denoted by “A.B.C” (e.g. “alg._sub_alg1”). If prefix is ‘’, the full dictionary from the checkpoint file will be returned.
checkpoint_path (str) – the full path to the checkpoint file saved by ALF, e.g. “/path_to_experiment/train/algorithm/ckpt-100”.

is_checkpoint_enabled(module)[source]#

Whether module will be checkpointed.

By default, a module used in Algorithm will be checkpointed. The checkpointing can be disabled by calling enable_checkpoint(module, False).

Parameters: module (torch.nn.Module) – module in question
Returns: True if the parameters of this module will be checkpointed
Return type: bool

alf.utils.common#

Various functions used by different alf modules.

class Periodically(body, period, name='periodically')[source]#

Bases: torch.nn.modules.module.Module

Periodically performs the operation defined in body.

Parameters

body (Callable) – callable to be performed every time an internal counter is divisible by the period.
period (int) – inverse frequency with which to perform the operation.
name (str) – name of the object.

Raises

TypeError – if body is not a callable.
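The counter-based behavior can be sketched without torch as below. The class name is hypothetical, and incrementing the counter before testing divisibility is an assumption of this sketch; the ALF class is an nn.Module and may order these steps differently.

```python
class PeriodicallySketch:
    """Call `body` once every `period` invocations."""

    def __init__(self, body, period):
        if not callable(body):
            raise TypeError("body must be callable")
        self._body = body
        self._period = period
        self._counter = 0

    def __call__(self):
        # Assumption: the counter is incremented first, so with period=3
        # the body runs on the 3rd, 6th, 9th, ... call.
        self._counter += 1
        if self._counter % self._period == 0:
            return self._body()
        return None


fired = []
every_third = PeriodicallySketch(lambda: fired.append(len(fired)), period=3)
for _ in range(7):
    every_third()
# After 7 calls the body has run twice (on calls 3 and 6).
```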

forward()[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#

class TargetUpdater(models, target_models, tau=1.0, period=1, init_copy=True, delayed_update=False)[source]#

Bases: torch.nn.modules.module.Module

Performs a soft update of the target model parameters.

For each weight $w_s$ in the model, and its corresponding weight $w_t$ in the target_model, a soft update is:

\[w_t = (1 - \tau) * w_t + \tau * w_s.\]

Note: we only perform soft updates for parameters and always copy buffers.

Parameters

models (Network | list[Network] | Parameter | list[Parameter]) – the current model or parameter.
target_models (Network | list[Network] | Parameter | list[Parameter]) – the model or parameter to be updated.
tau (float) – A float scalar in $[0, 1]$. Default $\tau=1.0$ means hard update.
period (int) – Step interval at which the target model is updated.
init_copy (bool) – If True, also copy models to target_models in the beginning.
delayed_update (bool) – if True, target_models is updated using recent_models every period steps. If tau is 1, the recent_models is models period steps before. If tau is not 1, recent_models is an exponential moving average of models with rate tau. The use of delayed_update may help to improve the stability of TD learning when a small period is used.
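The soft update formula can be illustrated on plain lists of floats. This is only a sketch of the arithmetic; the function name is hypothetical, and the actual class updates network parameters in place and copies buffers.

```python
def soft_update(target_weights, source_weights, tau=1.0):
    """Return w_t <- (1 - tau) * w_t + tau * w_s for each pair of weights."""
    return [(1 - tau) * w_t + tau * w_s
            for w_t, w_s in zip(target_weights, source_weights)]


target = [0.0, 2.0]
source = [1.0, 4.0]
hard = soft_update(target, source, tau=1.0)   # equals source: a hard update
half = soft_update(target, source, tau=0.5)   # midpoint of target and source
```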

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward()[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

training: bool#

abs_path(path)[source]#: Given any path, return the absolute path with the user home directory (~) expanded.

active_action_target_entropy(active_action_portion=0.2, min_entropy=0.3)[source]#

Automatically compute target entropy given the action spec. Currently supports discrete actions only.

The general idea is that we assume $Nk$ actions have uniform probabilities under a good policy. Thus the target entropy should be $\log(Nk)$, where $N$ is the total number of discrete actions and $k$ is the active action portion.

TODO: incorporate this function into EntropyTargetAlgorithm if it proves to be effective.

Parameters

active_action_portion (float) – a number in $(0, 1]$. Ideally, this value should be greater than 1/num_actions. If it’s not, it will be ignored.
min_entropy (float) – the minimum possible entropy. If the auto-computed entropy is smaller than this value, then it will be replaced.

Returns

the target entropy for EntropyTargetAlgorithm.

Return type

float
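The formula can be sketched as follows, taking the number of actions as an explicit argument instead of reading it from the global action spec. The function name is hypothetical. How the real function "ignores" a portion that is too small is not stated above; this sketch simply falls back to min_entropy in that case, which is an assumption.

```python
import math


def target_entropy_sketch(num_actions, active_action_portion=0.2, min_entropy=0.3):
    """Entropy of a uniform distribution over N * k 'active' actions."""
    entropy = math.log(num_actions * active_action_portion)
    # Assumption: when N * k <= 1 the log is non-positive, so the
    # min_entropy floor takes over.
    return max(entropy, min_entropy)


wide = target_entropy_sketch(10, active_action_portion=0.5)   # log(5)
narrow = target_entropy_sketch(4, active_action_portion=0.2)  # floor applies
```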

add_method(cls)[source]#

A decorator for adding a method to a class (cls). Example usage:

class A:
    pass
@add_method(A)
def new_method(self):
    print('new method added')
# now new_method() is added to class A and is ready to be used
a = A()
a.new_method()

alf_root()[source]#: Get the ALF root path.

as_list(x)[source]#

Convert x to a list.

It performs the following conversion:

None => []
list => x
tuple => list(x)
other => [x]

Parameters: x (any) – the object to be converted
Return type: list
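The conversion table above corresponds to the following minimal sketch (the function name is hypothetical, to avoid suggesting this is the ALF source):

```python
def as_list_sketch(x):
    """Convert x to a list following the table above."""
    if x is None:
        return []
    if isinstance(x, list):
        return x          # returned as-is, not copied
    if isinstance(x, tuple):
        return list(x)
    return [x]            # any other object is wrapped
```

Note that a string falls into the "other" case, so as_list_sketch("ab") gives ["ab"] rather than ["a", "b"].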

call_stack()[source]#

Return a list of strings showing the current function call stacks for debugging.

Return type: List[str]

cast_transformer(observation, dtype=torch.float32)[source]#

Cast observation

Parameters

observation (nested Tensor) – observation
dtype (Dtype) – The destination type.

Returns

casted observation

check_numerics(nested)[source]#

Assert all the tensors in nested are finite.

Parameters: nested (nested Tensor) – nested Tensor to be checked.

compute_summary_or_eval_interval(config, summary_or_eval_calls=100)[source]#

Automatically compute a summary or eval interval according to the config and the expected total number of summary or eval calls. This function can avoid manually computing the interval value when an expected number of calls is in mind.

Warning

This function might not work for algorithms that change the global counter themselves, e.g., LMAlgorithm.

Parameters

config (TrainerConfig) – the configuration object for training
summary_or_eval_calls (int) – the expected number of summary or eval calls throughout the training process. This number can control the time consumed on summary or eval. Note that this number might not be exactly satisfied eventually, if the calculated interval has been rounded up.

Returns

summary or eval interval

Return type

int

copy_gin_configs(root_dir, gin_files)[source]#

Copy gin config files to root_dir

Parameters

root_dir (str) – directory path
gin_files (None|list[str]) – list of file paths

create_ou_process(action_spec, ou_stddev, ou_damping)[source]#

Create nested zero-mean Ornstein-Uhlenbeck processes.

The temporal update equation is:

x_next = (1 - damping) * x + N(0, std_dev)

Note: if action_spec is nested, the returned nested OUProcess will not be checkpointed.

Parameters

action_spec (nested BoundedTensorSpec) – action spec
ou_damping (float) – Damping rate in the above equation. We must have $0 <= damping <= 1$.
ou_stddev (float) – Standard deviation of the Gaussian component.

Returns

nested OUProcess with the same structure as action_spec.
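The temporal update equation can be simulated for a single scalar as below. The function names are hypothetical, starting the process at zero is an assumption, and the real OUProcess operates on tensors shaped by action_spec.

```python
import random


def ou_step(x, damping, stddev, rng=random):
    """One update: x_next = (1 - damping) * x + N(0, stddev)."""
    assert 0 <= damping <= 1
    return (1 - damping) * x + rng.gauss(0.0, stddev)


def ou_trajectory(steps, damping, stddev, seed=0):
    """Simulate `steps` updates starting from x = 0 (assumed initial value)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(steps):
        x = ou_step(x, damping, stddev, rng)
        path.append(x)
    return path


noise = ou_trajectory(steps=5, damping=0.15, stddev=0.2)
```

With stddev set to 0 the process just decays toward zero by a factor of (1 - damping) per step, which is a quick way to check the damping term in isolation.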

detach(nests)[source]#

Detach nested Tensors or Distributions

Parameters: nests (Any) – tensors or distributions to be detached
Returns: detached Tensors/Distributions with same structure as nests

class eval_context[source]#

Bases: object

A context manager that will automatically mark the _exe_mode flag as EXE_MODE_EVAL when entering a context and revert to the original _exe_mode when exiting the context.
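The enter/revert behavior can be sketched with a small context manager. Everything here is illustrative: the state dictionary, the mode strings, and the names eval_context_sketch and current_mode are hypothetical stand-ins for the module-level _exe_mode flag and its constants.

```python
# Hypothetical stand-in for the module-level execution mode flag.
_state = {"exe_mode": "other"}


def current_mode():
    return _state["exe_mode"]


class eval_context_sketch:
    """Set the mode to "eval" on entry and restore the old mode on exit."""

    def __enter__(self):
        self._old_mode = _state["exe_mode"]
        _state["exe_mode"] = "eval"
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the previous mode even if the body raised.
        _state["exe_mode"] = self._old_mode
        return False  # do not swallow exceptions


before = current_mode()
with eval_context_sketch():
    inside = current_mode()
after = current_mode()
```

Because the previous mode is saved per context instance, such contexts can be nested and each one restores whatever mode was active when it was entered.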

exe_mode_name()[source]#: Return the execution mode as a string.

expand_dims_as(x, y, end=True)[source]#

Expand the shape of x with extra singular dimensions.

The result is broadcastable to the shape of y.

Parameters

x (Tensor) – source tensor
y (Tensor) – target tensor. Only its shape will be used.
end (bool) – If True, the extra dimensions are at the end of x; otherwise they are at the beginning.

Returns

x with extra singular dimensions.

flattened_size(spec)[source]#

Return the size of the vector if spec.shape is flattened.

It is the same as np.prod(spec.shape).

Parameters: spec (alf.TensorSpec) – a TensorSpec object
Returns: the size of flattened shape
Return type: np.int64

generate_alf_snapshot(alf_root, conf_file, dest_path)[source]#

Given a destination path, copy the local ALF root dir to the path. To save disk space, only *.py files will be copied.

This function can be used to generate a snapshot of the repo so that exactly the same code state can be recovered when later playing a trained model or launching a grid-search job in the waiting queue.

Parameters

alf_root (str) – the parent path of the ‘alf’ module
conf_file (str) – the alf config file
dest_path (str) – the path to generate a snapshot of ALF repo

get_action_spec()[source]#

Get the specs of the tensors expected by step(action) of the global environment.

Returns: a spec that describes the shape and dtype of each tensor expected by step().
Return type: nested TensorSpec

get_alf_snapshot_env_vars(root_dir)[source]#: Given a root_dir, return modified env variable dict so that PYTHONPATH points to the ALF snapshot under this directory.

get_all_parameters(obj)[source]#

Get all the parameters under obj and its descendants.

Note: This function assumes all the parameters can be reached through tuple, list, dict, set, nn.Module or the attributes of an object. If a parameter is held in a strange way, it will not be included by this function.

Parameters: obj (object) – will look for parameters under this object.
Returns: list of (path, Parameters)
Return type: list

get_conf_file(root_dir=None)[source]#

Get the configuration file.

If FLAGS.conf is not set, find alf_config.py or configured.gin under FLAGS.root_dir and return it. If there is no ‘conf’ flag defined, return None.

Parameters: root_dir (str) – when None, FLAGS.root_dir is used to find the conf file.
Returns: the name of the conf file. None if there is no conf file
Return type: str

get_env()[source]#

get_epsilon_greedy(config)[source]#

get_gin_confg_strs()[source]#

Obtain both the operative and inoperative config strs from gin.

The operative configuration consists of all parameter values used by configurable functions that are actually called during execution of the current program, and the inoperative configuration consists of all parameters configured but not used by configurable functions. See gin.operative_config_str() and gin_utils.inoperative_config_str for more detail on how the config is generated.

Returns

md_operative_config_str (str): a markdown-formatted operative str
md_inoperative_config_str (str): a markdown-formatted inoperative str

Return type

tuple

get_gin_file()[source]#

Get the gin configuration file.

If FLAGS.gin_file is not set, find gin files under FLAGS.root_dir and return them. If there is no ‘gin_file’ flag defined, return ‘’.

Returns: the gin file(s)

get_initial_policy_state(batch_size, policy_state_spec)[source]#

Return zero tensors as the initial policy states.

Parameters

batch_size (int) – number of policy states created
policy_state_spec (nested structure) – each item is a tensor spec for a state

Returns

the initial state, where each item is a tensor with the first dim equal to batch_size. The remaining dims are consistent with the corresponding state spec of policy_state_spec.

Return type

nested structure

get_initial_time_step(env, first_env_id=0)[source]#

Return the initial time step.

Parameters

env (AlfEnvironment) –
first_env_id (int) – the environment ID for the first sample in this batch.

Returns

the init time step with actions as zero tensors.

Return type

TimeStep

get_observation_spec(field=None)[source]#

Get the spec of observation transformed by data transformers.

The data transformers are specified by TrainerConfig.data_transformer_ctor.

Parameters: field (str) – a multi-step path denoted by “A.B.C”.
Returns: a spec that describes the observation.
Return type: nested TensorSpec

get_raw_observation_spec(field=None)[source]#

Get the TensorSpec of observations provided by the global environment.

Parameters: field (str) – a multi-step path denoted by “A.B.C”.
Returns: a spec that describes the observation.
Return type: nested TensorSpec

get_reward_spec()[source]#

Get the specs of the reward tensors of the global environment.

Returns: a spec that describes the shape and dtype of each reward tensor.
Return type: nested TensorSpec

get_states_shape()[source]#

Get the tensor shape of internal states of the agent provided by the global environment.

Returns: 0 if internal states are not part of the observation; otherwise a torch.Size. We don’t raise an error so this code can serve to check whether the env has states input.

get_unused_port(start, end=65536, n=1)[source]#

Get an unused port in the range [start, end).

Parameters

start (int) – port range start
end (int) – port range end
n (int) – get n consecutive unused ports

Raises

socket.error – if no unused port is available

get_vocab_size()[source]#

Get the vocabulary size of observations provided by the global environment.

Returns: size of the environment’s/teacher’s vocabulary. Returns 0 if language is not part of the observation. We don’t raise an error so this code can serve to check whether the env has language input.
Return type: int

image_scale_transformer(observation, fields=None, min=-1.0, max=1.0)[source]#

Scale image to min and max (0->min, 255->max).

Parameters

observation (nested Tensor) – If observation is a nested structure, only namedtuple and dict are supported for now.
fields (list[str]) – the fields to be applied with the transformation. If None, then observation must be a Tensor with dtype uint8. A field str can be a multi-step path denoted by “A.B.C”.
min (float) – normalize minimum to this value
max (float) – normalize maximum to this value

Returns

Transformed observation
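The mapping (0->min, 255->max) is a linear rescaling, sketched here for a single pixel value. The function and argument names are hypothetical (lo and hi are used instead of min and max to avoid shadowing Python builtins), and the real function works on uint8 tensors.

```python
def scale_pixel(value, lo=-1.0, hi=1.0):
    """Linearly map a pixel value in [0, 255] to [lo, hi]."""
    return value / 255.0 * (hi - lo) + lo


darkest = scale_pixel(0)       # maps to lo
brightest = scale_pixel(255)   # maps to hi
unit_range = scale_pixel(255, lo=0.0, hi=1.0)
```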

info(msg, *args)[source]#

Generate info message msg % args.

Parameters

msg – str, the message to be logged.
*args – The args to be substituted into the msg.

info_once(msg, *args)[source]#

Generate info message msg % args once.

Parameters

msg – str, the message to be logged.
*args – The args to be substituted into the msg.

is_eval()[source]#: Return a bool value indicating whether the current code belongs to evaluation or playing a learned model.

is_inside_docker_container()[source]#

Return whether the current process is running inside a docker container.

See discussions at https://stackoverflow.com/questions/23513045/how-to-check-if-a-process-is-running-inside-docker-container

is_pretrain()[source]#: Return a bool value indicating whether the current code belongs to pre-train. The code within a function that is decorated by mark_pretrain is flagged as pretrain. A code block that is within a pretrain_context is also flagged as pretrain.

is_replay()[source]#

Return a bool value indicating whether the current code belongs to replaying. Replaying implies off-policy training.

Any code under train_from_replay_buffer() of any algorithm is classified as replaying. This phase starts from experience sampling from the replay buffer, all the way to the parameter update.

is_repo_root(dir, module_name)[source]#: Given a directory, check if it is a valid repo root. Currently the way of checking is to see if there is a valid __init__.py under it.

is_rollout()[source]#

Return a bool value indicating whether the current code belongs to unrolling. For on-policy algorithms, unrolling could be treated as part of training as it usually generates training info for calculating the loss.

Any code under unroll() of the root RL algorithm is classified as unrolling. This is the phase of collecting experiences for training.

is_training(alg)[source]#

Return a bool value indicating whether the current code is in a training phase, for either an on-policy or an off-policy algorithm.

A training phase is defined as the rollout phase for an on-policy algorithm, or the replay phase for an off-policy algorithm.

Note

Currently this function returns False for the code under train_from_unroll().

Parameters: alg (Algorithm) – the algorithm to be decided

log_metrics(metrics, prefix='')[source]#

Log metrics through logging.

Parameters

metrics (list[alf.metrics.StepMetric]) – list of metrics to be logged
prefix (str) – prefix to the log segment

mark_eval(func)[source]#

A decorator that will automatically mark the _exe_mode flag when entering/exiting an evaluation/test function.

Parameters: func (Callable) – a function

mark_pretrain(func)[source]#

A decorator that will automatically mark the _exe_mode flag when entering/exiting a pretrain function.

Parameters: func (Callable) – a function

mark_replay(func)[source]#

A decorator that will automatically mark the _exe_mode flag when entering/exiting an experience replay function.

Parameters: func (Callable) – a function

mark_rollout(func)[source]#

A decorator that will automatically mark the _exe_mode flag when entering/exiting a rollout function.

Parameters: func (Callable) – a function

parse_conf_file(conf_file)[source]#

Parse config from file.

It also looks for FLAGS.gin_param and FLAGS.conf_param for extra configs.

Note: a global environment will be created (which can be obtained by alf.get_env()) and random seed will be initialized by this function using common.set_random_seed().

Parameters: conf_file (str) – the full path to the config file

class pretrain_context[source]#

Bases: object

A context manager that will automatically mark the _exe_mode flag as EXE_MODE_PRETRAIN when entering a context and revert to the original _exe_mode when exiting the context.

read_conf_file(root_dir)[source]#

Read the content of the conf file.

Parameters: root_dir (str) – alf log directory path
Return type: str
Returns: the content of the conf file as a str. None if conf file is not specified through commandline and cannot be found in root_dir

class replay_context[source]#

Bases: object

A context manager that will automatically mark the _exe_mode flag as EXE_MODE_REPLAY when entering a context and revert to the original _exe_mode when exiting the context.

reset_state_if_necessary(state, initial_state, reset_mask)[source]#

Reset state to initial state according to reset_mask.

Parameters

state (nested Tensor) – the current batched states
initial_state (nested Tensor) – batched initial states
reset_mask (nested Tensor) – with shape=(batch_size,), dtype=torch.bool

Returns

nested Tensor
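Per batch row, the selection amounts to choosing between the current state and the initial state. The sketch below uses plain lists in place of batched tensors; the function name is hypothetical and nesting is not handled.

```python
def reset_rows(state, initial_state, reset_mask):
    """Pick initial_state[i] where reset_mask[i] is True, else state[i]."""
    return [init if reset else cur
            for cur, init, reset in zip(state, initial_state, reset_mask)]


state = [5.0, 6.0, 7.0]
initial_state = [0.0, 0.0, 0.0]
new_state = reset_rows(state, initial_state, [False, True, False])
```

Only the second row is reset here; rows whose mask entry is False keep their current state.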

class rollout_context[source]#

Bases: object

A context manager that will automatically mark the _exe_mode flag as EXE_MODE_ROLLOUT when entering a context and revert to the original _exe_mode when exiting the context.

run_under_record_context(func, summary_dir, summary_interval, flush_secs, summarize_first_interval=True, summary_max_queue=10)[source]#

Run func under summary record context.

Parameters

func (Callable) – the function to be executed.
summary_dir (str) – directory to store summary. A directory starting with ~/ will be expanded to $HOME/.
summary_interval (int) – how often to generate summary based on the global counter
flush_secs (int) – flush summary to disk every so many seconds
summarize_first_interval (bool) – whether to summarize every step of the first interval (default True). It might be better to turn this off for an easier post-processing of the curve.
summary_max_queue (int) – the largest number of summaries to keep in a queue; will flush once the queue gets bigger than this. Defaults to 10.

set_exe_mode(mode)[source]#

Mark whether the current code belongs to unrolling or training. This flag might be used to change the behavior of some functions accordingly.

Parameters: mode – the execution mode to set, e.g. EXE_MODE_ROLLOUT or EXE_MODE_REPLAY
Returns: the old exe mode

set_global_env(env)[source]#: Set global env.

set_random_seed(seed)[source]#

Set a seed for deterministic behaviors.

Note: If an experiment is run with a pre-selected manual seed, the results can be reproduced with the same seed. However, if the experiment is run with seed=None and then re-run using the seed previously returned from this function (e.g. the returned seed might be logged to TensorBoard), and if cudnn is used in the code, then there is no guarantee that the results will be reproduced with the recovered seed.

Parameters: seed (int|None) – seed to be used. If None, a default seed based on pid and time will be used.
Returns: The seed being used if seed is None.

set_transformed_observation_spec(spec)[source]#: Set the spec of the observation transformed by data transformers.

snapshot_repo_roots()[source]#

Return a dict of repo root dirs for snapshot. The paths should be defined by a special environment variable ALF_SNAPSHOT_REPO_ROOTS, in the following format:

export ALF_SNAPSHOT_REPO_ROOTS="<module_name1>=<repo_root1>:<module_name2>=<repo_root2>:..."

where pairs of “<module_name>=<repo_root>” are separated by “:”. Note that <repo_root> should be the parent dir of the module package dir.

Returns

a dict of {module_name: repo_root}, excluding the alf repo itself.

Return type

dict[str]
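Parsing a value in the format above can be sketched as follows. The function name is hypothetical; it takes the string directly rather than reading the environment variable, and it does not reproduce any filtering of the alf repo itself.

```python
def parse_repo_roots(value):
    """Parse "<name1>=<root1>:<name2>=<root2>:..." into a dict."""
    roots = {}
    for pair in value.split(":"):
        if not pair:
            continue  # tolerate an empty string or a trailing ":"
        module_name, repo_root = pair.split("=", 1)
        roots[module_name] = repo_root
    return roots


# Hypothetical module names and paths, for illustration only.
parsed = parse_repo_roots("my_pkg=/home/user/repos/a:other_pkg=/opt/b")
```

Since ":" is the pair separator, a repo root that itself contains ":" cannot be expressed in this format.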

summarize_config()[source]#: Write config to TensorBoard.

summarize_gin_config()[source]#: Write the operative and inoperative gin config to TensorBoard summary.

tuplify2d(x)[source]#

Convert x to a tuple of length two.

It performs the following conversion:

x => x if isinstance(x, tuple) and len(x) == 2
x => (x, x) if not isinstance(x, tuple)

Parameters: x (any) – the object to be converted
Return type: tuple
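The two conversion rules can be sketched as below. The function name is hypothetical, and rejecting tuples whose length is not two with an assertion is an assumption, since the rules above do not say what happens in that case.

```python
def tuplify2d_sketch(x):
    """Return x if it is a 2-tuple, or (x, x) if it is not a tuple."""
    if isinstance(x, tuple):
        # Assumption: tuples of any other length are treated as an error.
        assert len(x) == 2, "expected a tuple of length two"
        return x
    return (x, x)


kernel = tuplify2d_sketch(3)        # (3, 3)
stride = tuplify2d_sketch((1, 2))   # (1, 2)
```

A typical use is accepting either a single int or a (height, width) pair for 2D layer arguments such as kernel size or stride.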

unzip_alf_snapshot(root_dir)[source]#

Restore an ALF snapshot from a job directory by unzipping the snapshot ‘tar.gz’ files.

Parameters: root_dir (str) – the tensorboard job directory

warning(msg, *args)[source]#

Generate warning message msg % args.

Parameters

msg – str, the message to be logged.
*args – The args to be substituted into the msg.

warning_once(msg, *args)[source]#

Generate warning message msg % args once.

Note that the current implementation resembles that of the log_every_n() function in logging but reduces the calling stack by one to ensure that multiple warning-once messages generated at different places can be displayed correctly.

Parameters

msg – str, the message to be logged.
*args – The args to be substituted into the msg.

write_config(root_dir)[source]#

Write the config to a file under the directory root_dir.

Configs from FLAGS.conf_param are also recorded.

Parameters: root_dir (str) – directory path

write_gin_configs(root_dir, gin_file)[source]#

Write a gin configuration to a file. The user can

manually change the gin confs after loading a conf file into the code, or
include a gin file in another gin file while only the latter might be copied to root_dir.

For this reason, the gin conf string actually in use is dumped to a file.

Parameters

root_dir (str) – directory path
gin_file (str) – a single file path for storing the gin configs. Only the basename of the path will be used.

alf.utils.conditional_ops#

Conditional operations.

conditional_update(target, cond, func, *args, **kwargs)[source]#

Update target according to cond mask

Compute result as an update of target based on cond. To be specific, result[row] is func(*args[row], **kwargs[row]) if cond[row] is True, otherwise result[row] will be target[row]. Note that target will not be changed.

If you simply want to do some conditional computation without actually returning any results, you can use conditional_update in the following way:

# func needs to return an empty tuple ()
conditional_update((), cond, func, *args, **kwargs)

Parameters

target (nested Tensor) – target to be updated
func (Callable) – a function with arguments (*args, **kwargs) and returning a nest with same structure as target
cond (Tensor) – 1d bool Tensor with shape[0] == target.shape[0]

Returns

nest with the same structure and shape as target.
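
The row-wise semantics described above can be illustrated on plain Python lists. This is only a sketch of the documented behavior, not the tensor implementation: conditional_update_sketch is a hypothetical name, it handles positional arguments only, and it calls func once per selected row, whereas the real function may process all selected rows in one call.

```python
def conditional_update_sketch(target, cond, func, *args):
    """Return func(*row_args) where cond is True, else the target row."""
    result = list(target)  # copy, so ``target`` itself is not modified
    for row, selected in enumerate(cond):
        if selected:
            result[row] = func(*(arg[row] for arg in args))
    return result


target = [0.0, 0.0, 0.0, 0.0]
cond = [True, False, True, False]
x = [1.0, 2.0, 3.0, 4.0]
result = conditional_update_sketch(target, cond, lambda v: v * 10, x)
print(result)
```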

select_from_mask(data, mask)[source]#

Select the items from data based on mask.

data[i, …] will be selected to form a new tensor if mask[i] is True or non-zero.

Parameters

data (nested Tensor) – source tensor
mask (Tensor) – 1D Tensor. mask.shape[0] should be the same as data.shape[0].

Returns

nested Tensor with the same structure as data

alf.utils.data_buffer#

Classes for storing data for sampling.

class DataBuffer(data_spec, capacity, device='cpu', name='DataBuffer')[source]#

Bases: alf.utils.data_buffer.RingBuffer

A simple circular buffer supporting random sampling. This buffer doesn’t preserve temporality as data from multiple environments will be arbitrarily stored.

Not multiprocessing safe.

Parameters

data_spec (nested TensorSpec) – spec for the data item (without batch dimension) to be stored.
capacity (int) – capacity of the buffer.
device (str) – which device to store the data
name (str) – name of the buffer

add_batch(batch)[source]#

Add a batch of items to the buffer.

Add batch_size items along the length of the underlying RingBuffer, whereas RingBuffer.enqueue only adds data of length 1. Truncates the data if batch_size > capacity.

Parameters: batch (Tensor) – of shape [batch_size] + tensor_spec.shape

property current_pos#

property current_size#

get_all()[source]#

get_batch(batch_size)[source]#

Get batch_size random samples from the buffer.

Parameters: batch_size (int) – batch size
Returns: Tensor of shape [batch_size] + tensor_spec.shape

get_batch_by_indices(indices)[source]#

Get the samples by indices

index=0 corresponds to the earliest added sample in the DataBuffer.

Parameters: indices (Tensor) – indices of the samples
Returns: Tensor of shape [batch_size] + tensor_spec.shape, where batch_size is indices.shape[0]
Return type: Tensor

is_full()[source]#

training: bool#

class RingBuffer(data_spec, num_environments, max_length=1024, device='cpu', allow_multiprocess=False, name='RingBuffer')[source]#

Bases: torch.nn.modules.module.Module

Batched Ring Buffer.

Multiprocessing safe, optionally via: allow_multiprocess flag, blocking modes to enqueue and dequeue, a stop event to terminate blocked processes, and putting buffer into shared memory.

This is the underlying implementation of ReplayBuffer and Queue.

Different from tf_agents.replay_buffers.tf_uniform_replay_buffer, this buffer allows users to specify the environment id when adding batch. Thus, multiple actors can store experience in the same buffer.

Once stop event is set, all blocking enqueue and dequeue calls that happen afterwards will be skipped, unless the operation already started.

Terminology: we use pos as in _current_pos to refer to the always increasing position of an element in the infinitely long buffer, and idx as the actual index of the element in the underlying store (_buffer). That means idx == pos % _max_length is always true, and one should use _buffer[idx] to retrieve the stored data.
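
The relation between pos and idx can be made concrete with a single-environment toy buffer. This is a minimal sketch under stated assumptions: TinyRing, earliest_pos and get are hypothetical names, and the real RingBuffer is batched over environments and stores tensors.

```python
class TinyRing:
    """Toy single-environment ring buffer illustrating pos vs. idx."""

    def __init__(self, max_length):
        self._max_length = max_length
        self._buffer = [None] * max_length
        self._current_pos = 0  # always increasing, never wraps

    def circular(self, pos):
        # idx == pos % _max_length
        return pos % self._max_length

    def enqueue(self, item):
        # Overwrites the oldest item once the buffer is full.
        self._buffer[self.circular(self._current_pos)] = item
        self._current_pos += 1

    def earliest_pos(self):
        return max(0, self._current_pos - self._max_length)

    def get(self, pos):
        if not self.earliest_pos() <= pos < self._current_pos:
            raise IndexError(f"pos {pos} is no longer (or not yet) stored")
        return self._buffer[self.circular(pos)]


ring = TinyRing(max_length=3)
for item in "abcde":
    ring.enqueue(item)
print(ring.earliest_pos(), ring.get(2), ring.get(4))
```

After five insertions into a buffer of length three, positions 0 and 1 have been overwritten, so the earliest position still stored is 2.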

Parameters

data_spec (nested TensorSpec) – spec describing a single item that can be stored in this buffer.
num_environments (int) – number of environments or total batch size.
max_length (int) – The maximum number of items that can be stored for a single environment.
device (str) – A torch device to place the Variables and ops.
allow_multiprocess (bool) – if True, allows multiple processes to write and read the buffer asynchronously.
name (str) – name of the replay buffer.

check_convert_env_ids(env_ids)[source]#

circular(pos)[source]#: Mod pos by _max_length to get the actual index in the _buffer.

clear(env_ids=None)[source]#

Clear the buffer.

Parameters: env_ids (Tensor) – optional list of environment ids to clear

dequeue(env_ids=None, n=1, blocking=False)[source]#

Return earliest n steps and mark them removed in the buffer.

Parameters

env_ids (Tensor) – If None, batch_size must be num_environments. If not None, dequeue from these environments. We assume there are no duplicate ids in env_ids. result[i] will be from environment env_ids[i].
n (int) – Number of steps to dequeue.
blocking (bool) – If True, blocks if there is not enough data to dequeue.

Returns

nested Tensors or None when blocking dequeue gets terminated by stop event. The shape of the Tensors is [batch_size, n, ...].

Raises

AssertionError – when not enough data is present, in non-blocking mode.

property device#: The device where the data is stored.

enqueue(batch, env_ids=None, blocking=False)[source]#

Add a batch of items to the buffer.

Note, when blocking == False, it always succeeds, overwriting oldest data if there is no free slot.

Parameters

batch (Tensor) – of shape [batch_size] + tensor_spec.shape
env_ids (Tensor) – If None, batch_size must be num_environments. If not None, its shape should be [batch_size]. We assume there are no duplicate ids in env_ids. batch[i] is generated by environment env_ids[i].
blocking (bool) – If True, blocks if there is no free slot to add data. If False, enqueue can overwrite oldest data.

Returns

True on success, False only in blocking mode when queue is stopped.

get_current_position()[source]#

Get the current position for each environment.

Returns: with shape [num_environments].
Return type: Tensor

get_earliest_position(env_ids)[source]#

The earliest position that is still in the replay buffer.

Parameters: env_ids (Tensor) – int64 Tensor of environment ids
Returns: Tensor with the same shape as env_ids, whose each entry is the earliest position that is still in the replay buffer for corresponding environment.

has_data(env_ids, n=1)[source]#

Check n steps of data available for env_ids.

Parameters

env_ids (Tensor) – Assumed not None, properly checked by check_convert_env_ids().
n (int) – Number of time steps to check.

Returns

bool

has_space(env_ids)[source]#

Check free space for one batch of data for env_ids.

Parameters: env_ids (Tensor) – Assumed not None, properly checked by check_convert_env_ids().
Returns: bool

property num_environments#

remove_up_to(n, env_ids=None)[source]#

Mark as removed earliest up to n steps.

Parameters: n (int) – max number of steps to mark removed from buffer.

revive()[source]#

Clears the stop Event so blocking mode will start working again.

Only checked in blocking mode of dequeue and enqueue.

stop()[source]#

Stop waiting processes from being blocked.

Only checked in blocking mode of dequeue and enqueue.

All blocking enqueue and dequeue calls that happen afterwards will be skipped (return None for dequeue or False for enqueue), unless the operation already started.

training: bool#

atomic(func)[source]#

Make class member function atomic by checking class._lock.

Can only be applied on class methods, whose containing class must have _lock set to None or a multiprocessing.Lock object.

Parameters: func (callable) – the function to be wrapped.
Returns: the wrapped function
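
A decorator of this kind can be sketched as below. This is an illustration, not ALF's code: atomic_sketch, Counter and run_threads are hypothetical names, and a threading.Lock stands in for the multiprocessing.Lock mentioned above so that the example stays self-contained.

```python
import functools
import threading


def atomic_sketch(func):
    """Run a method under self._lock when that lock is not None."""

    @functools.wraps(func)
    def wrapped(self, *args, **kwargs):
        lock = getattr(self, "_lock", None)
        if lock is None:
            return func(self, *args, **kwargs)
        with lock:
            return func(self, *args, **kwargs)

    return wrapped


class Counter:
    def __init__(self, lock=None):
        self._lock = lock  # None disables locking
        self.value = 0

    @atomic_sketch
    def increment(self):
        self.value += 1
        return self.value


def run_threads(counter, num_threads=4, steps=1000):
    """Increment the counter from several threads and return the total."""

    def work():
        for _ in range(steps):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_threads(Counter(threading.Lock())))
```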

alf.utils.datagen#

Utilities for supervised learning algorithms

class TestDataSet(input_dim=3, output_dim=1, size=1000, weight=None)[source]#

Bases: Generic[torch.utils.data.dataset.T_co]

get_features()[source]#

get_targets()[source]#

get_classes(target, labels)[source]#

Helper function to subclass a dataloader, i.e. select only given classes from target dataset.

Parameters

target (torch.utils.data.Dataset) – the dataset that should be filtered.
labels (list[int]) – list of labels to filter on.

Returns

label_indices, the indices of examples with label in labels.

Return type

list[int]

load_cifar10(label_idx=None, train_bs=100, test_bs=100, num_workers=0)[source]#

Loads the CIFAR-10 dataset.

Parameters

label_idx (list[int]) – classes to be loaded from the dataset.
train_bs (int) – training batch size.
test_bs (int) – testing batch size.
num_workers (int) – number of processes to allocate for loading data.

Returns

train_loader (torch.utils.data.DataLoader): training data loader.
test_loader (torch.utils.data.DataLoader): test data loader.

load_mnist(label_idx=None, train_bs=100, test_bs=100, num_workers=0)[source]#

Loads the MNIST dataset.

Parameters

label_idx (list[int]) – class indices to load from the dataset.
train_bs (int) – training batch size.
test_bs (int) – testing batch size.
num_workers (int) – number of processes to allocate for loading data.
small_subset (bool) – load a small subset of 50 images for testing.

Returns

train_loader (torch.utils.data.DataLoader): training data loader.
test_loader (torch.utils.data.DataLoader): test data loader.

load_test(train_bs=50, test_bs=10, num_workers=0)[source]#

load_wikitext103(train_bs, test_bs, max_vocab_size=32768)[source]#

Load WikiText103 data.

Note that all returned Tensors are always on the CPU.

Parameters

train_bs (int) – training batch size
test_bs (int) – validation/test batch size
max_vocab_size (int) – maximal vocabulary size.

Returns

torch.Tensor: train_data, int64 Tensor of shape [?, train_bs]
torch.Tensor: val_data, int64 Tensor of shape [?, test_bs]
torch.Tensor: test_data, int64 Tensor of shape [?, test_bs]
torchtext.vocab.Vocab: vocab

Return type

tuple

load_wikitext2(train_bs, test_bs)[source]#

Load WikiText2 data.

Note that all returned Tensors are always on the CPU.

Parameters

train_bs (int) – training batch size
test_bs (int) – validation/test batch size

Returns

torch.Tensor: train_data, int64 Tensor of shape [?, train_bs]
torch.Tensor: val_data, int64 Tensor of shape [?, test_bs]
torch.Tensor: test_data, int64 Tensor of shape [?, test_bs]
torchtext.vocab.Vocab: vocab

Return type

tuple

alf.utils.dist_utils#

AbsTransform#: alias of alf.utils.dist_utils.get_invertible.<locals>.NewCls

class AffineTransform(loc, scale, event_dim=0, *, cache_size=1)[source]#

Bases: alf.utils.dist_utils.get_invertible.<locals>.NewCls

Overwrite PyTorch’s AffineTransform to provide a builder to be compatible with DistributionSpec.build_distribution().

get_builder()[source]#

class AffineTransformedDistribution(base_dist, loc, scale)[source]#

Bases: torch.distributions.transformed_distribution.TransformedDistribution

Transform via the pointwise affine mapping $y = \text{loc} + \text{scale} \times x$.

The reason for not using td.TransformedDistribution directly is that we can implement entropy, mean, variance and stddev for AffineTransform.

Parameters

loc (Tensor or float) – Location parameter.
scale (Tensor or float) – Scale parameter.
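
For the mapping $y = \text{loc} + \text{scale} \times x$, these quantities have closed forms in terms of the base distribution: the mean is $\text{loc} + \text{scale} \cdot E[x]$, the variance is $\text{scale}^2 \cdot Var[x]$, and the differential entropy is $H(x) + \log|\text{scale}|$. The sketch below checks this for a scalar normal base distribution. The names affine_stats and normal_entropy are hypothetical and are not part of ALF.

```python
import math


def normal_entropy(stddev):
    """Differential entropy of a univariate normal distribution."""
    return 0.5 * math.log(2 * math.pi * math.e * stddev ** 2)


def affine_stats(base_mean, base_stddev, base_entropy, loc, scale):
    """Statistics of y = loc + scale * x given those of x."""
    return {
        "mean": loc + scale * base_mean,
        "stddev": abs(scale) * base_stddev,
        "variance": scale ** 2 * base_stddev ** 2,
        "entropy": base_entropy + math.log(abs(scale)),
    }


# x ~ N(0, 1) mapped by y = 3 + 2 * x, so y ~ N(3, 2^2).
stats = affine_stats(0.0, 1.0, normal_entropy(1.0), loc=3.0, scale=2.0)
print(stats)
```

The entropy obtained this way agrees with the entropy of N(3, 2^2) computed directly, which is what makes the closed-form implementation possible.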

entropy()[source]#

Returns entropy of distribution, batched over batch_shape.

Returns: Tensor of shape batch_shape.

property mean#: Returns the mean of the distribution.

property stddev#: Returns the standard deviation of the distribution.

property variance#: Returns the variance of the distribution.

class Beta(concentration1, concentration0, eps=None, validate_args=None)[source]#

Bases: torch.distributions.beta.Beta

Beta distribution parameterized by concentration1 and concentration0.

Note: we need to wrap td.Beta so that self.concentration1 and self.concentration0 are the actual tensors passed in to construct the distribution. This is important in certain situations. For example, if you want to register a hook to process the gradient to concentration1 and concentration0, td.Beta.concentration0.register_hook() will not work: td.Beta.concentration0 is sliced from td.Dirichlet.concentration, so the gradient will only be backpropagated to td.Dirichlet.concentration instead of td.Beta.concentration0 or td.Beta.concentration1.

Parameters

concentration1 (float or Tensor) – 1st concentration parameter of the distribution (often referred to as alpha)
concentration0 (float or Tensor) – 2nd concentration parameter of the distribution (often referred to as beta)
eps (float) – a very small value indicating the interval [eps, 1-eps] into which the sampled values will be clipped. This clipping can prevent NaN and Inf values in the gradients. If None, a small value defined by PyTorch will be used.

property concentration0#

property concentration1#

property mode#

rsample(sample_shape=())[source]#: We override the original rsample() in order to clamp the output to avoid NaN and Inf values in the gradients. See Pyro’s rsample() implementation in https://docs.pyro.ai/en/dev/_modules/pyro/distributions/affine_beta.html#AffineBeta.

class DiagMultivariateBeta(concentration1, concentration0)[source]#

Bases: torch.distributions.independent.Independent

Create multivariate independent beta distribution.

Parameters

concentration1 (float or Tensor) – 1st concentration parameter of the distribution (often referred to as alpha)
concentration0 (float or Tensor) – 2nd concentration parameter of the distribution (often referred to as beta)

class DiagMultivariateCauchy(loc, scale)[source]#

Bases: torch.distributions.independent.Independent

Create multivariate Cauchy distribution with diagonal scale matrix.

Parameters

loc (Tensor) – median of the distribution. Note that Cauchy doesn’t have a mean (divergent).
scale (Tensor) – also known as “half width”. Should have the same shape as loc.

property loc#

property scale#

class DiagMultivariateNormal(loc, scale)[source]#

Bases: torch.distributions.independent.Independent

Create multivariate normal distribution with diagonal variance.

Parameters

loc (Tensor) – mean of the distribution
scale (Tensor) – standard deviation. Should have same shape as loc.

property stddev#: Returns the standard deviation of the distribution.

class DistributionSpec(builder, input_params_spec)[source]#

Bases: object

Parameters

builder (Callable) – the function which is used to build the distribution. The returned value of builder(input_params) is a Distribution with input parameter as input_params.
input_params_spec (nested TensorSpec) – the spec for the argument of builder.

build_distribution(input_params)[source]#

Build a Distribution using input_params.

Parameters: input_params (nested Tensor) – the parameters for building the distribution. It should match input_params_spec provided to __init__.
Returns
Return type: Distribution

classmethod from_distribution(dist, from_dim=0)[source]#

Create a DistributionSpec from a Distribution.

Parameters

dist (Distribution) – the Distribution from which the spec is extracted.
from_dim (int) – only use the dimensions from this. The reason for using from_dim>0 is that [0, from_dim) might be batch dimensions in some scenarios.
Returns
Return type: DistributionSpec

ExpTransform#: alias of alf.utils.dist_utils.get_invertible.<locals>.NewCls

class OUProcess(initial_value, damping=0.15, stddev=0.2)[source]#

Bases: torch.nn.modules.module.Module

A zero-mean Ornstein-Uhlenbeck process for generating noises.

The Ornstein-Uhlenbeck process is a process that generates temporally correlated noise via a random walk with damping. This process describes the velocity of a particle undergoing Brownian motion in the presence of friction. This can be useful for exploration in continuous action environments with momentum.

The temporal update equation is:

x_next = (1 - damping) * x + N(0, stddev)

Parameters

initial_value (Tensor) – Initial value of the process.
damping (float) – The rate at which the noise trajectory is damped towards the mean. We must have $0 <= damping <= 1$, where a value of 0 gives an undamped random walk and a value of 1 gives uncorrelated Gaussian noise. Hence in most applications a small non-zero value is appropriate.
stddev (float) – Standard deviation of the Gaussian component.
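
The update equation above can be sketched for a scalar state with the standard library. This is an illustration only: OUProcessSketch, step and the seed argument are hypothetical, and the actual class operates on tensors as an nn.Module.

```python
import random


class OUProcessSketch:
    """Scalar zero-mean Ornstein-Uhlenbeck noise generator."""

    def __init__(self, initial_value, damping=0.15, stddev=0.2, seed=None):
        assert 0.0 <= damping <= 1.0
        self._x = float(initial_value)
        self._damping = damping
        self._stddev = stddev
        self._rng = random.Random(seed)

    def step(self):
        # x_next = (1 - damping) * x + N(0, stddev)
        noise = self._rng.gauss(0.0, self._stddev)
        self._x = (1.0 - self._damping) * self._x + noise
        return self._x


process = OUProcessSketch(initial_value=0.0, seed=0)
trajectory = [process.step() for _ in range(5)]
print(trajectory)
```

With stddev set to zero the state simply decays geometrically toward the mean, which is a convenient way to check the damping term in isolation.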

forward()[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.


training: bool#

class OneHotCategoricalGumbelSoftmax(hard_sample=True, tau=1.0, **kwargs)[source]#

Bases: torch.distributions.one_hot_categorical.OneHotCategorical

Create a reparameterizable td.OneHotCategorical distribution based on the Gumbel-softmax gradient estimator from

Jang et al., "CATEGORICAL REPARAMETERIZATION WITH GUMBEL-SOFTMAX", 2017.

Parameters

hard_sample (bool) – If False, the rsampled result will be a “soft” vector of Gumbel softmax distribution, which naturally supports gradient backprop. If True, argmax will be applied on top of it and then a straight-through gradient estimator is used.
tau (float) – the Gumbel-softmax temperature for rsample. A higher temperature leads to a more uniform sample.
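
The sampling step of the Gumbel-softmax estimator can be sketched without an autodiff framework. The code below is a plain-Python illustration, not the class's rsample(): gumbel_softmax_sample is a hypothetical name, and with hard=True it only returns the one-hot argmax, because the straight-through gradient needs automatic differentiation and is not shown.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, hard=False, rng=random):
    """Draw one Gumbel-softmax sample for a list of logits."""
    scores = []
    for logit in logits:
        # Keep u away from 0 and 1 so that both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # standard Gumbel noise
        scores.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed, tempered logits.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    winner = soft.index(max(soft))
    return [1.0 if i == winner else 0.0 for i in range(len(soft))]


rng = random.Random(0)
print(gumbel_softmax_sample([1.0, 2.0, 3.0], tau=0.5, rng=rng))
print(gumbel_softmax_sample([1.0, 2.0, 3.0], tau=0.5, hard=True, rng=rng))
```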

has_rsample = True#

property mode#

rsample(sample_shape=torch.Size([]))[source]#: Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

class OneHotCategoricalStraightThrough(probs=None, logits=None, validate_args=None)[source]#

Bases: torch.distributions.one_hot_categorical.OneHotCategoricalStraightThrough

Provide an additional property mode with gradient enabled.

property mode#

PowerTransform#: alias of alf.utils.dist_utils.get_invertible.<locals>.NewCls

SigmoidTransform#: alias of alf.utils.dist_utils.get_invertible.<locals>.NewCls

class Softclip(low, high, hinge_softness=1.0, cache_size=1)[source]#

Bases: torch.distributions.transforms.Transform

Transform via the mapping defined in alf.math_ops.softclip(). Unlike SoftclipTF, this transform is symmetric regarding the lower and upper bound when squashing.

Parameters

low (float) – the lower bound
high (float) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to high.

bijective = True#

codomain: torch.distributions.constraints.Constraint = Real()#

domain: torch.distributions.constraints.Constraint = Real()#

get_builder()[source]#

log_abs_det_jacobian(x, y)[source]#: Compute log|dy/dx|.

sign = 1#

with_cache(cache_size=1)[source]#

SoftclipTF(low, high, hinge_softness=1.0)[source]#

Create a Softclip transform by composing Softlower, Softupper, and Affine transforms, adapted from tensorflow. Mathematically,

clipped = softupper(softlower(x, low), high)
softclip(x) = (clipped - high) / (high - softupper(low, high)) * (high - low) + high

The second scaling step is needed because softupper(low, high) < low due to the distortion of softplus, so we need to shrink the interval slightly by (high - low) / (high - softupper(low, high)) to preserve the lower bound. Due to this rescaling, the bijector can be mildly asymmetric.

Parameters

low (float|Tensor) – the lower bound
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to high.
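
The two formulas above can be checked numerically on scalars. The sketch below is not the transform itself: it fixes hinge_softness at 1.0, and softplus, softlower, softupper and softclip_tf are illustrative scalar functions written for this example.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softlower(x, low):
    return softplus(x - low) + low


def softupper(x, high):
    return -softplus(high - x) + high


def softclip_tf(x, low, high):
    clipped = softupper(softlower(x, low), high)
    # Rescale so that the lower limit is exactly ``low``.
    return (clipped - high) / (high - softupper(low, high)) * (high - low) + high


for x in (-50.0, 0.0, 0.5, 50.0):
    print(x, softclip_tf(x, low=-1.0, high=2.0))
```

For very negative inputs the output approaches low, and for very positive inputs it approaches high, which is the purpose of the rescaling step.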

Softlower(low, hinge_softness=1.0)[source]#

Create a Softlower transform by composing the Softplus and Affine transforms. Mathematically, softlower(x, low) = softplus(x - low) + low.

Parameters

low (float|Tensor) – the lower bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to identity.

SoftmaxTransform#: alias of alf.utils.dist_utils.get_invertible.<locals>.NewCls

class Softplus(hinge_softness=1.0, cache_size=1)[source]#

Bases: torch.distributions.transforms.Transform

Transform via the mapping $\text{Softplus}(x) = \log(1 + \exp(x))$.

Code adapted from pyro and tensorflow.

Parameters: hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from 0 to identity.

bijective = True#

codomain: torch.distributions.constraints.Constraint = GreaterThan(lower_bound=0.0)#

domain: torch.distributions.constraints.Constraint = Real()#

log_abs_det_jacobian(x, y)[source]#: Computes the log det jacobian log |dy/dx| given input and output.

sign = 1#

with_cache(cache_size=1)[source]#

class Softsign(cache_size=1)[source]#

Bases: torch.distributions.transforms.Transform

bijective = True#

codomain: torch.distributions.constraints.Constraint = Interval(lower_bound=-1.0, upper_bound=1.0)#

domain: torch.distributions.constraints.Constraint = Real()#

log_abs_det_jacobian(x, y)[source]#

\[\begin{array}{lll} y = \frac{x}{1+x} \rightarrow \frac{dy}{dx} = \frac{1}{(1+x)^2}, &\text{if} &x > 0\\ y = \frac{x}{1-x} \rightarrow \frac{dy}{dx} = \frac{1}{(1-x)^2}, &\text{else}& \end{array}\]
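
Both branches of the derivative above can be written as $\frac{dy}{dx} = \frac{1}{(1+|x|)^2}$, so $\log|dy/dx| = -2\log(1+|x|)$. The sketch below compares this closed form with a finite-difference estimate on scalars. The function names are illustrative and not part of the class.

```python
import math


def softsign(x):
    return x / (1.0 + abs(x))


def softsign_log_abs_det_jacobian(x):
    # log(1 / (1 + |x|)^2) = -2 * log(1 + |x|)
    return -2.0 * math.log1p(abs(x))


def numeric_log_derivative(x, h=1e-6):
    """Central finite-difference estimate of log(dy/dx)."""
    slope = (softsign(x + h) - softsign(x - h)) / (2.0 * h)
    return math.log(slope)


for x in (-3.0, -0.5, 0.7, 4.0):
    print(x, softsign_log_abs_det_jacobian(x), numeric_log_derivative(x))
```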

sign = 1#

with_cache(cache_size=1)[source]#

Softupper(high, hinge_softness=1.0)[source]#

Create a Softupper transform by composing the Softplus and Affine transforms. Mathematically, softupper(x, high) = -softplus(high - x) + high.

Parameters

high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from identity to high.

class StableCauchy(loc, scale, validate_args=None)[source]#

Bases: torch.distributions.cauchy.Cauchy

rsample(sample_shape=torch.Size([]), clipping_value=0.49)[source]#

Overwrite PyTorch’s Cauchy rsample for a more stable result. The sampled number is clipped to fall within a reasonable range.

For reference:

> np.tan(math.pi * -0.499)
-318.30883898554157
> np.tan(math.pi * -0.49)
-31.820515953773853

Parameters: clipping_value (float) – suppose eps is sampled from (-0.5,0.5). It will be clipped to [-clipping_value, clipping_value] to avoid values with huge magnitudes.
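
The effect of the clipping can be sketched with the inverse-CDF construction of a Cauchy sample, $x = \text{loc} + \text{scale} \cdot \tan(\pi \epsilon)$ with $\epsilon$ uniform on $(-0.5, 0.5)$. This is an assumption about how the sample is generated, made for illustration, and stable_cauchy_sample is a hypothetical scalar function, not the method above.

```python
import math
import random


def stable_cauchy_sample(loc, scale, clipping_value=0.49, rng=random):
    """Draw one Cauchy sample with the uniform noise clipped."""
    eps = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    eps = max(-clipping_value, min(clipping_value, eps))
    return loc + scale * math.tan(math.pi * eps)


rng = random.Random(0)
samples = [stable_cauchy_sample(1.0, 2.0, rng=rng) for _ in range(10000)]
bound = 2.0 * math.tan(math.pi * 0.49)
print(max(abs(s - 1.0) for s in samples), bound)
```

With clipping_value=0.49 no sample lies further than scale * tan(0.49 * pi) from loc, which matches the second reference value above.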

class StableTanh(cache_size=1)[source]#

Bases: torch.distributions.transforms.Transform

Invertible transformation (bijector) that computes $Y = tanh(X)$, therefore $Y \in (-1, 1)$.

This can be achieved by an affine transform of the Sigmoid transformation, i.e., it is equivalent to applying a list of transformations sequentially:

transforms = [AffineTransform(loc=0, scale=2),
              SigmoidTransform(),
              AffineTransform(
                    loc=-1,
                    scale=2)]

However, using the StableTanh transformation directly is more numerically stable.
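
The equivalence above, and one reason a dedicated transform can be more stable, can be checked on scalars. In the sketch below the naive log-determinant $\log(1 - y^2)$ fails once $y$ rounds to 1.0, while the algebraically equal form $2(\log 2 - x - \text{softplus}(-2x))$ stays finite. Whether StableTanh uses exactly this expression is an assumption, and the function names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    # Affine(loc=-1, scale=2) after Sigmoid after Affine(loc=0, scale=2)
    return -1.0 + 2.0 * sigmoid(2.0 * x)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def stable_log_abs_det_jacobian(x):
    # log(1 - tanh(x)^2) rewritten to avoid computing 1 - y^2.
    return 2.0 * (math.log(2.0) - x - softplus(-2.0 * x))


print(tanh_via_sigmoid(0.3), math.tanh(0.3))
print(stable_log_abs_det_jacobian(20.0))
```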

bijective = True#

codomain: torch.distributions.constraints.Constraint = Interval(lower_bound=-1.0, upper_bound=1.0)#

domain: torch.distributions.constraints.Constraint = Real()#

log_abs_det_jacobian(x, y)[source]#: Computes the log det jacobian log |dy/dx| given input and output.

sign = 1#

with_cache(cache_size=1)[source]#

calc_default_max_entropy(spec, fraction=0.8)[source]#

Calculate the default max entropy.

Parameters

spec (TensorSpec) – action spec
fraction (float) – this fraction of the theoretical entropy upper bound will be used as the max entropy

Returns: A default max entropy for adjusting the entropy weight

calc_default_target_entropy(spec, min_prob=0.1)[source]#

Calculate default target entropy.

Parameters

spec (TensorSpec) – action spec
min_prob (float) – If continuous spec, we suppose the prob concentrates on a delta of min_prob * (M-m); if discrete spec, we uniformly distribute min_prob on all entries except the peak which has a probability of 1 - min_prob.

Returns

target entropy

calc_default_target_entropy_quantized(spec, num_bins, ent_per_action_dim=- 1.0)[source]#

Calculate the default target entropy for a quantized continuous action.

Parameters

spec (TensorSpec) – action spec
num_bins (int) – number of quantization bins used to represent the continuous action
ent_per_action_dim (int) – desired entropy per action dimension for the non-quantized continuous action; default value is -1.0 as suggested by the SAC paper.
Returns: target entropy for quantized representation

calc_uniform_log_prob(spec)[source]#

Given an action spec, calculate the uniform log prob.

Parameters: spec (BoundedTensorSpec) – action spec must be a bounded spec
Returns: The uniform log probability

compute_entropy(distributions)[source]#

Computes total entropy of nested distribution.

Parameters: distributions (nested Distribution) – A possibly batched tuple of distributions.

Returns: entropy

compute_log_probability(distributions, actions)[source]#

Computes log probability of actions given distribution.

Parameters

distributions – A possibly batched tuple of distributions.
actions – A possibly batched action tuple.

Returns

the log probability summed over actions in the batch.

Return type

Tensor

distributions_to_params(nests)[source]#

Convert distributions to its parameters, and keep tensors unchanged. Only returns parameters that have Tensor values.

Parameters: nests (nested Distribution and Tensor) – Each Distribution will be converted to dictionary of its Tensor parameters.
Returns: Each leaf is a Tensor or a dict corresponding to one distribution, with keys as parameter name and values as tensors containing parameter values.
Return type: nested Tensor/Distribution

entropy_with_fallback(distributions, return_sum=True)[source]#

Computes total entropy of nested distribution. If entropy() of a distribution is not implemented, this function will fall back to using sampling to calculate the entropy. It returns two values: (entropy, entropy_for_gradient).

There are two situations:

entropy() is implemented and it’s same as entropy_for_gradient.
entropy() is not implemented. We use sampling to calculate entropy. The unbiased estimator for entropy is $-\log(p(x))$. However, the gradient of $-\log(p(x))$ is not an unbiased estimator of the gradient of entropy. So we also calculate a value whose gradient is an unbiased estimator of the gradient of entropy. See estimated_entropy() for detail.

Examples:

ent, ent_for_grad = entropy_with_fallback(dist)
alf.summary.scalar("entropy", ent)
ent_for_grad.backward()

Parameters

distributions (nested Distribution) – A possibly batched tuple of distributions.
return_sum (bool) – if True, return the total entropy. If not True, return the entropy for each distribution in the nest.

Returns

entropy
entropy_for_gradient: You should use entropy in situations where its value is needed, and entropy_for_gradient where you need to calculate the gradient of entropy.

Return type

tuple

epsilon_greedy_sample(nested_distributions, eps=0.1)[source]#

Generate greedy sample that maximizes the probability.

Parameters

nested_distributions (nested Distribution) – distribution to sample from
eps (float) – a floating value in $[0,1]$, representing the chance of action sampling instead of taking argmax. This can help prevent a dead loop in some deterministic environment like Breakout.

Returns

Return type

(nested) Tensor
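
For a single categorical distribution, the behavior described above can be sketched as follows. This is an illustration under assumptions: epsilon_greedy_sample_sketch is a hypothetical name, probs is a plain list of probabilities, and the real function works on nested, possibly batched distributions.

```python
import random


def epsilon_greedy_sample_sketch(probs, eps=0.1, rng=random):
    """Sample an index with probability eps, otherwise take the argmax."""
    indices = range(len(probs))
    if rng.random() < eps:
        # Exploration: draw from the distribution itself.
        return rng.choices(indices, weights=probs, k=1)[0]
    # Greedy: the index with the highest probability.
    return max(indices, key=probs.__getitem__)


rng = random.Random(0)
probs = [0.2, 0.5, 0.3]
greedy = [epsilon_greedy_sample_sketch(probs, eps=0.0, rng=rng) for _ in range(100)]
sampled = [epsilon_greedy_sample_sketch(probs, eps=1.0, rng=rng) for _ in range(500)]
print(set(greedy), set(sampled))
```

With eps=0 the result is always the mode, and with eps=1 every call is an ordinary sample.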

estimated_entropy(dist, num_samples=1, check_numerics=False)[source]#

Estimate entropy by sampling.

Use sampling to calculate entropy. The unbiased estimator for entropy is $-\log(p(x))$ where $x$ is an unbiased sample of $p$. However, the gradient of $-\log(p(x))$ is not an unbiased estimator of the gradient of entropy. So we also calculate a value whose gradient is an unbiased estimator of the gradient of entropy. See notes/subtleties_of_estimating_entropy.py for detail.

Parameters

dist (torch.distributions.Distribution) – concerned distribution
num_samples (int) – number of random samples used for estimating entropy.
check_numerics (bool) – If true, find NaN / Inf values. For debugging only.

Returns

entropy
entropy_for_gradient: for calculating gradient.

Return type

tuple
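
The value part of the estimator, averaging $-\log(p(x))$ over samples of $p$, can be checked against a closed form for a scalar normal distribution. The sketch below covers only that value. It does not construct entropy_for_gradient, which needs automatic differentiation, and the function names are illustrative.

```python
import math
import random


def normal_log_prob(x, mean, stddev):
    z = (x - mean) / stddev
    return -0.5 * z * z - math.log(stddev) - 0.5 * math.log(2.0 * math.pi)


def estimate_entropy(mean, stddev, num_samples, rng):
    """Monte Carlo estimate of entropy as the average of -log p(x)."""
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(mean, stddev)  # x is a sample of p
        total += -normal_log_prob(x, mean, stddev)
    return total / num_samples


def exact_entropy(stddev):
    return 0.5 * math.log(2.0 * math.pi * math.e * stddev ** 2)


estimate = estimate_entropy(0.0, 2.0, num_samples=20000, rng=random.Random(0))
print(estimate, exact_entropy(2.0))
```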

extract_distribution_parameters(dist)[source]#

Extract the input parameters of a distribution.

Parameters: dist (Distribution) – distribution from which to extract parameters
Returns: the nest of the input parameter of the distribution

extract_spec(nests, from_dim=1)[source]#

Extract TensorSpec or DistributionSpec for each element of a nested structure. It assumes that the first dimension of each element is the batch size.

Parameters

nests (nested structure) – each leaf node of the nested structure is a Tensor or Distribution of the same batch size.
from_dim (int) – ignore dimension before this when constructing the spec.

Returns

each leaf node of the returned nested spec is the corresponding spec (excluding batch size) of the element of nest.

Return type

nest

get_base_dist(dist)[source]#

Get the base distribution.

Parameters

dist (td.Distribution) –

Returns

The base distribution if dist is td.Independent or td.TransformedDistribution, and dist if it is td.Normal.

Raises

NotImplementedError – if dist or its base distribution is not td.Normal, td.Independent or td.TransformedDistribution.

get_invertible(cls)[source]#: A helper function to turn on the cache mechanism for transformation. This is useful as some transformations (say $g$) may not be able to provide an accurate inversion, so the difference between $x$ and $g^{-1}(g(x))$ can be large. This could lead to unstable training in practice. For a torch transformation $y=g(x)$, when cache_size is set to one, the latest value for $(x, y)$ is cached and will be used later for future computations. E.g. for inversion, a call to $g^{-1}(y)$ will return $x$, solving the inversion error issue mentioned above. Note that in the case of having a chain of transformations ($G$), all the element transformations need to turn on the cache to ensure that the composite transformation $G$ satisfies $x=G^{-1}(G(x))$.
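
The inversion problem and the effect of a one-element cache can be reproduced with scalars. In the sketch below tanh(25.0) rounds to exactly 1.0 in double precision, so the analytic inverse fails, while the cached pair returns the original input. CachedTanh is a hypothetical toy class, and matching the cached output by object identity is an assumption modeled on the description above.

```python
import math


class CachedTanh:
    """Toy tanh transform with an optional one-element (x, y) cache."""

    def __init__(self, cache_size=1):
        self._cache_size = cache_size
        self._cached_x_y = (None, None)

    def __call__(self, x):
        y = math.tanh(x)
        if self._cache_size == 1:
            self._cached_x_y = (x, y)  # remember the latest pair
        return y

    def inv(self, y):
        x_old, y_old = self._cached_x_y
        if self._cache_size == 1 and y is y_old:
            return x_old  # exact: no numerical inversion needed
        return math.atanh(y)  # raises ValueError when y == 1.0


cached = CachedTanh(cache_size=1)
y = cached(25.0)
print(y, cached.inv(y))
```

Passing a copy or a recomputed value instead of the cached object bypasses the cache, which is why operations applied to sampled values can bring the inversion error back.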

get_mode(dist)[source]#

Get the mode of the distribution. Note that if dist is a transformed distribution, the result may not be the actual mode of dist.

Parameters: dist (td.Distribution) –
Returns: The mode of the distribution. If dist is a transformed distribution, the result is calculated by transforming the mode of its base distribution and may not be the actual mode for dist.
Raises: NotImplementedError – if dist or its base distribution is not td.Categorical, td.Normal, td.Independent or td.TransformedDistribution.

get_rmode(dist)[source]#

Get the mode of the distribution in a way that supports backpropagation. Note that if dist is a transformed distribution, the result may not be the actual mode of dist.

Parameters: dist (td.Distribution) –
Returns: The mode of the distribution. If dist is a transformed distribution, the result is calculated by transforming the mode of its base distribution and may not be the actual mode for dist.
Raises: NotImplementedError – if dist or its base distribution is not td.Normal, StableCauchy, Beta, TruncatedDistribution, td.Independent or td.TransformedDistribution.

params_to_distributions(nests, nest_spec)[source]#

Convert distribution parameters to Distribution, keep tensors unchanged.

Parameters

nests – a nested Tensor and dictionary of tensor parameters of Distribution. Typically, nests is obtained using distributions_to_params().
nest_spec (nested DistributionSpec and TensorSpec) – The distribution params will be converted to Distribution according to the corresponding DistributionSpec in nest_spec.
Returns
Return type: nested Distribution or Tensor

rsample_action_distribution(nested_distributions, return_log_prob=False)[source]#

Sample actions from distributions with reparameterization-based sampling.

It uses Distribution.rsample() to do the sampling to enable backpropagation.

Parameters

nested_distributions (nested Distribution) – action distributions.
return_log_prob (bool) – whether to compute and return the log probability of the sampled actions, in addition to the sampled actions. In some cases, it is useful to compute the log probability immediately after the actions are sampled, as some subsequent operations might invalidate the cache mechanism (if turned on). Example scenarios include 1) an additional sampling operation applied to nested_distributions, and 2) operations applied to the actions sampled from nested_distributions (e.g., cloning). Either could cause numerical issues if we later compute the log probability of actions sampled at an earlier stage, especially for actions that are close to the action bounds. For more details on PyTorch Transform, its cache mechanism, and its impact on RL algorithms, please check https://alf.readthedocs.io/en/latest/notes/pytorch_notes.html#transform-bijector.

Returns

rsampled actions if return_log_prob is False
rsampled actions and log_prob if return_log_prob is True

sample_action_distribution(nested_distributions, return_log_prob=False)[source]#

Sample actions from distributions with conventional sampling, without enabling backpropagation.

Parameters

nested_distributions (nested Distribution) – action distributions.
return_log_prob (bool) –
whether to compute and return the log probability of the sampled actions, in addition to the sampled actions. In some cases, it is useful to compute the log probability immediately after the actions are sampled, as some subsequent operations might invalidate the cache mechanism (if turned on). Example scenarios include 1) an additional sampling operation applied to nested_distributions, and 2) operations applied to the actions sampled from nested_distributions (e.g., cloning). Either could cause numerical issues if we later compute the log probability of actions sampled at an earlier stage, especially for actions that are close to the action bounds. For more details on PyTorch Transform, its cache mechanism, and its impact on RL algorithms, please check https://alf.readthedocs.io/en/latest/notes/pytorch_notes.html#transform-bijector.

Returns

sampled actions if return_log_prob is False
sampled actions and log_prob if return_log_prob is True

to_distribution_param_spec(nests)[source]#

Convert the DistributionSpecs in nests to their parameter specs.

Parameters: nests (nested DistributionSpec or TensorSpec) – Each DistributionSpec will be converted to a dictionary of the specs of its input Tensor parameters.
Returns: Each leaf is a TensorSpec or a dict corresponding to one distribution, with keys as parameter name and values as TensorSpecs for the parameters.
Return type: nested TensorSpec

alf.utils.distributed#

data_distributed(method)[source]#

This decorator makes a target method of a module capable of being data distributed via DDP.

This is to provide a simple and transparent way to enable DDP for specific code logics.

When the method is wrapped by @data_distributed, the outputs (tensors) of this method will have gradient synchronization hooks attached to them. Later, when those outputs are used in backward() to compute gradients, the hooks will be called to synchronize across all processes. As a result, the corresponding parameters receive not only the gradients from this process, but also the gradients from the other processes. Note that each process will be TRAPPED at the call to backward() that involves those output tensors until all processes have finished back propagation and have their gradients synchronized.

Example usage:

class A(nn.Module):
    # ...
    @data_distributed
    def compute_something(self, input):
      return self._network1(input), self._network2(input)
    # ...

In the above code, after applying the decorator, the method compute_something will be made data distributed if the following conditions are met:

Multiple processes within the same process group create instances of A and call compute_something() individually.
All such A instances have self._ddp_activated_rank set to the correct rank of the GPU device that belongs to them.

Otherwise the method compute_something() will behave normally.

data_distributed_when(cond=None)[source]#

This is @data_distributed with an extra condition.

The condition is a function that returns True or False given the wrapped module as the input. If the condition evaluates to False, DDP will not be activated and the original method will be called.

make_ddp_performer(module, method, ddp_rank, find_unused_parameters=False)[source]#

Creates a DDP wrapped MethodPerformer.

This function is an alf.configurable and used in the @data_distributed series of decorators below. Override this in your configuration with

alf.config('make_ddp_performer', find_unused_parameters=True)

to enable find_unused_parameters. This asks DDP to ignore parameters that are not used for computing the output of forward() when waiting for synchronization of gradients and parameters upon backward(). Normally you do not need to worry about this. It is useful for algorithms such as PPG where some of the parameters of the model do NOT ALWAYS contribute to the network output.

alf.utils.distributions#

class CauchyITS[source]#

Bases: alf.utils.distributions.InverseTransformSampling

Cauchy distribution.

\[p(x) = \frac{1}{\pi (1 + x^2)}\]

static cdf(x)[source]#: Cumulative distribution function of this distribution.

static icdf(x)[source]#: Inverse of the CDF

static log_prob(x)[source]#: Log probability density.

class InverseTransformSampling[source]#

Bases: object

Interface for defining inverse transform sampling.

static cdf(x)[source]#: Cumulative distribution function of this distribution.

static icdf(x)[source]#: Inverse of the CDF

static log_prob(x)[source]#: Log probability density.
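As an illustration of this interface, the standard Cauchy case can be written in closed form. The sketch below is plain Python on floats, not ALF's tensor-based implementation; `CauchySketch` and `sample` are names made up for this example.

```python
import math
import random


class CauchySketch:
    """Standard Cauchy: p(x) = 1 / (pi * (1 + x^2)). Hypothetical example class."""

    @staticmethod
    def cdf(x):
        return math.atan(x) / math.pi + 0.5

    @staticmethod
    def icdf(u):
        # Inverse of cdf for u in (0, 1).
        return math.tan(math.pi * (u - 0.5))

    @staticmethod
    def log_prob(x):
        return -math.log(math.pi * (1.0 + x * x))


def sample(its, rng):
    """Inverse transform sampling: push a uniform draw through the inverse CDF."""
    return its.icdf(rng.random())


rng = random.Random(0)
draws = [sample(CauchySketch, rng) for _ in range(5)]
```

Because the sample is a deterministic function of a uniform draw, this style of sampling is what makes reparameterized sampling possible when the inverse CDF is differentiable.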

class NormalITS[source]#

Bases: alf.utils.distributions.InverseTransformSampling

Normal distribution.

\[p(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)\]

static cdf(x)[source]#: Cumulative distribution function of this distribution.

static icdf(x)[source]#: Inverse of the CDF

static log_prob(x)[source]#: Log probability density.

class T2Cdf_[source]#

Bases: torch.autograd.function.Function

static backward(ctx, grad_output)[source]#

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, x)[source]#

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store tensors that can be then retrieved during the backward pass.

class T2ITS[source]#

Bases: alf.utils.distributions.InverseTransformSampling

Student’s t-distribution with DOF 2.

\[p(x) = \frac{1}{2 (1 + x^2)^{1.5}}\]

static cdf(x)[source]#: Cumulative distribution function of this distribution.

static icdf(x)[source]#: Inverse of the CDF

static log_prob(x)[source]#: Log probability density.
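For the density above, the CDF and its inverse have simple closed forms, $P(x) = \frac{1}{2}\left(1 + \frac{x}{\sqrt{1+x^2}}\right)$ and $P^{-1}(u) = \frac{v}{\sqrt{1-v^2}}$ with $v = 2u - 1$. The sketch below checks them numerically in plain Python; the function names are chosen for this example and are not ALF's API.

```python
import math


def t2_pdf(x):
    return 1.0 / (2.0 * (1.0 + x * x) ** 1.5)


def t2_cdf(x):
    return 0.5 * (1.0 + x / math.sqrt(1.0 + x * x))


def t2_icdf(u):
    v = 2.0 * u - 1.0  # maps (0, 1) to (-1, 1)
    return v / math.sqrt(1.0 - v * v)


def t2_log_prob(x):
    return -math.log(2.0) - 1.5 * math.log1p(x * x)


# Central finite difference of the CDF should match the density.
h = 1e-5
numeric_pdf = (t2_cdf(0.7 + h) - t2_cdf(0.7 - h)) / (2.0 * h)
```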

class TruncatedCauchy(loc, scale, lower_bound, upper_bound)[source]#

Bases: alf.utils.distributions.TruncatedDistribution

Truncated Cauchy distribution.

The truncated Cauchy distribution $q(x)$ is defined by 4 parameters: location $\mu$, scale parameter $s$, lower bound $l$ and upper bound $u$.

\[q(x) = \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s})\]

where $p$ and $P$ are the pdf and cdf of the standard Cauchy distribution respectively.

Parameters

loc – the location parameter
scale – the scale parameter
lower_bound – the lower bound
upper_bound – the upper bound
its – the standard distribution to be used.

class TruncatedDistribution(loc, scale, lower_bound, upper_bound, its)[source]#

Bases: torch.distributions.distribution.Distribution

The base class of truncated distributions.

A truncated distribution $q(x)$ is defined by a standard base distribution $p(x)$, location $\mu$, scale parameter $s$, lower bound $l$ and upper bound $u$:

\[q(x) = \begin{cases} \frac{1}{s (P(u) - P(l))} p\left(\frac{x-\mu}{s}\right) & \text{if } l \le x \le u \\ 0 & \text{otherwise} \end{cases}\]

where $P$ is the cdf of $p$.

Parameters

loc (Tensor) – the location parameter. Its shape is batch_shape + event_shape.
scale (Tensor) – the scale parameter. Its shape is batch_shape + event_shape.
lower_bound (Tensor) – the lower bound. Its shape is event_shape.
upper_bound (Tensor) – the upper bound. Its shape is event_shape.
its (InverseTransformSampling) – the standard distribution to be used.
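A truncated distribution of this form can be sampled by restricting the uniform draw to the CDF range covered by the bounds. The sketch below uses the standard Cauchy as the base distribution and plain floats. It assumes the CDF is evaluated at the standardized bounds $(l-\mu)/s$ and $(u-\mu)/s$, which is an interpretation of $P(l)$ and $P(u)$ in the formula above; all function names are made up for this example.

```python
import math
import random


def cauchy_cdf(z):
    return math.atan(z) / math.pi + 0.5


def cauchy_icdf(u):
    return math.tan(math.pi * (u - 0.5))


def truncated_sample(loc, scale, lower, upper, rng):
    """Draw from the base distribution restricted to [lower, upper]."""
    p_low = cauchy_cdf((lower - loc) / scale)   # assumed standardized bound
    p_high = cauchy_cdf((upper - loc) / scale)  # assumed standardized bound
    u = p_low + rng.random() * (p_high - p_low)
    x = loc + scale * cauchy_icdf(u)
    return min(max(x, lower), upper)  # guard against rounding at the edges


def truncated_log_prob(x, loc, scale, lower, upper):
    if not lower <= x <= upper:
        return -math.inf
    z = (x - loc) / scale
    mass = cauchy_cdf((upper - loc) / scale) - cauchy_cdf((lower - loc) / scale)
    return -math.log(math.pi * (1.0 + z * z)) - math.log(scale * mass)
```

Dividing by the probability mass between the bounds is what makes the truncated density integrate to one over [lower, upper].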

arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#

has_rsample = True#

property loc#: Location parameter of this distribution.

log_prob(value)[source]#

The log of the probability density evaluated at value.

Parameters: value (Tensor) – its shape should be sample_shape + batch_shape + event_shape
Returns: Tensor of shape sample_shape + batch_shape

property lower_bound#: Lower bound of this distribution.

property mode#: Mode of this distribution.

rsample(sample_shape=torch.Size([]))[source]#

Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.

Parameters: sample_shape (Size) – sample shape
Returns: Tensor of shape sample_shape + batch_shape + event_shape

property scale#: Scale parameter of this distribution.

property upper_bound#: Upper bound of this distribution.

class TruncatedNormal(loc, scale, lower_bound, upper_bound)[source]#

Bases: alf.utils.distributions.TruncatedDistribution

Truncated normal distribution.

The truncated normal distribution $q(x)$ is defined by 4 parameters: location $\mu$, scale parameters $s$, lower bound $l$ and upper bound $u$.

\[q(x) = \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s})\]

where $p$ and $P$ are the pdf and cdf of the standard normal distribution respectively.

Parameters

loc – the location parameter
scale – the scale parameter
lower_bound – the lower bound
upper_bound – the upper bound
its – the standard distribution to be used.

class TruncatedT2(loc, scale, lower_bound, upper_bound)[source]#

Bases: alf.utils.distributions.TruncatedDistribution

Truncated Student’s t distribution with degree of freedom 2.

The truncated Student’s t distribution $q(x)$ is defined by 4 parameters: location $\mu$, scale parameter $s$, lower bound $l$ and upper bound $u$.

\[q(x) = \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s})\]

where $p(x) = \frac{1}{2 (1 + x^2)^{1.5}}$ and $P$ is the cdf of $p(x)$.

Parameters

loc – the location parameter
scale – the scale parameter
lower_bound – the lower bound
upper_bound – the upper bound
its – the standard distribution to be used.

t2cdf()#

alf.utils.external_configurables#

Make various external gin-configurable objects.

alf.utils.gin_utils#

inoperative_config_str(max_line_length=80, continuation_indent=4)[source]#

Retrieve the “inoperative” configuration as a config string.

Parameters

max_line_length (int) – A (soft) constraint on the maximum length of a line in the formatted string.
continuation_indent (int) – The indentation for continued lines.

Returns

A config string capturing all parameter values configured but not used by the current program (overridden by explicit call).

alf.utils.git_utils#

Git utilities.

get_diff(module_root)[source]#

Get the diff of ALF at HEAD.

If the repo is clean, the returned value is an empty string.

Parameters: module_root (str) – the path to the module root
Returns: current diff.

get_revision(module_root)[source]#

Get the current revision of a python module at HEAD.

Parameters: module_root (str) – the path to the module root

alf.utils.lean_function#

lean_function(func)[source]#

Wrap func to save memory for backward.

The returned function performs the same computation as func, but saves memory by discarding intermediate results. It calculates the gradient by recomputing func using the same input during backward.

Note: There are several requirements for func:

All the Tensor inputs to func must be explicitly listed as arguments of func. For example, a tuple of Tensors as an argument is not allowed. Using Tensors outside of func (e.g., tensors from class member variables) is not allowed either unless func is a nn.Module. On the other hand, if func is a module, its parameters should not be passed as arguments, as they are taken care of automatically.

If func is not a Network, its return value must be a Tensor or a tuple of Tensors. If it is a Network, its return value (output and state) must be a nest of Tensors.

func must be deterministic so that repeated evaluation with the same input gives the same output.

It is the responsibility of the user of this function to make sure that func satisfies these requirements. lean_function will not report an error if func does not satisfy them; the violation is silently ignored.

Note: PyTorch also has a function with similar functionality. See https://pytorch.org/docs/stable/checkpoint.html for details. lean_function has several advantages over PyTorch’s implementation:

Keyword arguments are supported.

Both torch.autograd.grad and torch.autograd.backward are supported.

Examples:

Apply to simple function:

def myfunc(x, w, b, scale=1.0):
    return torch.sigmoid(scale * (x @ w) + b)

lean_myfunc = lean_function(myfunc)

y = lean_myfunc(x, w, b)

Apply to nn.Module:

module = alf.layers.FC(3, 5, activation=torch.relu_)
lean_func = lean_function(module)
y = lean_func(x)

Apply to a network

net = alf.nn.Sequential(
    alf.layers.FC(3, 5, activation=torch.relu_),
    alf.layers.FC(5, 1, activation=torch.sigmoid))
lean_func = lean_function(net)
y = lean_func(x)

Parameters: func (Callable) – function or module to be wrapped.
Return type: Callable
Returns: the wrapped function or module. In the case of func being a nn.Module, all the original attributes and methods can still be accessed in the same way through the wrapped module.

alf.utils.losses#

Various function/classes related to loss computation.

class AsymmetricSimSiamLoss(proj_net=None, pred_net=None, input_size=None, proj_hidden_size=256, pred_hidden_size=128, output_size=256, proj_last_use_bn=False, eps=1e-05, fixed_weight_norm=False, lr=None, debug_summaries=True, name='SimSiamLoss')[source]#

Bases: torch.nn.modules.module.Module

The siamese loss proposed in:

Chen Xinlei et al. “Exploring Simple Siamese Representation Learning” CVPR 2021

The loss is 1 - cosine(pred(proj(x)), detach(proj(y))), where x is the predicted representation, y is the target representation, and pred and proj are computed using pred_net and proj_net, respectively.
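The cosine part of this loss can be sketched on plain Python lists. This is an illustration only: the projection and prediction networks are omitted, the inputs stand in for pred(proj(x)) and the detached proj(y), and the normalization is assumed to follow the `v / max(||v||, eps)` convention of F.normalize.

```python
import math


def normalize(v, eps=1e-5):
    """Scale v to unit length, dividing by max(norm, eps) to avoid division by zero."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / max(norm, eps) for a in v]


def one_minus_cosine(p, z, eps=1e-5):
    """1 - cosine(p, z); z stands in for the detached target projection."""
    p_hat, z_hat = normalize(p, eps), normalize(z, eps)
    return 1.0 - sum(a * b for a, b in zip(p_hat, z_hat))


aligned = one_minus_cosine([1.0, 0.0], [3.0, 0.0])      # same direction
orthogonal = one_minus_cosine([1.0, 0.0], [0.0, 1.0])   # perpendicular
opposite = one_minus_cosine([1.0, 0.0], [-2.0, 0.0])    # opposite direction
```

The loss ranges from 0 for aligned vectors to 2 for opposite ones and is insensitive to the scale of either input.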

Parameters

proj_net (Optional[Network]) – if not provided, a default MLP with two hidden layers and output size as output_size will be created.
pred_net (Optional[Network]) – if not provided, a default MLP with one hidden layer will be created.
input_size (Optional[int]) – input size of proj_net
proj_hidden_size (int) – the size of the hidden layers of proj_net. Only useful if proj_net is not provided.
pred_hidden_size (int) – the size of the hidden layer of pred_net. Only useful if pred_net is not provided.
proj_last_use_bn (bool) – whether to use batch norm for the output layer of proj_net. Only useful if proj_net is not provided
eps (float) – the eps for calling F.normalize() when calculating the normalized vector in order to calculate cosine.
fixed_weight_norm (bool) – whether to fix the norm of the weight parameter of the FC layers.
lr (Optional[float]) – learning rate. If None, the default learning rate will be used.
debug_summaries (bool) – whether to write debug summaries
name (str) – name of this loss

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(pred, target)[source]#

Calculate the loss.

Parameters

pred (Tensor) – predicted representation of shape [B, T, …]
target (Tensor) – target representation of shape [B, T, …]

Return type

Tensor

Returns

loss of shape [B, T]

training: bool#

class BipartiteMatchingLoss(reduction='mean', name='BipartiteMatchingLoss')[source]#

Bases: object

Bipartite matching loss.

This order-invariant loss can be used to evaluate the matching between a predicted set and a target set. The idea is that for every forward, an optimal one-to-one mapping assignment from the predicted set to the target set is first found using some efficient bipartite graph matching algorithm, and the optimal loss is minimized.

Mathematically, suppose there are $N$ objects in either set, $L(x,y)$ is the matching loss between any $(x,y)$ object pair, and $\mathcal{G}_N$ is the permutation space. The forward loss to be minimized is:

\[\min_{g\in\mathcal{G}_N}\sum_n^N L(x_n(\theta),y_{g(n)})\]

where $\theta$ is the model parameters.

In practice, to find the optimal assignment, we simply use scipy.optimize.linear_sum_assignment.
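For small $N$, the minimization over permutations can be checked by brute force. The sketch below enumerates all permutations with itertools instead of calling scipy.optimize.linear_sum_assignment, so it is only a reference for tiny problems; `brute_force_matching` is a name made up for this example.

```python
import itertools


def brute_force_matching(cost):
    """Return (min total cost, permutation g) where cost[n][g[n]] is summed over n."""
    n = len(cost)
    best_perm, best_loss = None, float("inf")
    for perm in itertools.permutations(range(n)):
        loss = sum(cost[i][perm[i]] for i in range(n))
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return best_loss, best_perm


# cost[i][j] is the matching loss between prediction i and target j.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
loss, assignment = brute_force_matching(cost)
```

Here prediction 0 is matched to target 1, prediction 1 to target 0, and prediction 2 to target 2, for a total cost of 5. The enumeration costs $O(N!)$, which is why an assignment solver is used in practice.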

References:

End-to-End Object Detection with Transformers, Carion et al.

https://github.com/facebookresearch/detr/blob/main/models/matcher.py

Parameters: reduction (str) – ‘sum’, ‘mean’ or ‘none’. This is how to reduce the matching loss. For the former two, the loss shape is [B], while for the ‘none’, the loss shape is [B,N].

forward(matching_cost_mat, cost_mat=None)[source]#

Compute the optimal matching loss.

Parameters

matching_cost_mat (Tensor) – the cost matrix used to determine the optimal matching. Its shape should be [B,N,N].
cost_mat (Optional[Tensor]) – the cost matrix used to compute the optimal loss once the optimal matching is found. According to the DETR paper, this cost matrix might be different from the one used for matching. If None, then it will be the same matrix for matching.

Returns

the optimal loss. If reduction is ‘mean’ or ‘sum’, its shape is [B], otherwise its shape is [B,N].
the optimal matching given the cost matrix. Its shape is [B,N], where the value of n-th entry is its mapped index in the target set.

Return type

tuple

class DiscreteRegressionLoss(transform=None, inverse_after_mean=False)[source]#

Bases: alf.utils.losses._DiscreteRegressionLossBase

A loss for predicting the distribution of a scalar.

The target is assumed to be in the range [-(n-1)//2, n//2], where n=logits.shape[-1]. The logits are used to calculate the probabilities of being one of the n values. If a target value y is not an integer, it is treated as having probability mass of $y- \lfloor y \rfloor$ at $\lfloor y \rfloor + 1$ and probability mass of $1 + \lfloor y \rfloor - y$ at $\lfloor y \rfloor$. Then cross entropy loss is applied.

More specifically, the logits passed to calc_loss represents the following: P = softmax(logits) and P[i] means the probability that the (transformed) target is equal to i - (n-1)//2

Note: DiscreteRegressionLoss(SqrtLinearTransform(0.001), inverse_after_mean=True) is the loss used by the MuZero paper.
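The target encoding described above can be sketched in plain Python. This is an illustration under the stated index convention (value v lives at index v + (n-1)//2), not ALF's implementation; `two_hot` and `cross_entropy` are hypothetical helper names, and the target transform is omitted.

```python
import math


def two_hot(y, n):
    """Spread a scalar target y over its two neighboring integer bins.

    Assumes y lies within [-(n-1)//2, n//2].
    """
    offset = (n - 1) // 2
    low = math.floor(y)
    frac = y - low
    probs = [0.0] * n
    probs[low + offset] += 1.0 - frac
    if frac > 0.0:  # skip the upper bin when y is an integer
        probs[low + 1 + offset] += frac
    return probs


def cross_entropy(logits, target_probs):
    """Cross entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(t * (l - log_z) for l, t in zip(logits, target_probs))


target = two_hot(0.25, 5)          # bins represent -2, -1, 0, 1, 2
loss = cross_entropy([0.0] * 5, target)
```

With n = 5, the target 0.25 places mass 0.75 on the bin for 0 and 0.25 on the bin for 1, so the expectation of the encoded distribution equals the original target.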

Parameters

transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
inverse_after_mean – when calculating the expected prediction, whether to do the inverse transformation after calculating the expectation in the transformed space. Note that using inverse_after_mean=True will make the expectation biased in general. This is because $f^{-1}(E(x)) \le E(f^{-1}(x))$ (Jensen inequality) if $f^{-1}$ is convex. In our case, $f^{-1}$ is convex for $x \ge 0$.

calc_expectation(logits)[source]#

Calculate the expected prediction in the untransformed domain from logits.

Parameters: logits – raw model prediction

initialize_bias(bias)[source]#

Initialize the bias of the last FC layer for the prediction properly.

This function sets the bias so that the initial distribution of the prediction in the original domain of target is approximately Cauchy: $p(x) \propto \frac{1}{1+x^2}$

Parameters: bias (Tensor) – the bias parameter to be initialized.

class MeanSquaredLoss(batch_dims=1, debug_summaries=True, name='MSELoss')[source]#

Bases: object

Mean squared loss.

For a prediction and target pair (x,y), the loss is ((x - y) ** 2).mean().

Parameters: batch_dims (int) – the first so many dims of prediction and target are treated as batch dimension. The mean is performed on the rest of the dimensions.

forward(pred, target)[source]#

Calculate the loss.

Parameters

pred (Tensor) – prediction of shape [B, …]
target (Tensor) – target of shape [B, …]

Return type

Tensor

Returns

loss of shape [B]

class OrderedDiscreteRegressionLoss(transform=None, inverse_after_mean=False)[source]#

Bases: alf.utils.losses._DiscreteRegressionLossBase

A loss for predicting the distribution of a scalar.

The target is assumed to be in the range [-(n-1)//2, n//2], where n=logits.shape[-1]. The logits are used to calculate the probabilities of being greater than or equal to each of these n values. If a target value y is not an integer, it is treated as having probability mass of $y- \lfloor y \rfloor$ at $\lfloor y \rfloor + 1$ and probability mass of $1 + \lfloor y \rfloor - y$ at $\lfloor y \rfloor$. Then binary cross entropy loss is applied.

More specifically, the logits passed to calc_loss represents the following: P = sigmoid(logits) and P[i] means the probability that the (transformed) target is greater than or equal to i - (n-1)//2
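Under this convention, the binary targets are the tail probabilities of the two-point target distribution. The sketch below builds them in plain Python and recovers the expectation from them. It assumes the probability of exceeding the lowest value is 1, and it is not necessarily how calc_expectation is implemented; both function names are hypothetical.

```python
import math


def ordered_targets(y, n):
    """targets[i] = P(target >= i - (n - 1) // 2) for the two-point target of y."""
    offset = (n - 1) // 2
    low = math.floor(y)
    frac = y - low
    targets = []
    for i in range(n):
        v = i - offset
        if v <= low:
            targets.append(1.0)   # all mass is at or above v
        elif v == low + 1:
            targets.append(frac)  # only the upper neighbor's mass
        else:
            targets.append(0.0)
    return targets


def expectation_from_tail_probs(probs):
    """E[t] = lowest value + sum of P(t >= v) over all higher values v."""
    n = len(probs)
    return -((n - 1) // 2) + sum(probs[1:])


targets = ordered_targets(0.25, 5)  # values -2, -1, 0, 1, 2
recovered = expectation_from_tail_probs(targets)
```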

Parameters

transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
inverse_after_mean – when calculating the expected prediction, whether to do the inverse transformation after calculating the expectation in the transformed space. Note that using inverse_after_mean=True will make the expectation biased in general. This is because $f^{-1}(E(x)) \le E(f^{-1}(x))$ (Jensen inequality) if $f^{-1}$ is convex. In our case, $f^{-1}$ is convex for $x \ge 0$.

calc_expectation(logits)[source]#

Calculate the expected prediction in the untransformed domain from logits.

Parameters: logits – raw model prediction

initialize_bias(bias)[source]#

Initialize the bias of the last FC layer for the prediction properly.

This function sets the bias so that the initial distribution of the prediction in the original domain of target is approximately Cauchy: $p(x) \propto \frac{1}{1+x^2}$

Parameters: bias (Tensor) – the bias parameter to be initialized.

class QuantileRegressionLoss(transform=None, inverse_after_mean=False, delta=0.0)[source]#

Bases: alf.utils.losses.ScalarPredictionLoss

Multi-quantile Huber loss

The loss for simultaneous multiple quantile regression. The number of quantiles n is quantiles.shape[-1]. quantiles[..., k] is the quantile value estimation for quantile $(k + 0.5) / n$. For each prediction, there can be one or multiple target values.

This loss is described in the following paper:

Dabney et al. Distributional Reinforcement Learning with Quantile Regression

Parameters

transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
inverse_after_mean (bool) – when calculating the expected prediction, whether to do the inverse transformation after calculating the expectation in the transformed space. Note that using inverse_after_mean=True will make the expectation biased in general. This is because $f^{-1}(E(x)) \le E(f^{-1}(x))$ (Jensen inequality) if $f^{-1}$ is convex. In our case, $f^{-1}$ is convex for $x \ge 0$.
delta (float) – the smoothness parameter for huber loss (larger means smoother). Note that the quantile estimation with delta > 0 is biased. You should use a small value for delta if you want the quantile estimation to be less biased (so that the mean of the quantile will be close to mean of the samples).

calc_expectation(quantiles)[source]#

Calculate the expected prediction in the untransformed domain from quantiles.

Parameters: quantiles (Tensor) – predicted quantile values in the transformed space.

class ScalarPredictionLoss[source]#

Bases: object

calc_expectation(pred)[source]#: Calculate the expected prediction in the untransformed domain from pred.

initialize_bias(bias)[source]#

Initialize the bias of the last FC layer for the prediction properly.

This function can be passed to FC as bias_initializer.

For some losses (e.g. OrderedDiscreteRegressionLoss), initializing the bias to zero can lead to very bad initial predictions. So we provide an interface for doing loss-specific initializations. Note that the weight of the last FC should be initialized to zero in general.

Parameters: bias (Tensor) – the bias parameter to be initialized.

class SquareLoss(transform=None)[source]#

Bases: alf.utils.losses.ScalarPredictionLoss

Square loss for predicting scalar target.

Parameters: transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.

calc_expectation(pred)[source]#

Calculate the expected prediction in the untransformed domain from pred.

Parameters: pred (Tensor) – raw model prediction

element_wise_huber_loss(x, y)[source]#

Elementwise Huber loss.

Parameters

x (Tensor) – label
y (Tensor) – prediction

Returns

loss (Tensor)

element_wise_squared_loss(x, y)[source]#

Elementwise squared loss.

Parameters

x (Tensor) – label
y (Tensor) – prediction

Returns

loss (Tensor)

huber_function(x, delta=1.0)[source]#

Huber function.

Parameters

x (Tensor) – difference between the observed and predicted values
delta (float) – the threshold at which to change between delta-scaled L1 and L2 loss, must be positive. Default value is 1.0

Returns

Huber function (Tensor)
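For reference, the conventional Huber function is quadratic within delta of zero and linear beyond it. The sketch below uses that standard definition on a single float; it is assumed, not verified, that ALF's tensor version follows the same formula.

```python
def huber(x, delta=1.0):
    """Standard Huber function: 0.5 * x^2 if |x| <= delta, else delta * (|x| - 0.5 * delta)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    a = abs(x)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


small = huber(0.5)         # quadratic region
large = huber(-3.0, 1.0)   # linear region
```

The two branches agree at |x| = delta, so the function and its first derivative are continuous there.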

multi_quantile_huber_loss(quantiles, target, delta=0.1)[source]#

Multi-quantile Huber loss

This loss is described in the following paper:

Dabney et al. Distributional Reinforcement Learning with Quantile Regression

Parameters

quantiles (Tensor) – batch_shape + [num_quantiles,]
target (Tensor) – batch_shape or batch_shape + [num_targets, ]
delta (float) – the smoothness parameter for huber loss (larger means smoother). Note that the quantile estimation with delta > 0 is biased. You should use a small value for delta if you want the quantile estimation to be less biased (so that the mean of the quantile will be close to mean of the samples).

Return type

Tensor

Returns

loss of batch_shape
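The unsmoothed case (delta = 0) reduces to the classic quantile, or pinball, loss evaluated at the quantile midpoints $(k + 0.5) / n$. The sketch below handles a single prediction with plain Python lists and averages over quantiles and targets; that reduction, and the function name, are assumptions for this example rather than ALF's exact behavior, and the Huber smoothing is not shown.

```python
def multi_quantile_loss(quantiles, targets):
    """Pinball loss of n quantile estimates against one or more target values."""
    n = len(quantiles)
    total = 0.0
    for k, q in enumerate(quantiles):
        tau = (k + 0.5) / n  # quantile level estimated by quantiles[k]
        for t in targets:
            u = t - q
            # Under-estimates are weighted by tau, over-estimates by (1 - tau).
            total += (tau if u >= 0 else tau - 1.0) * u
    return total / (n * len(targets))


loss = multi_quantile_loss([0.0, 1.0], [0.5])
```

Each term is non-negative, and for a fixed quantile level the expected loss is minimized when the estimate equals the true quantile of the targets.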

alf.utils.math_ops#

Various math ops.

class InvertibleTransform[source]#

Bases: object

Base class for InvertibleTransform.

inverse_transform(y)[source]#

transform(x)[source]#

class Log1pTransform(alpha=20)[source]#

Bases: alf.utils.math_ops.InvertibleTransform

Implementing the following transformation:

\[y=\alpha sign(x)\log(1+|x|)\]

Parameters: alpha (float) – $\alpha$ in the above formula
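The formula above inverts to $x = sign(y)(e^{|y|/\alpha} - 1)$. A minimal scalar sketch of the pair, with function names chosen for this example rather than taken from ALF:

```python
import math


def log1p_transform(x, alpha=20.0):
    # y = alpha * sign(x) * log(1 + |x|)
    return alpha * math.copysign(math.log1p(abs(x)), x)


def log1p_inverse(y, alpha=20.0):
    # x = sign(y) * (exp(|y| / alpha) - 1)
    return math.copysign(math.expm1(abs(y) / alpha), y)


round_trip = [log1p_inverse(log1p_transform(x)) for x in (-3.0, 0.0, 0.5, 100.0)]
```

Using log1p and expm1 keeps the round trip accurate for inputs close to zero.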

inverse_transform(y)[source]#

transform(x)[source]#

class Softsign[source]#

Bases: torch.autograd.function.Function

Softsign function.

Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$

Compared to Softsign_, this uses more memory but is faster and has higher precision for backward.

static backward(ctx, grad_output)[source]#

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

static forward(ctx, input)[source]#

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store tensors that can be then retrieved during the backward pass.

class Softsign_[source]#

Bases: torch.autograd.function.Function

Inplace version of softsign function.

Applies element-wise inplace, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$

The current pytorch implementation of softsign is inefficient for backward because it relies on automatic differentiation and does not have an inplace version. Hence we provide a more efficient implementation.

Reference: PyTorch: Defining New Autograd Functions

static backward(ctx, grad_output)[source]#

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

static forward(ctx, input)[source]#

Performs the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

The context can be used to store tensors that can be then retrieved during the backward pass.

class Sqrt1pTransform(*args, **kwargs)[source]#

Bases: alf.utils.math_ops.InvertibleTransform

The transformation used by MuZero with epsilon = 0.

\[y=sign(x) (\sqrt{|x| +1} - 1) = x / (\sqrt{|x|+1} + 1)\]

The second form has better numerical precision for small x.

inverse_transform(y)[source]#

transform(x)[source]#

class SqrtLinearTransform(eps=0.001)[source]#

Bases: alf.utils.math_ops.InvertibleTransform

The transformation used by MuZero.

\[y=sign(x) (\sqrt{|x| +1} - 1) + \epsilon x\]

Parameters: eps (float) – $\epsilon$ in the above formula
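The inverse of this transformation has a closed form: writing $s = \sqrt{|x|+1}$, the equation $|y| = s - 1 + \epsilon (s^2 - 1)$ is a quadratic in $s$. The scalar sketch below solves it and checks the round trip; the function names are hypothetical and $\epsilon > 0$ is assumed.

```python
import math


def sqrt_linear(x, eps=0.001):
    # y = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + eps * x


def sqrt_linear_inverse(y, eps=0.001):
    # Solve eps * s^2 + s - (1 + eps + |y|) = 0 for s = sqrt(|x| + 1) >= 1.
    s = (math.sqrt(1.0 + 4.0 * eps * (abs(y) + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(s * s - 1.0, y)


inputs = (-50.0, -0.3, 0.0, 2.0, 1000.0)
round_trip = [sqrt_linear_inverse(sqrt_linear(x)) for x in inputs]
```

For $\epsilon = 0$ the quadratic degenerates and the inverse simplifies to $x = sign(y)((|y|+1)^2 - 1)$.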

inverse_transform(y)[source]#

transform(x)[source]#

add_ignore_empty(x, y)[source]#

Add two Tensors which may be None or ().

If x or y is None or (), it is treated as zero and the other tensor is returned.

Parameters

x (Tensor|None|()) –
y (Tensor|None|()) –

Returns

x + y

add_n(inputs)[source]#

Calculate the sum of n tensors.

Parameters: inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
Returns: the element-wise sum of all the tensors in inputs.
Return type: Tensor

argmin(x)[source]#

Deterministic argmin.

Different from torch.argmin, which may have an undetermined result if there are multiple elements equal to the min, this argmin is guaranteed to return the index of the first element equal to the min in each row.

Parameters: x (Tensor) – only rank-2 tensors are supported
Returns: rank-1 int64 Tensor representing the column of the first element in each row equal to the minimum of the row.
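The tie-breaking rule can be stated in a few lines on nested Python lists. This only illustrates the intended semantics; it is not ALF's tensor implementation and `first_argmin` is a made-up name.

```python
def first_argmin(rows):
    """For each row, return the column index of the first element equal to the row minimum."""
    return [row.index(min(row)) for row in rows]


indices = first_argmin([
    [3, 1, 1],  # minimum 1 appears at columns 1 and 2
    [0, 0, 5],  # minimum 0 appears at columns 0 and 1
    [2, 4, 2],  # minimum 2 appears at columns 0 and 2
])
```

In every row with ties, the earliest column wins, which makes the result reproducible across runs and devices.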

binary_neg_entropy(p)[source]#

Negative entropy for binary outcome.

Parameters: p (Tensor) – the probability of one outcome; hence 1-p is the probability of the other outcome
Returns: Tensor with the same shape as p

clipped_exp(value, clip_value_min=-20, clip_value_max=2)[source]#

Clip value to the range [clip_value_min, clip_value_max] then compute exponential

Parameters

value (Tensor) – input tensor.
clip_value_min (float) – The minimum value to clip by.
clip_value_max (float) – The maximum value to clip by.

identity(x)[source]#: PyTorch doesn’t have an identity activation. This can be used as a placeholder.

max_n(inputs)[source]#

Calculate the maximum of n tensors.

Parameters: inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
Returns: the element-wise maximum of all the tensors in inputs.
Return type: Tensor

min_n(inputs)[source]#

Calculate the minimum of n tensors.

Parameters: inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
Returns: the element-wise minimum of all the tensors in inputs.
Return type: Tensor

mul_n(inputs)[source]#

Calculate the product of n tensors.

Parameters: inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
Returns: the element-wise multiplication of all the tensors in inputs.
Return type: Tensor

normalize_min_max(x)[source]#

Normalize the min and max of each sample x[i] to 0 and 1.

Normalize x to [0, 1], as suggested in Appendix G of the MuZero paper.

Parameters: x (Tensor) – a batch of samples
Returns: same shape as x
Return type: Tensor
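A per-sample min-max normalization can be sketched on nested lists as follows. Mapping a constant sample to zeros is an assumption of this sketch; the documentation does not say how the library handles a zero range.

```python
def normalize_min_max_lists(batch):
    """Rescale each sample (inner list) so that its min maps to 0 and its max to 1."""
    normalized = []
    for sample in batch:
        lo, hi = min(sample), max(sample)
        span = hi - lo
        # Assumption: a constant sample (span == 0) is mapped to all zeros.
        normalized.append([(v - lo) / span if span > 0 else 0.0 for v in sample])
    return normalized
```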

shuffle(values)[source]#

Shuffle a nest.

Shuffle all the tensors in values in the same random order.

Parameters: values (nested Tensor) – nested Tensor to be shuffled. All the tensors need to have the same batch size (i.e. shape[0]).
Returns: shuffled value along dimension 0.

softclip(x, low, high, hinge_softness=1.0)[source]#

Softly bound x in between [low, high]. Unlike softclip_tf, this transform is symmetric regarding the lower and upper bound when squashing. The softclip function can be defined in several forms:

\[\begin{split}\begin{array}{lll} &\ln(\frac{e^{l-x}+1}{e^{x-h}+1}) + x & (1)\\ =&\ln(\frac{e^{x-l}+1}{e^{x-h}+1}) + l & (2)\\ =&\ln(\frac{e^{l-x}+1}{e^{h-x}+1}) + h & (3)\\ \end{array}\end{split}\]

Parameters

x (Tensor) – input
low (float|Tensor) – the lower bound
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to high. Default to 1.
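The equivalence of forms (1) to (3) can be checked numerically with a scalar sketch. It fixes hinge_softness at 1, since the formulas above do not show how that parameter enters, and it makes no attempt to guard against overflow for extreme inputs; the function names are hypothetical.

```python
import math


def softclip_form1(x, low, high):
    return math.log((math.exp(low - x) + 1.0) / (math.exp(x - high) + 1.0)) + x


def softclip_form2(x, low, high):
    return math.log((math.exp(x - low) + 1.0) / (math.exp(x - high) + 1.0)) + low


def softclip_form3(x, low, high):
    return math.log((math.exp(low - x) + 1.0) / (math.exp(high - x) + 1.0)) + high
```

Form (2) makes the lower bound explicit (the logarithm is positive whenever low < high), and form (3) does the same for the upper bound.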

softclip_tf(x, low, high, hinge_softness=1.0)[source]#

Softly bound x in between [low, high], namely,

clipped = softupper(softlower(x, low), high)
softclip(x) = (clipped - high) / (high - softupper(low, high)) * (high - low) + high

The second scaling step is because we will have softupper(low, high) < low due to distortion of softplus, so we need to shrink the interval slightly by (high - low) / (high - softupper(low, high)) to preserve the lower bound. Due to this rescaling, the bijector can be mildly asymmetric.

Parameters

x (Tensor) – input
low (float|Tensor) – the lower bound
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to high. Default to 1.
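The two-step construction above can be sketched with scalar helpers. The sketch fixes hinge_softness at 1 (how the parameter scales softplus is not shown here) and uses hypothetical `*_scalar` names; it is meant to show why the rescaling restores the lower bound, not to reproduce the library code.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softlower_scalar(x, low):
    return softplus(x - low) + low


def softupper_scalar(x, high):
    return -softplus(high - x) + high


def softclip_tf_scalar(x, low, high):
    clipped = softupper_scalar(softlower_scalar(x, low), high)
    # softupper(low, high) < low, so the interval is rescaled to map it back to low.
    scale = (high - low) / (high - softupper_scalar(low, high))
    return (clipped - high) * scale + high
```

As x goes to negative infinity, `clipped` tends to `softupper(low, high)` and the rescaled value tends to `low`; as x goes to positive infinity, it tends to `high`.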

softlower(x, low, hinge_softness=1.0)[source]#

Softly lower bound x by low, namely, softlower(x, low) = softplus(x - low) + low

Parameters

x (Tensor) – input
low (float|Tensor) – the lower bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to identity. Default to 1.

Returns

Tensor

softsign()#

softsign_()#

softupper(x, high, hinge_softness=1.0)[source]#

Softly upper bound x by high, namely, softupper(x, high) = -softplus(high - x) + high.

Parameters

x (Tensor) – input
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from identity to high. Default to 1.

Returns

Tensor

square(x)[source]#: torch doesn’t have square.

sum_to_leftmost(value, dim)[source]#

Sum out value.ndim-dim many rightmost dimensions of a given tensor.

Parameters

value (Tensor) – A tensor of .ndim at least dim.
dim (int) – The number of leftmost dims to remain.

Returns

The result tensor whose ndim is min(dim, value.dim).

swish(x)[source]#

Swish activation.

This is suggested in arXiv:1710.05941

Parameters: x (Tensor) – input
Returns: Tensor

weighted_reduce_mean(x, weight, dim=())[source]#

Weighted mean.

Parameters

x (Tensor) – values for calculating the mean
weight (Tensor) – weight for x. should have same shape as x
dim (int | tuple[int]) – The dimensions to reduce. Must be in the range [-rank(x), rank(x)). If None or an empty tuple (the default), all elements are reduced.

Returns

the weighted mean across axis
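A weighted mean over all elements can be sketched on flat Python lists. Dividing by the sum of the weights is the usual definition and is assumed here; any safeguard the library applies against a zero weight sum is not documented above and is not modeled.

```python
def weighted_mean(values, weights):
    """Weighted mean of a flat list: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight
```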

alf.utils.normalizers#

class AdaptiveNormalizer(tensor_spec, speed=8.0, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='AdaptiveNormalizer')[source]#

Bases: alf.utils.normalizers.Normalizer

This normalizer gives higher weight to more recent samples for calculating mean and variance. Roughly speaking, the weight for each sample at time t is proportional to (t/T)^(speed-1), where T is the current time step. See docs/streaming_averaging_amd_sampling.py for detail.

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
speed (float) – speed of updating mean and variance.
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
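To make the weighting concrete, the sketch below computes normalized weights proportional to (t/T)^(speed-1) for T samples. The helper name `adaptive_weights` is hypothetical, and this batch computation only illustrates the stated proportionality; it is not the streaming update used by the class.

```python
def adaptive_weights(num_steps: int, speed: float = 8.0):
    """Weights for samples t = 1..T, proportional to (t / T) ** (speed - 1)."""
    raw = [(t / num_steps) ** (speed - 1.0) for t in range(1, num_steps + 1)]
    total = sum(raw)
    return [r / total for r in raw]
```

With `speed=1` all samples get equal weight; larger values concentrate the weight on the most recent samples.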

training: bool#

class EMNormalizer(tensor_spec, update_rate=0.001, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='EMNormalizer')[source]#

Bases: alf.utils.normalizers.Normalizer

Exponential moving normalizer: the normalization assigns exponentially decayed weights to history samples.

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
update_rate (float) – the update rate
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
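One way to picture exponentially decayed weights is a pair of exponential moving averages for the first and second moments. The class below is a scalar sketch under several assumptions that are not confirmed by this documentation: the statistics are initialized from the first sample, no bias correction is applied, and the epsilon is added to the standard deviation. `EMStats` is a hypothetical name.

```python
import math


class EMStats:
    def __init__(self, update_rate: float = 0.001, variance_epsilon: float = 1e-10):
        self.update_rate = update_rate
        self.variance_epsilon = variance_epsilon
        self.mean = None
        self.second_moment = None

    def update(self, x: float) -> None:
        if self.mean is None:
            # Assumption: initialize both moments from the first sample.
            self.mean, self.second_moment = x, x * x
            return
        r = self.update_rate
        self.mean += r * (x - self.mean)
        self.second_moment += r * (x * x - self.second_moment)

    def normalize(self, x: float) -> float:
        var = max(self.second_moment - self.mean ** 2, 0.0)
        return (x - self.mean) / (math.sqrt(var) + self.variance_epsilon)
```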

training: bool#

class Normalizer(tensor_spec, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, max_dims_to_summarize=10, name='Normalizer')[source]#

Bases: torch.nn.modules.module.Module

Create a base normalizer using a first-moment and a second-moment averagers.

Given weights $w_i$ and samples $x_i, i = 1 \cdots n$, let

\[\begin{split}\begin{array}{lll} m & = \sum_i w_i * x_i \; & \mbox{(first moment)} \\ m2 & = \sum_i w_i * x_i^2 \; & \mbox{(second moment)} \end{array}\end{split}\]

then

\[\begin{split}\begin{array}{ll} var & = \sum_i w_i * (x_i - m)^2 \\ & = \sum_i w_i * (x_i^2 + m^2 - 2*x_i*m) \\ & = m2 + m^2 - 2m^2 \\ & = m2 - m^2 \end{array}\end{split}\]

which is the same result as in the case where $w_1=w_2=...=w_n=(1/n)$. The third step uses $\sum_i w_i = 1$.

NOTE: tf_agents’ normalizer maintains a running average of variance which is not correct mathematically, because the estimated variance contains early components that don’t measure all the current samples.
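The moment identity $var = m2 - m^2$ derived above can be checked numerically. The sketch normalizes the weights so that they sum to 1, which the derivation relies on; the function names are hypothetical.

```python
def weighted_moments(samples, weights):
    """Return (m, m2) for weights normalized to sum to 1."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    m = sum(wi * xi for wi, xi in zip(w, samples))
    m2 = sum(wi * xi * xi for wi, xi in zip(w, samples))
    return m, m2


def direct_variance(samples, weights):
    """Weighted variance computed directly as sum_i w_i * (x_i - m)^2."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    m = sum(wi * xi for wi, xi in zip(w, samples))
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, samples))
```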

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – True if debug summaries should be created.
max_dims_to_summarize (int) – when debug_summaries=True, the max number of dims of the normalizer’s statistics will be summarized. Note that a large number could potentially dump a lot of TB plots, consume much disk space, and slow down training. Default: 10.
name (str) –

forward(input)[source]#

Defines the computation performed at every call.

Should be overridden by all subclasses.

normalize(tensor, clip_value=- 1.0)[source]#

Normalize a tensor with mean and variance

Parameters

tensor (nested Tensor) – each leaf can have arbitrary outer dims with shape [B1, B2,…] + tensor_spec.shape.
clip_value (float) – if positive, normalized values will be clipped to [-clip_value, clip_value].

Returns

normalized tensor

training: bool#

update(tensor)[source]#: Update the statistics given a new tensor.

class ScalarAdaptiveNormalizer(speed=8.0, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='ScalarAdaptiveNormalizer')[source]#

Bases: alf.utils.normalizers.AdaptiveNormalizer

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
speed (float) – speed of updating mean and variance.
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –

training: bool#

class ScalarEMNormalizer(update_rate=0.001, auto_update=True, variance_epsilon=1e-10, zero_mean=True, unit_std=False, debug_summaries=False, name='ScalarEMNormalizer')[source]#

Bases: alf.utils.normalizers.EMNormalizer

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
update_rate (float) – the update rate
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –

training: bool#

class ScalarWindowNormalizer(window_size=1000, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='ScalarWindowNormalizer')[source]#

Bases: alf.utils.normalizers.WindowNormalizer

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
window_size (int) – the size of the recent window
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –

training: bool#

class WindowNormalizer(tensor_spec, window_size=1000, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='WindowNormalizer')[source]#

Bases: alf.utils.normalizers.Normalizer

Normalization according to a recent window of samples.

Parameters

tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
window_size (int) – the size of the recent window
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std when normalizing. If True, then the mean is just subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
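Windowed statistics can be sketched with a bounded deque that keeps only the most recent samples. This scalar version recomputes the moments from the buffer on demand, which is an illustrative simplification; `WindowStats` is a hypothetical name and not the class documented here.

```python
from collections import deque


class WindowStats:
    def __init__(self, window_size: int = 1000):
        # A deque with maxlen silently drops the oldest sample when full.
        self._buffer = deque(maxlen=window_size)

    def update(self, x: float) -> None:
        self._buffer.append(x)

    def mean_and_variance(self):
        n = len(self._buffer)
        mean = sum(self._buffer) / n
        second_moment = sum(v * v for v in self._buffer) / n
        return mean, second_moment - mean * mean
```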

training: bool#

alf.utils.per_process_context#

class PerProcessContext[source]#

Bases: object

A singleton that maintains the per process runtime properties.

It is used mainly in multi-process distributed training mode, where properties such as the rank of the process and the total number of processes can be accessed via this interface.

Construct the singleton instance.

This initializes the singleton and default values are assigned to the properties.

property ddp_rank#

finalize()[source]#

Lock the context so that it becomes read only.

Return type: None

property is_distributed#

property num_processes#

property paras_queue: multiprocessing.context.BaseContext.Queue#

Return type: Queue

set_distributed(rank, num_processes)[source]#

Set the distributed properties.

Parameters

rank (int) – the ID of the process
num_processes (int) – the total number of processes

Return type

None

set_paras_queue(paras_queue)[source]#

Set the parameter queue.

The queue is used for checking the consistency of model parameters across different worker processes, if multi-gpu training is used.

alf.utils.plot_tb_curves#

class CurvesPlotter(mean_curves, y_clipping=None, x_range=None, y_range=None, x_ticks=None, x_label=None, y_label=None, x_scaled_and_aligned=False, figsize=(4, 4), dpi=100, linestyle='-', linewidth=2, std_alpha=0.2, colors=None, markers=None, bg_color='white', grid_color='#e6e5e3', plot_mean_only=False, legend_kwargs={'loc': 'best'}, title=None)[source]#

Bases: object

Plot several MeanCurve objects in one figure. The curve colors will cycle over 10 default colors. The user should make sure that the MeanCurve objects to plot are meaningful to compare in one figure.

For each MeanCurve, its y field will be plotted as the mean, its min_y and max_y will be plotted by a shaded area around y, and its x determines the x-axis range.

Parameters

mean_curves (MeanCurve|list[MeanCurve]) – each MeanCurve should correspond to a different method.
x_range (tuple[float]) – a tuple of (min_x, max_x) for showing on the figure. If None, then (0, 1) will be used. This argument is only used when x_scaled_and_aligned==True.
y_range (tuple[float]) – a tuple of (min_y, max_y) for showing on the figure. If None, then it will be decided according to the y values. Note that this range won’t change y data; it’s only used by matplotlib for drawing y limits.
x_ticks (list[float]) – x ticks shown along x axis
y_clipping (tuple[float]) – the y values will be clipped to this range if not None. Because of smoothing in MeanCurveReader and/or std region, the input y values might be out of this range.
x_label (str) – shown besides x-axis
y_label (str) – shown besides y-axis
x_scaled_and_aligned (bool) –
If True, the x axes of all MeanCurve objects will be scaled and aligned so that the lower and upper $x$ bounds of all curves will be x_range, and each curve’s $x$ axis will be proportionally scaled. If False, the $x$ axis will be plotted according to $x$ of each MeanCurve as it is. Note that this process only involves $x$ scaling and no interpolation of $y$ values will ever be performed. For example, we have three MeanCurves to be plotted in a figure:
```
mean_curve1 x: (0, 100)
mean_curve2 x: (20, 80)
mean_curve3 x: (100, 200)
```
with x_range==(0,1). Then in the plotted figure, the $x$ range (not x-ticks which can be specified differently!) will be
```
mean_curve1 x: (0, 0.5)
mean_curve2 x: (0.1, 0.4)
mean_curve3 x: (0.5, 1)
```
figsize (tuple[int]) – a tuple of ints determining the size of the figure in inches. A larger figure size will allow for longer texts, more axes or more ticklabels to be shown.
dpi (int) – dots per inch, i.e. how many pixels each inch contains. A figsize of (w,h) consists of w*h*dpi**2 pixels.
linestyle (str|list[str]) – the line style to plot. Possible values: ‘-’ (‘solid’), ‘--’ (‘dashed’), ‘-.’ (‘dashdot’), and ‘:’ (‘dotted’). If a string, then all curves will have the same style; otherwise each option will apply to the corresponding curve.
linewidth (int) – the thickness of lines to plot. Default: 2.
std_alpha (float) – the transparency value for plotting shaded area around a curve.
bg_color (str) – the background color of the figure
grid_color (str) – color of the dashed grid lines
plot_mean_only (bool) – Whether only plot the mean curve without shaded regions.
legend_kwargs (dict) – kwargs for plotting the legend. If None, then no legend will be plotted.
title (str) – title of the figure

plot(output_path, dpi=200, transparent=False, close_fig=True)[source]#

Plot curves and save the figure to disk.

Parameters

output_path (str) – the output file path
dpi (int) – dpi for the figure. A higher value results in higher resolution.
transparent (bool) – If True, then the figure has a transparent background.
close_fig (bool) – whether to close/release this figure after plotting. If False, the user has to close it manually.

class EnvironmentStepsReturnReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#

Bases: alf.utils.plot_tb_curves.MeanCurveReader

Create a mean curve reader that reads AverageReturn values.

Parameters

event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) –
we support merging curves that have different $x$ into a MeanCurve. For example, if there are three curves:
```
curve1 x: (1, 9),
curve2 x: (0, 10),
curve3 x: (0, 8),
```
then the merged MeanCurve will have $(1, 8)$ as the final $x$ range. Each curve’s new $y$ values will be interpolated w.r.t. this common $x$ range appropriately given their original $y=f(x)$ curve. The common $x$ range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the following options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).

Returns

a mean curve structure.

Return type

MeanCurve

property x_label#

property y_label#

class EnvironmentStepsSuccessReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#

Bases: alf.utils.plot_tb_curves.MeanCurveReader

Create a mean curve reader that reads Success rates.

Parameters

event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) –
we support merging curves that have different $x$ into a MeanCurve. For example, if there are three curves:
```
curve1 x: (1, 9),
curve2 x: (0, 10),
curve3 x: (0, 8),
```
then the merged MeanCurve will have $(1, 8)$ as the final $x$ range. Each curve’s new $y$ values will be interpolated w.r.t. this common $x$ range appropriately given their original $y=f(x)$ curve. The common $x$ range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the following options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).

Returns

a mean curve structure.

Return type

MeanCurve

property x_label#

property y_label#

class IterationsReturnReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#

Bases: alf.utils.plot_tb_curves.MeanCurveReader

Create a mean curve reader that reads AverageReturn values.

Parameters

event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) –
we support merging curves that have different $x$ into a MeanCurve. For example, if there are three curves:
```
curve1 x: (1, 9),
curve2 x: (0, 10),
curve3 x: (0, 8),
```
then the merged MeanCurve will have $(1, 8)$ as the final $x$ range. Each curve’s new $y$ values will be interpolated w.r.t. this common $x$ range appropriately given their original $y=f(x)$ curve. The common $x$ range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the following options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).

Returns

a mean curve structure.

Return type

MeanCurve

property x_label#

property y_label#

class IterationsSuccessReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#

Bases: alf.utils.plot_tb_curves.MeanCurveReader

Create a mean curve reader that reads Success rates.

Parameters

event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) –
we support merging curves that have different $x$ into a MeanCurve. For example, if there are three curves:
```
curve1 x: (1, 9),
curve2 x: (0, 10),
curve3 x: (0, 8),
```
then the merged MeanCurve will have $(1, 8)$ as the final $x$ range. Each curve’s new $y$ values will be interpolated w.r.t. this common $x$ range appropriately given their original $y=f(x)$ curve. The common $x$ range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the following options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).

Returns

a mean curve structure.

Return type

MeanCurve

property x_label#

property y_label#

class MeanCurve(x=None, y=None, min_y=None, max_y=None, ay=None, min_ay=None, max_ay=None, name=None)[source]#

Bases: alf.utils.plot_tb_curves.MeanCurve

Create new instance of MeanCurve(x, y, min_y, max_y, ay, min_ay, max_ay, name)

final_y(N=1)[source]#

classmethod from_curves(x, ys, interval_mode='std', name='MeanCurve')[source]#

Compute various curve statistics from a set of individual curves ys and a common x, and create a class instance.

Parameters

x (np.array) – x steps
ys (list[np.array]) – a list of curves
interval_mode (str) – mode for computing error margin around the mean y curve. Should be one of the following options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
name (str) –
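The "std" and "minmax" modes can be pictured with a pure-Python sketch over equally long curves. It uses the population standard deviation, which is an assumption (the documentation does not say which estimator is used), and it leaves out the "CI_X" mode; `curve_band` is a hypothetical helper, not the classmethod itself.

```python
import statistics


def curve_band(ys, interval_mode="std"):
    """Return (mean, lower, upper) lists computed pointwise across the curves in ys."""
    columns = list(zip(*ys))
    mean = [statistics.fmean(col) for col in columns]
    if interval_mode == "std":
        # Assumption: population standard deviation around the mean.
        std = [statistics.pstdev(col) for col in columns]
        lower = [m - s for m, s in zip(mean, std)]
        upper = [m + s for m, s in zip(mean, std)]
    elif interval_mode == "minmax":
        lower = [min(col) for col in columns]
        upper = [max(col) for col in columns]
    else:
        raise ValueError(f"unsupported interval_mode in this sketch: {interval_mode}")
    return mean, lower, upper
```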

class MeanCurveGroupReader(mean_curve_readers, task_performance_ranges=None, name='MeanCurveGroupReader')[source]#

Bases: object

Group several MeanCurveReader results. A MeanCurveGroupReader is suitable for one method on multiple tasks, each task with multiple runs. To aggregate across tasks, each task must be provided with a performance range $(y_0, y_1)$ that will be used to normalize performance for that task as $\frac{y - y_0}{y_1 - y_0}$. If the ranges are not provided, no normalization will be done.

The aggregation is simply averaging the statistics of individual MeanCurve.

Parameters

mean_curve_readers (list[MeanCurveReader]) – a list of MeanCurveReader of multiple tasks for one method. It’s the user’s responsibility to ensure that it’s meaningful to group these task event files and show their mean and variance.
task_performance_ranges (list[tuple(float)]) – a list of tuples, where each tuple is a pair of floats used for normalizing the corresponding task. If None, no normalization will be performed.
name (str) – name of the method
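The per-task normalization and the averaging across tasks can be sketched on plain lists. Curves are assumed to share the same x steps, and both helper names are hypothetical; the sketch covers only the mean curve, not the other statistics of a MeanCurve.

```python
def normalize_performance(y, performance_range):
    """Apply (y - y0) / (y1 - y0) for one task's range (y0, y1)."""
    y0, y1 = performance_range
    return (y - y0) / (y1 - y0)


def aggregate_tasks(task_curves, task_ranges):
    """Average the normalized curves of several tasks pointwise."""
    normalized = [
        [normalize_performance(y, rng) for y in curve]
        for curve, rng in zip(task_curves, task_ranges)
    ]
    return [sum(col) / len(col) for col in zip(*normalized)]
```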

property name#

property x_label#

property y_label#

class MeanCurveReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#

Bases: object

Read and compute a MeanCurve from one or multiple TB event files. A MeanCurveReader is suitable for one method on one task with multiple runs.

Parameters

event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) –
we support merging curves that have different $x$ into a MeanCurve. For example, if there are three curves:
```
curve1 x: (1, 9),
curve2 x: (0, 10),
curve3 x: (0, 8),
```
then the merged MeanCurve will have $(1, 8)$ as the final $x$ range. Each curve’s new $y$ values will be interpolated w.r.t. this common $x$ range appropriately given their original $y=f(x)$ curve. The common $x$ range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the following options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).

Returns

a mean curve structure.

Return type

MeanCurve
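The x_steps behavior described above has two parts: finding the common $x$ range and interpolating each curve onto it. The sketch below assumes linear interpolation, which the documentation does not specify, and uses hypothetical helper names.

```python
from bisect import bisect_right


def common_x_range(curve_ranges):
    """Intersection of the (min_x, max_x) ranges of all curves."""
    return max(r[0] for r in curve_ranges), min(r[1] for r in curve_ranges)


def interpolate(xs, ys, x):
    """Linearly interpolate y at x; xs must be strictly increasing with at least 2 points."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the range covered by the curve")
    i = min(bisect_right(xs, x), len(xs) - 1)
    x0, x1 = xs[i - 1], xs[i]
    t = (x - x0) / (x1 - x0)
    return ys[i - 1] + t * (ys[i] - ys[i - 1])
```

For the three curves in the x_steps example, `common_x_range([(1, 9), (0, 10), (0, 8)])` gives the $(1, 8)$ range stated there.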

property name#

property x_label#

property y_label#

ema_smooth(scalars, weight=0.6, speed=64.0, adaptive=False, mode='forward')[source]#

EMA smoothing, following TB’s official implementation: https://github.com/tensorflow/tensorboard/blob/master/tensorboard/components/vz_line_chart2/line-chart.ts#L695

For adaptive EMA, the incoming weight decreases as the time increases.

Parameters

scalars (list[float]) – an array of floats to be smoothed, where the array index represents incoming time steps.
weight (float) – the weight of history. The history is updated as history * weight + scalar * (1 - weight). Only useful when adaptive=False.
speed (int) – an integer number specifying the adaptive weight. Only useful when adaptive=True. A higher speed means a smaller average window.
adaptive (bool) – whether to use adaptive weighting or not. If True, then later scalars will have smaller incoming weights (proportional to the inverse of array index).
mode (str) – “forward” | “both”. For “forward” mode, the moving average goes from the array beginning to end. For “both” mode, the moving average has an additional backward pass, and the final smoothed value is an average of forward and backward passes.
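The non-adaptive, forward-only update can be sketched as follows. The debiasing by 1 - weight**n mirrors what the linked TensorBoard smoothing does, but whether this function applies it in exactly this way is an assumption; `ema_smooth_forward` is a hypothetical name and the adaptive and "both" modes are not covered.

```python
def ema_smooth_forward(scalars, weight=0.6):
    """Forward EMA: history = history * weight + scalar * (1 - weight), with debiasing."""
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1) for this sketch")
    smoothed = []
    history = 0.0
    for n, value in enumerate(scalars, start=1):
        history = history * weight + value * (1.0 - weight)
        # Assumption: correct the bias from starting the history at zero.
        smoothed.append(history / (1.0 - weight ** n))
    return smoothed
```

With debiasing, a constant input sequence stays constant after smoothing, which is a quick sanity check for the update rule.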

alf.utils.pretty_print#

class PrettyPrinter(indent=1, width=80, depth=None, stream=None, *, compact=False, sort_dicts=True)[source]#

Bases: pprint.PrettyPrinter

Copied from https://stackoverflow.com/questions/30062384/pretty-print-namedtuple

Handle pretty printing operations onto a stream using a set of configured parameters.

indent: Number of spaces to indent for each level of nesting.
width: Attempted maximum number of columns in the output.
depth: The maximum depth to print out nested structures.
stream: The desired output stream. If omitted (or false), the standard output stream available at construction will be used.
compact: If true, several items will be combined in one line.
sort_dicts: If true, dict keys are sorted.

format_namedtuple(object, stream, indent, allowance, context, level)[source]#

format_namedtuple_items(items, stream, indent, allowance, context, level, inline=False)[source]#

pformat_pycolor(obj)[source]#

alf.utils.process_coordinator#

Coordinate asynchronous training process termination on request.

class Coordinator[source]#

Bases: object

A coordinator for processes.

This class implements a simple mechanism to coordinate the termination of a set of processes.

with coord.stop_on_exception():
    while not coord.should_stop():
        ...do some work...

Create a new Coordinator.

clear_stop()[source]#: Clears the stop flag. After this is called, calls to should_stop() will return False.

join(processes=None, stop_grace_period_secs=120, ignore_live_processes=False)[source]#

Wait for processes to terminate.

This call blocks until a set of processes have terminated. The set of processes is the union of the processes passed in the processes argument and the list of processes that registered with the coordinator by calling Coordinator.register_process(). After the processes stop, if an exc_info was passed to request_stop, that exception is re-raised.

Grace period handling: When request_stop() is called, processes are given ‘stop_grace_period_secs’ seconds to terminate. If any of them is still alive after that period expires, a RuntimeError is raised. Note that if an exc_info was passed to request_stop() then it is raised instead of that RuntimeError.

Parameters

processes – The started processes to join in addition to the registered processes.
stop_grace_period_secs – Number of seconds given to processes to stop after request_stop() has been called.
ignore_live_processes – If False, raises an error if any of the processes are still alive after stop_grace_period_secs.

Raises

RuntimeError – If any process is still alive after request_stop() is called and the grace period expires.

property joined#

raise_requested_exception()[source]#: If an exception has been passed to request_stop, this raises it.

register_process(process)[source]#: Register a process to join.

Parameters: process – A python.multiprocessing.Process to join.

request_stop(ex=None)[source]#

Request that the processes stop.

After this is called, calls to should_stop() will return True. Note: If an exception is being passed in, it must be in the context of handling the exception (i.e. try: ... except Exception as ex: ...) and not a newly created one.

Parameters

ex (Exception or exc_info tuple) – Optional Exception, or Python exc_info tuple as returned by sys.exc_info(). If this is the first call to request_stop() the corresponding exception is recorded and re-raised from join().

should_stop()[source]#: Check if stop was requested.

Returns: True if a stop was requested.

stop_on_exception()[source]#

Context manager to request stop when an Exception is raised. Code that uses a coordinator must catch exceptions and pass them to the request_stop() method to stop the other processes managed by the coordinator. This context handler simplifies the exception handling. Use it as follows:

with coord.stop_on_exception():
    # Any exception raised in the body of the with
    # clause is reported to the coordinator before terminating
    # the execution of the body.
    ...body...

This is completely equivalent to the slightly longer code:

try:
    ...body...
except:
    coord.request_stop(sys.exc_info())

Yields: nothing.
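The equivalence described above can be sketched in a few lines. MiniCoordinator below is a hypothetical stand-in built on threading.Event, not ALF's Coordinator; it only mimics the stop flag and the recording of the first exception, and it catches Exception rather than using a bare except.

```python
import contextlib
import sys
import threading


class MiniCoordinator:
    """Hypothetical stand-in for illustration; not ALF's Coordinator."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._exc_info = None

    def request_stop(self, ex=None):
        # Record only the first reported exception, then set the stop flag.
        if ex is not None and self._exc_info is None:
            self._exc_info = ex
        self._stop_event.set()

    def should_stop(self):
        return self._stop_event.is_set()

    @contextlib.contextmanager
    def stop_on_exception(self):
        try:
            yield
        except Exception:
            # Report the exception to the coordinator instead of propagating it.
            self.request_stop(sys.exc_info())


coord = MiniCoordinator()
steps = 0
with coord.stop_on_exception():
    while not coord.should_stop():
        steps += 1
        if steps == 3:
            raise ValueError("worker failed")
```

After the with block, coord.should_stop() is True, so any other loop polling the same coordinator would exit as well.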

wait_for_stop(timeout=None)[source]#

Wait till the Coordinator is told to stop.

Parameters: timeout – Float. Sleep for up to that many seconds waiting for should_stop() to become True.

Returns: True if the Coordinator is told to stop, False if the timeout expired.

class Process(coord, target=None, args=(), kwargs={})[source]#

Bases: multiprocessing.context.Process

A coordinated process class to execute acting loops.

Creates a process, running target in a loop, managed by coordinator.

Parameters

coord (Coordinator) – coordinator used to manage this new process.
target (callable) – to be invoked by run() in a loop, until coordinator tells the process to stop.
args (list) – optional arguments for target callable.
kwargs (dict) – optional keyword arguments for target callable.

body(args=(), kwargs={})[source]#

run()[source]#: Method to be run in sub-process; can be overridden in sub-class

run_loop()[source]#: Called in a back to back loop.

start_loop()[source]#: Called when the process starts.

stop_loop()[source]#: Called when the process stops.

alf.utils.schedulers#

Schedulers.

class ConstantScheduler(value)[source]#: Bases: object

class CyclicalScheduler(progress_type, base_lr, bound_lr, half_cycle_size, switch_mode='step')[source]#

Bases: alf.utils.schedulers.Scheduler

The cyclical scheduler where the value changes cyclically between two bounds. Reference:

Leslie N. Smith Cyclical Learning Rates for Training Neural Networks, 2017
(https://arxiv.org/pdf/1506.01186.pdf)

This implementation generalizes the original methods in two ways: 1) the initial value can start from either the lower-bound (as in the original method), or upper bound; 2) apart from the linear switching between the bounds, we also support step mode of switching.

In terms of applications, beyond the standard case of using a cyclical learning rate to improve the learning behavior during NN training, this scheduler is also useful in other cases. One example is reinforcement learning, where we sometimes want to update the parameters of different modules at different paces. For example, in TD3, we want to update the policy every other update. In this case, we can use a CyclicalScheduler with step switching mode to achieve this. Similar cases also appear in Dreamer.

Parameters

progress_type (str) – one of “percent”, “iterations”, “env_steps”
base_lr (float) – the base learning rate, representing the starting value.
bound_lr (float) – the value of the learning rate on the other bound. The value of bound_lr could be either larger or smaller than the value of base_lr.
half_cycle_size (int|float) – the length of half a cycle. Its actual length is based on the progress_type. For example, if in “iterations” mode, it means the lr value will reach the opposite bound every half_cycle_size iterations.
switch_mode (str) –
the way to switch from one bound to the other. Currently the following modes are supported:
- step: directly jump from one bound to the other every half cycle
- linear: linearly move from one bound to the other every half cycle
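The two switching modes can be illustrated with a small pure-Python sketch. The function cyclical_value is hypothetical and the handling of the exact switching boundaries is an assumption; it is not the scheduler's actual implementation.

```python
def cyclical_value(progress, base_lr, bound_lr, half_cycle_size, switch_mode="step"):
    """Illustrative only; the boundary handling is an assumption."""
    # Position inside the current full cycle, measured in half cycles: [0, 2).
    phase = (progress % (2 * half_cycle_size)) / half_cycle_size
    if switch_mode == "step":
        # Stay at base_lr for the first half cycle, then jump to bound_lr.
        return base_lr if phase < 1 else bound_lr
    if switch_mode == "linear":
        # Move linearly towards bound_lr, then linearly back to base_lr.
        frac = phase if phase <= 1 else 2 - phase
        return base_lr + (bound_lr - base_lr) * frac
    raise ValueError(f"unknown switch_mode: {switch_mode}")


# "Update every other iteration" in the spirit of the TD3 example:
# base_lr=1 (update), bound_lr=0 (skip), half_cycle_size=1.
update_flags = [cyclical_value(i, 1.0, 0.0, 1, "step") for i in range(4)]
```

Here update_flags alternates between 1.0 and 0.0, which is one way to gate a module's update on every other iteration.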

class ExponentialScheduler(progress_type, initial_value, decay_rate, decay_time)[source]#

Bases: alf.utils.schedulers.Scheduler

The value is exponentially decayed based on the progress.

The value is calculated as initial_value * decay_rate**(progress/decay_time).

Parameters

progress_type (str) – one of “percent”, “iterations”, “env_steps”
initial_value (float) – initial value
decay_rate (float) –
decay_time (float) –
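As a quick worked example of the decay formula, the hypothetical helper below evaluates it directly; with decay_rate=0.5 the value halves every decay_time units of progress.

```python
def exponential_value(progress, initial_value, decay_rate, decay_time):
    # Direct evaluation of initial_value * decay_rate**(progress/decay_time).
    return initial_value * decay_rate ** (progress / decay_time)


# Progress 0, 10 and 20 with decay_time=10: zero, one and two decay periods.
values = [exponential_value(p, 1.0, 0.5, 10) for p in (0, 10, 20)]
```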

class LinearScheduler(progress_type, schedule)[source]#

Bases: alf.utils.schedulers.Scheduler

The value is linearly changed in each defined region of progress.

Parameters

progress_type (str) – one of “percent”, “iterations”, “env_steps”
schedule (list[tuple]) – each tuple is a pair of (progress, value) which means that if the current progress between progress[i-1] and progress[i], a linear interpolation between value[i-1] and value[i] will be used. progress[0] must be 0. If the current progress is greater than progress[-1], value[-1] will be used.
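The interpolation rule for schedule can be sketched as follows. The function linear_value is a hypothetical helper written from the description above, assuming the pairs are sorted by progress.

```python
def linear_value(progress, schedule):
    """schedule: list of (progress, value) pairs sorted by progress,
    with schedule[0][0] == 0."""
    # Past the last point, the last value is used.
    if progress >= schedule[-1][0]:
        return schedule[-1][1]
    # Otherwise interpolate linearly inside the enclosing segment.
    for (p0, v0), (p1, v1) in zip(schedule, schedule[1:]):
        if p0 <= progress < p1:
            return v0 + (v1 - v0) * (progress - p0) / (p1 - p0)
    raise ValueError("progress must be non-negative")


schedule = [(0, 0.0), (10, 1.0), (20, 0.5)]
```

With this schedule the value ramps up from 0.0 to 1.0 over the first 10 units of progress, decays to 0.5 by 20, and stays at 0.5 afterwards.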

class Scheduler(progress_type)[source]#

Bases: object

Base class of all schedulers.

A scheduler is used to generate manually defined values based on the training progress.

The subclass should call progress() to get the current training progress and use it to calculate the scheduled value. The following types of training progress are available:

“percent”: percent of training completed.
“iterations”: the number of training iterations.
“env_steps”: the number of environment steps.
“global_counter”: the value from alf.summary.get_global_counter().

Parameters: progress_type (str) – one of “percent”, “iterations”, “env_steps”

progress()[source]#

class StepScheduler(progress_type, schedule, warm_up_period=0, start=0)[source]#

Bases: alf.utils.schedulers.Scheduler

There is one value for each defined region of training progress.

Parameters

progress_type (str) – one of “percent”, “iterations”, “env_steps”
schedule (list[tuple]) – each tuple is a pair of (progress, value). The scheduled result will be the value paired with the smallest progress that is greater than the current training progress.
warm_up_period (Number) – linearly increase the output value from 0 to the first value (i.e. schedule[0][1]) over a duration of warm_up_period starting from start. The value before start will be 0.
start (Number) – see warm_up_period
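The lookup rule for schedule can be sketched with bisect. The helper step_value is hypothetical; it does not model the warm-up, and keeping the last value once progress passes the last boundary is an assumption not stated above.

```python
import bisect


def step_value(progress, schedule):
    """schedule: list of (progress, value) pairs sorted by progress."""
    boundaries = [p for p, _ in schedule]
    # Index of the first boundary strictly greater than the current progress.
    index = bisect.bisect_right(boundaries, progress)
    # Assumption: past the last boundary, keep using the last value.
    index = min(index, len(schedule) - 1)
    return schedule[index][1]


schedule = [(100, 1.0), (200, 0.1)]
```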

as_scheduler(value_or_scheduler)[source]#

alf.utils.sl_utils#

Supervised learning utilities.

auc_score(inliers, outliers)[source]#

Computes the AUROC score w.r.t network outputs on two distinct datasets. Typically, one dataset is the main training/testing set, while the second dataset represents a set of unseen outliers.

Parameters

inliers (torch.tensor) – set of predictions on inlier data
outliers (torch.tensor) – set of predictions on outlier data

Returns

AUROC score (float)

classification_loss(output, target)[source]#

Computes the cross entropy loss with respect to a batch of predictions and targets.

Parameters

output (Tensor) – predictions of shape [B, D] or [B, N, D].
target (Tensor) – targets of shape [B], [B, 1], [B, N], or [B, N, 1].

Returns

LossInfo containing the computed cross entropy loss and the average accuracy.

predict_dataset(model, testset)[source]#

Computes predictions for an input dataset.

Parameters

model (Callable) – model with which to compute predictions.
testset (torch.utils.data.DataLoader) – dataset for which to compute predictions.

Returns

model_outputs (torch.tensor): a tensor of shape [N, S, D] where N refers to the number of predictors, S is the number of data points, and D is the output dimensionality.

regression_loss(output, target)[source]#

Computes the MSE loss with respect to a batch of predictions and targets.

Parameters

output (Tensor) – predictions of shape [B, 1] or [B, N, 1]
target (Tensor) – targets of shape [B, 1] or [B, N, 1]

Returns

LossInfo containing the computed MSE loss

alf.utils.spec_utils#

Collection of spec utility functions.

clip_to_spec(value, spec)[source]#

Clips value to a given bounded tensor spec.

Parameters

value (Tensor) – value to be clipped.
spec (BoundedTensorSpec) – spec containing min and max values for clipping.

Returns: clipped_value (Tensor): value clipped to be compatible with spec.

is_same_spec(spec1, spec2)[source]#

Whether two nested specs are the same.

Parameters

spec1 (nested TensorSpec) – the first spec
spec2 (nested TensorSpec) – the second spec

Returns

bool

scale_to_spec(tensor, spec)[source]#

Shapes and scales a batch into the given spec bounds.

Parameters

tensor – A tensor with values in the range of [-1, 1].
spec (BoundedTensorSpec) – the spec to use for scaling the input tensor.

Returns

A batch scaled to the given spec bounds.

spec_means_and_magnitudes(spec)[source]#

Get the center and magnitude of the ranges for the input spec.

Parameters: spec (BoundedTensorSpec) – the spec used to compute mean and magnitudes.
Returns: spec_means (Tensor): the mean value of the spec bound. spec_magnitudes (Tensor): the magnitude of the spec bound.
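A scalar sketch of how these spec utilities relate to each other is given below. It assumes the mean is the midpoint of the bounds and the magnitude is the half-width, which is the convention used in tf_agents; the helper names are hypothetical and operate on plain floats rather than tensors and specs.

```python
def means_and_magnitudes(minimum, maximum):
    # Assumed convention: midpoint and half-width of the bounds.
    mean = (maximum + minimum) / 2.0
    magnitude = (maximum - minimum) / 2.0
    return mean, magnitude


def scale_to_bounds(x, minimum, maximum):
    # Maps x in [-1, 1] affinely onto [minimum, maximum].
    mean, magnitude = means_and_magnitudes(minimum, maximum)
    return mean + magnitude * x


def clip_to_bounds(x, minimum, maximum):
    # Clamps x into [minimum, maximum].
    return min(max(x, minimum), maximum)
```

Under this assumption, an input of -1 maps to the lower bound, 0 to the midpoint, and 1 to the upper bound.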

zeros_from_spec(nested_spec, batch_size)[source]#

Create nested zero Tensors or Distributions.

A zero tensor with shape[0]=batch_size is created for each TensorSpec and a distribution with all the parameters as zero Tensors is created for each DistributionSpec.

Parameters

nested_spec (nested TensorSpec or DistributionSpec) –
batch_size (int|tuple|list) – batch size/shape added as the first dimension to the shapes in TensorSpec

Returns

nested Tensor or Distribution

alf.utils.summary_utils#

Utility functions for generating summaries.

add_mean_hist_summary(name, value)[source]#

Generate mean and histogram summary of value.

Parameters

name (str) – name of the summary
value (Tensor) – tensor to be summarized

add_mean_summary(name, value)[source]#

Generate mean summary of value.

Parameters

name (str) – name of the summary
value (Tensor) – tensor to be summarized

add_nested_summaries(prefix, data)[source]#

Add summary of a nest of data.

Parameters

prefix (str) – the prefix of the names of the summaries
data (dict or namedtuple) – data to be summarized

histogram_continuous(name, data, bucket_min=None, bucket_max=None, bucket_count=30, step=None)[source]#

Histogram for continuous data.

Parameters

name (str) – name for this summary
data (Tensor) – A Tensor of any shape.
bucket_min (float|None) – the bucket min value. If None, data.min() will be used.
bucket_max (float|None) – the bucket max value. If None, data.max() will be used.
bucket_count (int) – positive int. The output will have this many buckets.
step (None|Tensor) – step value for this summary. this defaults to alf.summary.get_global_counter()

histogram_discrete(name, data, bucket_min, bucket_max, step=None)[source]#

Histogram for discrete data.

Parameters

name (str) – name for this summary
data (Tensor) – A Tensor of integers of any shape.
bucket_min (int) – the bucket min value
bucket_max (int) – the bucket max value. The bucket count is calculated as bucket_max - bucket_min + 1 and the output will have this many buckets.
step (None|Tensor) – step value for this summary. this defaults to alf.summary.get_global_counter()

class record_time(tag)[source]#

Bases: object

A context manager for recording time.

It records the average time spent under the context between two summaries.

Example:

with record_time("time/calc"):
    long_function()

Create a context object for recording time.

By default, record_time will do cuda.synchronize() before entering and after leaving the context to measure the time accurately. This behavior can be disabled by setting the environment variable ALF_RECORD_TIME_SYNC to 0 if you suspect synchronization slows down your code. See https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution.

Parameters: tag (str) – the summary tag for the time.

safe_mean_hist_summary(name, value, mask=None)[source]#

Generate mean and histogram summary of value.

It skips the summary if value is empty.

Parameters

name (str) – name of the summary
value (Tensor) – tensor to be summarized
mask (bool Tensor) – optional mask to indicate which element of value to use. Its shape needs to be same as that of value

safe_mean_summary(name, value, mask=None)[source]#

Generate mean summary of value.

It skips the summary if value is empty.

Parameters

name (str) – name of the summary
value (Tensor) – tensor to be summarized
mask (bool Tensor) – optional mask to indicate which element of value to use. Its shape needs to be same as that of value

summarize_action(actions, action_specs, name='action')[source]#

Generate histogram summaries for actions.

Actions whose rank is more than 1 will be skipped.

Parameters

actions (nested Tensor) – actions to be summarized
action_specs (nested TensorSpec) – spec for the actions
name (str) – name of the summary

summarize_distribution(name, distributions)[source]#

Generate summary for distributions.

Currently the following types of distributions are supported:

Normal, StableCauchy, Beta: mean and std of each dimension will be summarized
Above distribution wrapped by Independent and TransformedDistribution: the base distribution is summarized
Tensor: each dimension dist[…, a] will be summarized

Note that unsupported distributions will be ignored (no error reported).

Parameters

name (str) – name of the summary
distributions (nested td.distribution.Distribution) – distributions to be summarized.

summarize_distribution_gradient(name, distribution, batch_dims=1, clone=False)[source]#

Summarize the gradient of the parameters of distribution during backward.

Parameters

name (str) – name of the summary
distribution (nested Distribution) – distribution of which the gradient is to be summarized.
batch_dims (int) – first so many dimensions are treated as batch dimensions
clone (bool) – If True, distribution will first be cloned. This is useful if distribution is used in multiple places and you only want to summarize the gradient from one place. If False, the gradient will be the sum from all gradients backpropped to distribution.

Returns

the distribution or the cloned distribution. The cloned distribution should be used for the downstream calculations.

summarize_gradients(name_and_params, with_histogram=True)[source]#

Add summaries for gradients.

Parameters

name_and_params (list[(str, Parameter)]) – A list of (name, Parameter) tuples.
with_histogram (bool) – If True, generate histogram.

summarize_loss(loss_info)[source]#

Add summary about loss_info

Parameters: loss_info (LossInfo) – loss_info.extra must be a namedtuple

summarize_nest(prefix, nest)[source]#

summarize_per_category_loss(loss_info, summarize_count=False, label_names=None)[source]#

Add summary about each category of the unaggregated loss_info.loss of the shape (T, B), or (B, ) by partitioning it according to loss_info.batch_label, which has the same shape as loss_info.loss. It also creates summarization of the number of samples encountered for each category.

Parameters

loss_info (LossInfo) – do per-category summarization if loss_info.batch_label is present, and skip otherwise
summarize_count (bool) – whether to summarize the number of samples for each category as well
label_names (Optional[List[str]]) – the names of each category to be used in tensorboard summary. The category number will be used if label_names is None.

summarize_tensor_gradients(name, tensor, batch_dims=1, clone=False)[source]#

Summarize the gradient of tensor during backward.

Parameters

name (str) – name of the summary
tensor (nested Tensor) – tensor of which the gradient is to be summarized.
batch_dims (int) – first so many dimensions are treated as batch dimensions
clone (bool) – If True, tensor will first be cloned. This is useful if tensor is used in multiple places and you only want to summarize the gradient from one place. If False, the gradient will be the sum from all gradients backpropped to tensor.

Returns

the tensor or the cloned tensor. The cloned tensor should be used for the downstream calculations.

summarize_variables(name_and_params, with_histogram=True)[source]#

Add summaries for variables.

Parameters

name_and_params (list[(str, Parameter)]) – A list of (name, Parameter) tuples.
with_histogram (bool) – If True, generate histogram.

alf.utils.tensor_utils#

Collection of tensor utility functions.

class BatchSquash(batch_dims)[source]#

Bases: object

Facilitates flattening and unflattening batch dims of a tensor. Copied from tf_agents.

Exposes a pair of matched flatten and unflatten methods. After flattening only 1 batch dimension will be left. This facilitates evaluating networks that expect inputs to have only 1 batch dimension.

Create two tied ops to flatten and unflatten the front dimensions.

Parameters: batch_dims (int) – Number of batch dimensions the flatten/unflatten ops should handle.
Raises: ValueError – if batch dims is negative.

flatten(tensor)[source]#: Flattens and caches the tensor’s batch_dims.

unflatten(tensor)[source]#: Unflattens the tensor’s batch_dims using the cached shape.

append_coordinate(im)[source]#

For the image, we append coordinates as two channels. The image is assumed to be channel-first. The coordinates will range from -1 to 1 evenly.

Parameters

im (Tensor) – an image of shape [B,C,H,W].

Returns

an output image of shape [B,C+2,H,W] where the extra 2 dimensions are xy meshgrid from -1 to 1.

Return type

torch.Tensor

clip_by_global_norm(tensors, clip_norm, use_norm=None, in_place=False)[source]#

Clips values of multiple tensors by the ratio of clip_norm to the global norm.

Adapted from TF’s version.

Given a nest of tensors tensors, and a clipping norm threshold clip_norm, this function clips the tensors (in place only if in_place is True) and returns them together with the global norm (global_norm) of all tensors in tensors. Optionally, if you’ve already computed the global norm for tensors, you can specify the global norm with use_norm.

To perform the clipping, each tensor is set to:

tensor * clip_norm / max(global_norm, clip_norm)

where:

global_norm = sqrt(sum([l2norm(t)**2 for t in tensors]))

If clip_norm > global_norm then the entries in tensors remain as they are, otherwise they’re all shrunk by the global ratio.

Any of the entries of tensors that are of type None are ignored.

Parameters

tensors (nested Tensor) – a nest of tensors to be clipped
clip_norm (float or Tensor) – a positive floating scalar
use_norm (float or Tensor) – the global norm to use. If None, global_norm() will be used to compute the norm.
in_place (bool) – If True, then the input tensors will be changed. For tensors that require grads, we cannot modify them in place; on the other hand, if you are clipping the gradients held by an optimizer, then doing this in place will probably be easier.

Returns

tensors (nested Tensor): the clipped tensors
global_norm (Tensor): a scalar tensor representing the global norm. If use_norm is provided, it will be returned instead.
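The clipping rule can be worked through with plain Python lists standing in for tensors. This is only a sketch of the formula above; the helper names mirror the documented functions but the code is not ALF's implementation and does not handle nests or in-place updates.

```python
import math


def global_norm(tensors):
    # Each "tensor" is a flat list of floats; None entries are ignored.
    return math.sqrt(sum(x * x for t in tensors if t is not None for x in t))


def clip_by_global_norm(tensors, clip_norm, use_norm=None):
    norm = global_norm(tensors) if use_norm is None else use_norm
    # tensor * clip_norm / max(global_norm, clip_norm)
    scale = clip_norm / max(norm, clip_norm)
    clipped = [None if t is None else [x * scale for x in t] for t in tensors]
    return clipped, norm


# The global norm of [3] and [4] is 5, so clip_norm=2.5 halves every entry.
clipped, norm = clip_by_global_norm([[3.0], None, [4.0]], clip_norm=2.5)
```

When clip_norm is at least as large as the global norm, the scale factor is 1 and the values are unchanged.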

clip_by_norms(tensors, clip_norm, in_place=False)[source]#

Clipping a nest of tensors in place to a maximum L2-norm.

Given a tensor, and a maximum clip value clip_norm, this function normalizes the tensor so that its L2-norm is less than or equal to clip_norm.

To perform the clipping: tensor * clip_norm / max(l2norm(tensor), clip_norm)

Parameters

tensors (nested Tensor) – a nest of tensors
clip_norm (float or Tensor) – a positive scalar
in_place (bool) – If True, then the input tensors will be changed. For tensors that require grads, we cannot modify them in place; on the other hand, if you are clipping the gradients held by an optimizer, then doing this in place will probably be easier.

Returns

the clipped tensors

cov(data, rowvar=False)[source]#

Estimate a covariance matrix given data.

Parameters

data (tensor) – A 1-D or 2-D tensor containing multiple observations of multiple dimensions. Each row of mat represents a dimension of the observation, and each column a single observation.
rowvar (bool) – If True, then each row represents a dimension, with observations in the columns. Otherwise, each column represents a dimension while the rows contain observations.

Returns

The covariance matrix

explained_variance(ypred, y, valid_mask=None, dim=None)[source]#

Computes fraction of variance that ypred explains about y.

Adapted from baselines.ppo2 explained_variance()

Interpretation:

ev=0: might as well have predicted zero

ev=1: perfect prediction

ev<0: worse than just predicting zero

Parameters

ypred (Tensor) – prediction for y
y (Tensor) – target
valid_mask (Tensor) – an optional
dim (None|int) – the dimension to reduce. If not provided, the explained variance is calculated for all dimensions.

Returns

1 - Var[y-ypred] / Var[y]
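The three interpretation cases can be checked numerically with a small sketch on Python lists. It uses the population variance from the statistics module, which is an assumption about the variance estimator, and it ignores valid_mask and dim.

```python
from statistics import pvariance


def explained_variance(ypred, y):
    # 1 - Var[y - ypred] / Var[y], computed over all elements.
    residual = [a - b for a, b in zip(y, ypred)]
    return 1.0 - pvariance(residual) / pvariance(y)


y = [1.0, 2.0, 3.0, 4.0]
ev_perfect = explained_variance(y, y)                      # perfect prediction
ev_zero = explained_variance([0.0, 0.0, 0.0, 0.0], y)      # predicting zero
ev_reversed = explained_variance([4.0, 3.0, 2.0, 1.0], y)  # anti-correlated
```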

global_norm(tensors)[source]#

Computes the global norm of a nest of tensors.

Adapted from TF’s version.

Given a nest of tensors tensors, this function returns the global norm of all tensors in tensors. The global norm is computed as:

global_norm = sqrt(sum([l2norm(t)**2 for t in tensors]))

Any entries in tensors that are of type None are ignored.

Parameters: tensors (nested Tensor) – a nest of tensors
Returns: norm (Tensor): a scalar tensor

reverse_cumprod(x, dim)[source]#

Perform cumprod in a reverse order along the dimension specified by dim.

Parameters

x (Tensor) – the tensor to compute the reverse cumprod on
dim (int) – the value indicating the dimension along which to calculate the reverse cumprod

Returns

the reverse cumprod tensor. It has the same shape as x.

reverse_cumsum(x, dim)[source]#

Perform cumsum in a reverse order along the dimension specified by dim.

Parameters

x (Tensor) – the tensor to compute the reverse cumsum on
dim (int) – the value indicating the dimension along which to calculate the reverse cumsum

Returns

the reverse cumsumed tensor. It has the same shape as x.
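For a 1-D input, the reverse cumulative operations can be illustrated with itertools.accumulate. The helpers below work on Python lists and are only meant to show the expected values; they are not the tensor implementations.

```python
import operator
from itertools import accumulate


def reverse_cumsum(xs):
    # Element i is the sum of xs[i:].
    return list(accumulate(reversed(xs)))[::-1]


def reverse_cumprod(xs):
    # Element i is the product of xs[i:].
    return list(accumulate(reversed(xs), operator.mul))[::-1]
```

For example, the input [1, 2, 3] gives a reverse cumsum of [6, 5, 3], and [2, 3, 4] gives a reverse cumprod of [24, 12, 4].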

scale_gradient(tensor, scale, clone_input=True)[source]#

Scales the gradient of tensor for the backward pass.

Parameters

tensor (Tensor) – a tensor which requires gradient.
scale – a scalar factor to be multiplied to the gradient of tensor.
clone_input (bool) – If True, clone the input tensor before applying gradient scaling. This option is useful when there are multiple computational branches originated from tensor and we want to apply gradient scaling to part of them without impacting the rest. If False, apply gradient scaling to the input tensor directly.

Returns: The (cloned) tensor with gradient scaling hook registered.

spatial_broadcast(z, im_shape)[source]#

Broadcasting an embedding across the image spatial domain. The image shape is assumed to be channel-first.

Parameters

z (Tensor) – embedding of shape [...,D] to be broadcast spatially
im_shape (Tuple[int]) – a tuple of ints where the last two are height and width.

Returns

a broadcast image of spec [...,D,H,W] where D is the input embedding size and [H,W] are input height and width.

Return type

torch.Tensor

tensor_extend(x, y)[source]#

Extending tensor x with new_slice y.

y.shape should be same as x.shape[1:]

Parameters

x (Tensor) – tensor to be extended
y (Tensor) – the tensor which will be appended to x

Returns

the extended tensor. Its shape is (x.shape[0]+1, x.shape[1:])

Return type

Tensor

tensor_extend_new_dim(x, dim, n)[source]#

Extending the tensor along a new dimension with a replica of n.

Parameters

x (Tensor) – tensor to be extended
dim (int) – the value indicating the position of the newly inserted dimension
n (int) – the number of replica along dim

Returns

the extended tensor. Its shape is (*x.shape[0:dim], n, *x.shape[dim:])

Return type

Tensor

tensor_extend_zero(x, dim=0)[source]#

Extending tensor with zeros along an axis.

Parameters

x (Tensor) – tensor to be extended
dim (int) – the axis to extend zeros

Returns

the extended tensor. Its shape is: (*x.shape[:dim], x.shape[dim]+1, *x.shape[dim+1:])

Return type

Tensor

tensor_prepend(x, y)[source]#

Prepending tensor with y.

y.shape should be same as x.shape[1:].

Parameters

x (Tensor) – tensor to be prepended
y (Tensor) – the tensor which will be prepended to x

Returns: the prepended tensor. Its shape is (x.shape[0]+1, x.shape[1:])
Return type: Tensor

tensor_prepend_zero(x)[source]#

Prepending tensor with zeros.

Parameters: x (Tensor) – tensor to be extended
Returns: the prepended tensor. Its shape is (x.shape[0]+1, x.shape[1:])

to_tensor(data, dtype=None)[source]#

Convert the data to a torch tensor.

Parameters

data (array like) – data for the tensor. Can be a list, tuple, numpy ndarray, scalar, and other types.
dtype (torch.dtype) – dtype of the converted tensors.

Returns

A tensor of dtype

alf.utils.value_ops#

Various functions related to calculating values.

action_importance_ratio(action_distribution, rollout_action_distribution, action, clipping_mode, scope, importance_ratio_clipping, log_prob_clipping, check_numerics, debug_summaries, rollout_log_prob=None)[source]#

Ratio for importance sampling, used in PPO loss and vtrace loss.

Caller has to save alf.summary.scope() and pass scope to this function.

Parameters

action_distribution (nested td.distribution) – Distribution over actions under target policy.
rollout_action_distribution (nested td.distribution) – distribution over actions from behavior policy, used to sample actions for the rollout.
action (nested tensor) – possibly batched action tuple taken during rollout.
clipping_mode (str) –
mode for clipping the importance ratio:
- ’double_sided’: clips the range of importance ratio into [1-importance_ratio_clipping, 1+importance_ratio_clipping], which is used by PPOLoss.
- ’capping’: clips the range of importance ratio into min(1+importance_ratio_clipping, importance_ratio), which is used by VTraceLoss, where c_bar or rho_bar = 1+importance_ratio_clipping.
scope (name scope manager) – returned by alf.summary.scope(), set outside.
importance_ratio_clipping (float) – Epsilon in clipped, surrogate PPO objective. See the cited paper for more detail.
log_prob_clipping (float) – If >0, clipping log probs to the range (-log_prob_clipping, log_prob_clipping) to prevent inf / NaN values.
check_numerics (bool) – If true, adds checks to help find NaN/Inf values. For debugging only.
debug_summaries (bool) – If true, output summary metrics to tensorboard.
rollout_log_prob (nested tensor) – the log probability of the action

Returns

importance_ratio (Tensor), importance_ratio_clipped (Tensor).
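The two clipping modes can be compared on scalar ratios. The helpers below are a hypothetical sketch of the rules listed under clipping_mode; they leave out log_prob_clipping, numerics checks and summaries.

```python
import math


def importance_ratio(log_prob, rollout_log_prob):
    # Ratio of target-policy to behavior-policy probability of the action.
    return math.exp(log_prob - rollout_log_prob)


def clip_ratio(ratio, clipping_mode, importance_ratio_clipping):
    eps = importance_ratio_clipping
    if clipping_mode == "double_sided":
        # Clip into [1 - eps, 1 + eps].
        return min(max(ratio, 1 - eps), 1 + eps)
    if clipping_mode == "capping":
        # Only cap from above: min(1 + eps, ratio).
        return min(1 + eps, ratio)
    raise ValueError(f"unknown clipping_mode: {clipping_mode}")
```

The difference shows up for small ratios: with importance_ratio_clipping=0.25, a ratio of 0.5 is raised to 0.75 in 'double_sided' mode but left at 0.5 in 'capping' mode.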

discounted_return(rewards, values, step_types, discounts, time_major=True)[source]#

Computes discounted return for the first T-1 steps.

The difference between this function and the one in tf_agents.utils.value_ops is that the accumulated_discounted_reward is replaced by value for is_last steps in this function.

\[Q_t = \sum_{t'=t}^T \gamma^{t'-t} * r_{t'} + \gamma^{T-t+1}*final\_value.\]

Define abbreviations:

B: batch size representing number of trajectories
T: number of steps per trajectory

Parameters

rewards (Tensor) – shape is [T, B] (or [T]) representing rewards.
values (Tensor) – shape is [T, B] (or [T]) when representing values, [T, B, n_quantiles] or [T, n_quantiles] when representing quantiles of value distributions.
step_types (Tensor) – shape is [T, B] (or [T]) representing step types.
discounts (Tensor) – shape is [T, B] (or [T]) representing discounts.
time_major (bool) – Whether input tensors are time major. False means input tensors have shape [B, T].

Returns

A tensor with shape [T-1, B] (or [T-1]) representing the discounted returns. Shape is [B, T-1] when time_major is false.
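The recursion behind the return can be sketched for a single time-major trajectory of Python floats. Two assumptions are made here: there is no episode boundary inside the trajectory (so step_types is not needed), and rewards[t+1] and discounts[t+1] belong to the transition from step t to step t+1. The function is an illustration, not the library code.

```python
def discounted_return(rewards, values, discounts):
    """Single trajectory of length T; returns the first T-1 returns."""
    T = len(rewards)
    returns = [0.0] * T
    # Bootstrap from the value of the final step.
    returns[-1] = values[-1]
    for t in reversed(range(T - 1)):
        returns[t] = rewards[t + 1] + discounts[t + 1] * returns[t + 1]
    return returns[:-1]


rewards = [0.0, 1.0, 1.0]
values = [0.0, 0.0, 10.0]
discounts = [1.0, 0.5, 0.5]
returns = discounted_return(rewards, values, discounts)
```

Working backwards: the last return bootstraps to 10, the middle one is 1 + 0.5 * 10 = 6, and the first is 1 + 0.5 * 6 = 4.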

generalized_advantage_estimation(rewards, values, step_types, discounts, td_lambda=1.0, time_major=True)[source]#

Computes generalized advantage estimation (GAE) for the first T-1 steps.

For theory, see “High-Dimensional Continuous Control Using Generalized Advantage Estimation” by John Schulman, Philipp Moritz et al. See https://arxiv.org/abs/1506.02438 for full paper.

The difference between this function and the one in tf_agents.utils.value_ops is that the accumulated_td is reset to 0 for is_last steps in this function.

Define abbreviations:

B: batch size representing number of trajectories
T: number of steps per trajectory

Parameters

rewards (Tensor) – shape is [T, B] (or [T]) representing rewards.
values (Tensor) – shape is [T,B] (or [T]) representing values.
step_types (Tensor) – shape is [T,B] (or [T]) representing step types.
discounts (Tensor) – shape is [T, B] (or [T]) representing discounts.
td_lambda (float) – A scalar between [0, 1]. It’s used for variance reduction in temporal difference.
time_major (bool) – Whether input tensors are time major. False means input tensors have shape [B, T].

Returns

A tensor with shape [T-1, B] representing advantages. Shape is [B, T-1] when time_major is false.
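The standard GAE recursion from the cited paper can be sketched for a single time-major trajectory. As assumptions, the trajectory contains no episode boundary (so the reset at is_last steps is not modeled) and rewards[t+1] and discounts[t+1] belong to the transition from step t to step t+1; the function name gae is hypothetical.

```python
def gae(rewards, values, discounts, td_lambda=1.0):
    """Single trajectory of length T; returns the first T-1 advantages."""
    T = len(rewards)
    advantages = [0.0] * T
    for t in reversed(range(T - 1)):
        # One-step TD error for the transition t -> t+1.
        delta = rewards[t + 1] + discounts[t + 1] * values[t + 1] - values[t]
        # Accumulate TD errors, decayed by td_lambda and the discount.
        advantages[t] = delta + td_lambda * discounts[t + 1] * advantages[t + 1]
    return advantages[:-1]


rewards = [0.0, 1.0, 1.0]
values = [1.0, 2.0, 3.0]
discounts = [1.0, 0.5, 0.5]
adv_lambda_1 = gae(rewards, values, discounts, td_lambda=1.0)
adv_lambda_0 = gae(rewards, values, discounts, td_lambda=0.0)
```

With td_lambda=0 the advantages reduce to the one-step TD errors; with td_lambda=1 later TD errors are fully accumulated into earlier steps.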

one_step_discounted_return(rewards, values, step_types, discounts)[source]#

Calculate the one step discounted return for the first T-1 steps.

return = next_reward + next_discount * next_value if the current step is not the last step; otherwise return = current_discount * current_value.

Note: Input tensors must be time major.

Parameters

rewards (Tensor) – shape is [T, B] (or [T]) representing rewards.
values (Tensor) – shape is [T, B] (or [T]) when representing values, [T, B, n_quantiles] or [T, n_quantiles] when representing quantiles of value distributions.
step_types (Tensor) – shape is [T, B] (or [T]) representing step types.
discounts (Tensor) – shape is [T, B] (or [T]) representing discounts.

Returns

A tensor with shape [T-1, B] (or [T-1]) representing the discounted returns.
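For steps that are not the last step of an episode, the rule above reduces to a one-line computation. The sketch below covers only that case for a single time-major trajectory of Python floats; the handling of last steps is deliberately left out.

```python
def one_step_return(rewards, values, discounts):
    # return[t] = next_reward + next_discount * next_value, for t in [0, T-2].
    return [
        rewards[t + 1] + discounts[t + 1] * values[t + 1]
        for t in range(len(rewards) - 1)
    ]


one_step = one_step_return(
    rewards=[0.0, 1.0, 1.0],
    values=[1.0, 2.0, 3.0],
    discounts=[1.0, 0.5, 0.5],
)
```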

alf.utils.video_recorder#

class VideoRecorder(env, frame_max_width=2560, frames_per_sec=None, last_step_repeats=0, append_blank_frames=0, **kwargs)[source]#

Bases: gym.wrappers.monitoring.video_recorder.VideoRecorder

A video recorder that renders frames and encodes them into a video file. Besides rendering frames, it also supports plotting prediction info. Each algorithm is responsible for adding rendered Image instances in its pred info in order to be recorded here. See the docstring in alf.summary.render for more details.

Parameters

env (Gym.env) –
frame_max_width (int) – the max width of a video frame. Scale if the original width is bigger than this.
frames_per_sec (fps) – if None, use fps from the env
last_step_repeats (int) – repeat such number of times for the last frame of each episode.
append_blank_frames (int) – If >0, will append such number of blank frames at the end of the episode in the rendered video file. A negative value has the same effects as 0 and no blank frames will be appended.

cache_frame_and_pred_info(frame, pred_info=None)[source]#

Cache the input frame and pred_info for video generation later.

Parameters

frame (np.array) – the environmental frame.
pred_info (None|nest) – prediction step info for displaying: any Image instance in the info nest will be recorded.

capture_env_frame()[source]#: Return un-encoded env frame

capture_frame(pred_info=None, is_last_step=False)[source]#

Render self.env and add the resulting frame to the video. Also plot Image instances extracted from prediction info of policy_step.

Parameters

pred_info (None|nest) – prediction step info for displaying: any Image instance in the info nest will be recorded.
is_last_step (bool) – whether the current time step is the last step of the episode, either due to game over or time limits.

clear_cache()[source]#

Clear the cached contents.

generate_video_from_cache()[source]#

Generate the video from the cached frames. Also add the plotted Image instances extracted from the cached prediction info. The cache is reset to empty afterwards.
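
The cache-then-generate workflow described by cache_frame_and_pred_info, clear_cache, and generate_video_from_cache can be sketched with a toy stand-in class. FrameCacheSketch is hypothetical and performs no rendering or encoding; in particular, returning the paired items from generate_video_from_cache is an assumption made so the behavior is observable, not a statement about the real method's return value.

```python
class FrameCacheSketch:
    """Toy stand-in for the cache-then-encode pattern; not ALF's class."""

    def __init__(self):
        self._frames = []
        self._pred_infos = []

    def cache_frame_and_pred_info(self, frame, pred_info=None):
        # Store the frame together with its (optional) prediction info.
        self._frames.append(frame)
        self._pred_infos.append(pred_info)

    def clear_cache(self):
        self._frames = []
        self._pred_infos = []

    def generate_video_from_cache(self):
        # A real recorder would render and encode here; this sketch only
        # pairs each frame with its prediction info, then empties the cache.
        video = list(zip(self._frames, self._pred_infos))
        self.clear_cache()
        return video


recorder = FrameCacheSketch()
recorder.cache_frame_and_pred_info("frame0")
recorder.cache_frame_and_pred_info("frame1", pred_info={"value": 0.3})
video = recorder.generate_video_from_cache()
print(len(video))                                 # 2
print(len(recorder.generate_video_from_cache()))  # 0, the cache was reset
```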

alf.utils.visualizer#

Various functions related to visualizations of networks etc.

critic_network_visualizer(net, observation, action_upper_left, action_upper_right, action_lower_left, H=20, W=20, batch_size=None)[source]#

Generate a batched network response image within the rectangular range of actions (referred to as the probing region) specified by action_upper_left, action_upper_right, and action_lower_left, as shown below:

action_upper_left -----> action_upper_right
        |                        |
        |                        |
        v                        |
action_lower_left ------ action_lower_right

where action_lower_right is computed from the three provided points as follows, under the assumption that the region is rectangular:

action_lower_right = (action_upper_right + action_lower_left - action_upper_left)
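
As a worked instance of the formula above, the following sketch evaluates it on plain Python lists using the same corner values as the usage example in this entry. The helper name lower_right_corner is hypothetical.

```python
def lower_right_corner(upper_left, upper_right, lower_left):
    """Fourth corner of the probing region from the three given corners."""
    return [
        ur + ll - ul
        for ul, ur, ll in zip(upper_left, upper_right, lower_left)
    ]


action_upper_left = [1, -1, 0, 0]
action_upper_right = [1, 1, 0, 0]
action_lower_left = [-1, -1, 0, 0]
print(lower_right_corner(action_upper_left, action_upper_right,
                         action_lower_left))
# [-1, 1, 0, 0]
```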

Example usage:

import torch

# assume a case where the dimensionality of action is 4
# the action for the upper-left point of the probing region
action_upper_left = torch.Tensor([1, -1, 0, 0])
# the action for the upper-right point of the probing region
action_upper_right = torch.Tensor([1, 1, 0, 0])
# the action for the lower-left point of the probing region
action_lower_left = torch.Tensor([-1, -1, 0, 0])

# define a network function; here ``self`` refers to the enclosing
# algorithm that owns the critic networks
def net_func(net_input):
    critics, _ = self._critic_networks(
        net_input)  # [B, replicas * reward_dim]
    critics = critics.reshape(  # [B, replicas, reward_dim]
        -1, self._num_critic_replicas, *self._reward_spec.shape)
    # take the minimum over the replicas: [B, reward_dim]
    critics = critics.min(dim=1)[0]
    return critics

img = critic_network_visualizer(net_func, inputs.observation,
                        action_upper_left, action_upper_right,
                        action_lower_left,
                        20, 20)

# visualize the first response image in the batch
data = img[0, ...].squeeze(0)
data = data.cpu().numpy()

import alf.summary.render as render
val_img = render.render_heatmap(name="val_img", data=data)

Parameters

net (Callable) – a callable that is called as net((observation, actions))
observation (Tensor) – [B, …]
action_upper_left (tensor) – a tensor representing the upper-left point of the probing region, with the shape of [action_dim]
action_upper_right (tensor) – a tensor representing the upper-right point of the probing region, with the shape of [action_dim]
action_lower_left (tensor) – a tensor representing the lower-left point of the probing region, with the shape of [action_dim]
H (int) – number of samples to be used for creating visualization along the direction of action_lower_left - action_upper_left.
W (int) – number of samples to be used for creating visualization along the direction of action_upper_right - action_upper_left. The total number of samples is H * W.
batch_size (int) – the batch size of the input observation. If None, will be inferred from the input observation.

Returns

The network response image of the shape [B, K, H, W], where K denotes the dimensionality of the network output for the non-batch dimension.
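
To make the roles of H and W concrete, the following sketch builds the H * W probing actions from the three corners. It assumes the samples are evenly spaced and include the corner points; that spacing, and the helper name probing_grid, are assumptions rather than details confirmed by the documentation above.

```python
def probing_grid(upper_left, upper_right, lower_left, H=20, W=20):
    """Return an H x W nested list of actions covering the probing region.

    Row index i moves along (lower_left - upper_left); column index j moves
    along (upper_right - upper_left).
    """

    def weight(i, n):
        # Evenly spaced in [0, 1], endpoints included (assumed).
        return i / (n - 1) if n > 1 else 0.0

    grid = []
    for i in range(H):
        v = weight(i, H)
        row = []
        for j in range(W):
            u = weight(j, W)
            row.append([
                ul + v * (ll - ul) + u * (ur - ul)
                for ul, ur, ll in zip(upper_left, upper_right, lower_left)
            ])
        grid.append(row)
    return grid


grid = probing_grid([1, -1, 0, 0], [1, 1, 0, 0], [-1, -1, 0, 0], H=3, W=3)
print(grid[0][0])  # upper-left corner
print(grid[2][2])  # lower-right corner, matches the formula above
print(sum(len(row) for row in grid))  # H * W samples in total
```

Evaluating the network on each of these actions for every observation in the batch would yield one H x W response map per output dimension, which is consistent with the [B, K, H, W] shape stated above.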