alf.utils#
alf.utils.action_quantizer#
Action Quantizer.
- class ActionQuantizer(action_spec, sampling_method='uniform', action_bins=7, rep_mode='center')[source]#
Bases: object
Quantize actions in a specified way.
- Parameters
action_spec (BoundedTensorSpec) – action spec
sampling_method (str) – sampling space, uniform or log space:
"uniform": the original space
"log": the logarithm space
action_bins (int) – number of bins used for discretization
rep_mode (str) – the mode of representation for quantization:
"center": linspace(lb + bin_size/2, ub - bin_size/2, bin_num)
"boundary": linspace(lower_bound, upper_bound, bin_num)
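The two rep_mode options can be illustrated with a small pure-Python sketch that computes the representative value of each bin for one action dimension in the uniform space. The helpers linspace and bin_values are made up for this illustration and are not part of ALF.

```python
def linspace(start, stop, num):
    """Return ``num`` evenly spaced values from start to stop (inclusive)."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def bin_values(lb, ub, bin_num, rep_mode="center"):
    """Representative action value of each bin (hypothetical helper)."""
    if rep_mode == "center":
        # Each bin is represented by its midpoint.
        bin_size = (ub - lb) / bin_num
        return linspace(lb + bin_size / 2, ub - bin_size / 2, bin_num)
    if rep_mode == "boundary":
        # The representatives include both bounds of the action range.
        return linspace(lb, ub, bin_num)
    raise ValueError(f"unknown rep_mode: {rep_mode}")


centers = bin_values(-1.0, 1.0, 4, "center")
edges = bin_values(-1.0, 1.0, 5, "boundary")
```

With "center" the extreme actions lb and ub are never emitted, whereas "boundary" makes them reachable.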
- property action_bins#
alf.utils.action_samplers#
- class CategoricalSeedSampler(num_classes, new_noise_prob=0.01, concentration=1)[source]#
Bases: alf.utils.action_samplers._CategoricalSeedSamplerBase
Sample actions with temporal consistency.
In order to do so, we maintain an internal stateful noise vector \(\epsilon\) and use it to modify the original categorical distribution \(\pi\) to a new distribution \(\tilde{\pi}=f(\pi, \epsilon)\). The evolution of \(\epsilon\) and \(f\) are chosen so that \(E(\tilde{\pi})=\pi\). More specifically, \(f\) is chosen so that \(\tilde{\pi}\) follows Dirichlet distribution \(Dir(c \pi)\).
- Parameters
num_classes (int) – number of classes for the categorical distribution
new_noise_prob (float) – the probability of generating a new \(\epsilon\)
concentration (float) – the concentration scaling factor c. Larger concentration tends to generate \(\tilde{\pi}\) closer to \(\pi\).
Args:
input_tensor_spec (nested TensorSpec): the (nested) tensor spec of the input.
state_spec (nested TensorSpec): the (nested) tensor spec of the state of the network.
name (str):
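The property \(E(\tilde{\pi})=\pi\) for \(\tilde{\pi} \sim Dir(c \pi)\) can be checked numerically. The sketch below draws independent Dirichlet samples by normalizing Gamma draws; it does not reproduce the stateful noise \(\epsilon\) used by the sampler, and the helper names are assumptions made for this example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_mean(pi, c, n, rng):
    """Average n samples of Dir(c * pi), component by component."""
    sums = [0.0] * len(pi)
    for _ in range(n):
        sample = sample_dirichlet([c * p for p in pi], rng)
        sums = [s + v for s, v in zip(sums, sample)]
    return [s / n for s in sums]


rng = random.Random(0)
pi = [0.7, 0.2, 0.1]
mean_small_c = empirical_mean(pi, c=1.0, n=20000, rng=rng)
mean_large_c = empirical_mean(pi, c=100.0, n=20000, rng=rng)
```

Both empirical means land close to pi; the concentration only changes how far individual samples scatter around it.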
- forward(input, state)[source]#
- Parameters
input (Tensor) – the parameter of the categorical distribution with the shape of [batch_size, num_classes]
state (Tensor) – noise state (i.e. \(\epsilon\))
- training: bool#
- class EpsilonGreedySampler(epsilon_greedy=0.1)[source]#
Bases: torch.nn.modules.module.Module
Epsilon greedy sampler.
With probability 1 - epsilon_greedy, sample actions with the largest probability. With probability epsilon_greedy, sample actions according to the given categorical distribution.
- Parameters
epsilon_greedy – see above.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- Parameters
input – categorical probabilities with the shape of
[batch_size, num_classes]
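The sampling rule described for EpsilonGreedySampler can be sketched for a single row of probabilities in plain Python. This is only an illustration of the rule; the function name is hypothetical and the real module operates on batched tensors.

```python
import random


def epsilon_greedy_sample(probs, epsilon_greedy, rng):
    """Pick one action index for a single row of categorical probabilities."""
    if rng.random() < epsilon_greedy:
        # With probability epsilon_greedy: sample from the given distribution.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Otherwise: take the action with the largest probability.
    return max(range(len(probs)), key=lambda i: probs[i])


rng = random.Random(0)
probs = [0.1, 0.6, 0.3]
greedy_only = [epsilon_greedy_sample(probs, 0.0, rng) for _ in range(100)]
mixed = [epsilon_greedy_sample(probs, 0.5, rng) for _ in range(1000)]
```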
- training: bool#
- class MultinomialSampler[source]#
Bases: torch.nn.modules.module.Module
Sample actions according to the given multinomial distribution.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(input)[source]#
- Parameters
input – categorical probabilities with the shape of
[batch_size, num_classes]
- training: bool#
alf.utils.averager#
Classes for doing moving average.
- class AdaptiveAverager(tensor_spec, speed=10.0, name='AdaptiveAverager')[source]#
Bases: alf.utils.averager.EMAverager
Averager with adaptive update_rate.
This averager gives higher weight to more recent samples for calculating the average. Roughly speaking, the weight for each sample at time \(t\) is proportional to \((t/T)^{speed-1}\), where \(T\) is the current time step. See notes/streaming_averaging_amd_sampling.py for detail.
- Parameters
tensor_spec (nested TensorSpec) – the TensorSpec for the value to be averaged
speed (float) – speed of updating mean and variance.
name (str) – name of this averager
- training: bool#
- class EMAverager(tensor_spec, update_rate, name='EMAverager')[source]#
Bases: torch.nn.modules.module.Module
Class for exponential moving average. Suppose the update rate is \(\alpha\), and the quantity to be averaged is denoted as \(x\), then
\[x_t = (1-\alpha)x_{t-1} + \alpha x\]
The average is corrected by a mass \(w_t\) as \(\frac{x_t}{w_t}\), and the mass is calculated as:
\[w_t = (1-\alpha) * w_{t-1} + \alpha\]
Note that update rate can be a fixed floating number or a variable. If it is a variable, the update rate can be changed by the user.
- Parameters
tensor_spec (nested TensorSpec) – the TensorSpec for the value to be averaged
update_rate (float|Variable) – the update rate
name (str) – name of this averager
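A scalar sketch of the two recursions above shows why the mass correction matters: after a single update the corrected average already equals the observed value instead of being biased toward the initial state. The class name and the zero initialization of \(x_0\) and \(w_0\) are assumptions made for this illustration.

```python
class ScalarEMA:
    """Exponential moving average with mass correction (illustrative only)."""

    def __init__(self, update_rate):
        self.update_rate = update_rate
        self.x = 0.0  # uncorrected running average x_t (assumed to start at 0)
        self.w = 0.0  # mass w_t (assumed to start at 0)

    def update(self, value):
        a = self.update_rate
        self.x = (1 - a) * self.x + a * value
        self.w = (1 - a) * self.w + a

    def get(self):
        # Corrected average x_t / w_t.
        return self.x / self.w


ema = ScalarEMA(update_rate=0.1)
ema.update(5.0)
first = ema.get()
for _ in range(50):
    ema.update(5.0)
later = ema.get()
```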
- average(tensor)[source]#
Combines self.update and self.get in one step. Can be handy in practice.
- Parameters
tensor (nested Tensor) – a value for updating the average; outer dims will be first averaged before being added to the average
- Returns
the current average
- Return type
Tensor
- training: bool#
- class ScalarAdaptiveAverager(speed=10, dtype=torch.float32, name='ScalarAdaptiveAverager')[source]#
Bases: alf.utils.averager.AdaptiveAverager
AdaptiveAverager for scalar value.
- Parameters
speed (float) – speed of updating mean and variance.
dtype (torch.dtype) – dtype of the scalar
name (str) – name of this averager
- training: bool#
- class ScalarEMAverager(update_rate, dtype=torch.float32, name='ScalarEMAverager')[source]#
Bases: alf.utils.averager.EMAverager
EMAverager for scalar value
- Parameters
update_rate (float|Variable) – update rate
dtype (torch.dtype) – dtype of the scalar
name (str) – name of this averager
- training: bool#
- class ScalarWindowAverager(window_size, dtype=torch.float32, name='ScalarWindowAverager')[source]#
Bases: alf.utils.averager.WindowAverager
WindowAverager for scalar value
- Parameters
window_size (int) – the size of the window
dtype (torch.dtype) – dtype of the scalar
name (str) – name of this averager
- training: bool#
- class WindowAverager(tensor_spec, window_size, name='WindowAverager')[source]#
Bases: torch.nn.modules.module.Module
WindowAverager calculates the average of the past window_size samples.
- Parameters
tensor_spec (TensorSpec) – the TensorSpec for the value to be averaged
window_size (int) – the size of the window
name (str) – name of this averager
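The windowed average can be sketched for scalars with a bounded deque. This is a minimal illustration of the idea, not ALF's tensor-based implementation; the class name is hypothetical.

```python
from collections import deque


class ScalarWindowAverage:
    """Average of the most recent ``window_size`` scalar samples."""

    def __init__(self, window_size):
        # A deque with maxlen silently drops the oldest sample when full.
        self._buf = deque(maxlen=window_size)

    def update(self, value):
        self._buf.append(value)

    def get(self):
        return sum(self._buf) / len(self._buf)

    def average(self, value):
        # Update and read in one step.
        self.update(value)
        return self.get()


avg = ScalarWindowAverage(window_size=3)
results = [avg.average(v) for v in [1.0, 2.0, 3.0, 4.0]]
```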
- average(tensor)[source]#
Combines self.update and self.get in one step. Can be handy in practice.
- Parameters
tensor (nested Tensor) – a value for updating the average; outer dims will be averaged first before being added
- Returns
the current average
- Return type
Tensor
- training: bool#
- average_outer_dims(tensor, spec)[source]#
- Parameters
tensor (Tensor) – a single Tensor
spec (TensorSpec) –
- Returns
the average tensor across outer dims
alf.utils.checkpoint_utils#
- class Checkpointer(ckpt_dir, **kwargs)[source]#
Bases: object
A checkpoint manager for saving and loading checkpoints.
A class for saving checkpoints. It also saves a json file containing the structure of the model state checkpoint, which facilitates inspecting the structure of the checkpoint without having to load it first. This is useful for cases such as extracting a sub-dictionary from the whole.
Example usage:
alg_root = MyAlg(params=[p1, p2], sub_algs=[a1, a2], optimizer=opt)
ckpt_mngr = ckpt_utils.Checkpointer(ckpt_dir, alg=alg_root)
- Parameters
ckpt_dir – The directory to save checkpoints. ckpt_dir is created if it doesn’t exist.
kwargs – Items to be included in the checkpoint. Each item needs to have state_dict and load_state_dict implemented. For an instance of Algorithm, only the root needs to be passed in; all the children modules and optimizers are automatically extracted and checkpointed. If a child module is also passed in, it will be treated as the root to be recursively processed.
- has_checkpoint(global_step='latest')[source]#
Whether there is a checkpoint in the checkpoint directory.
- Parameters
global_step (int|str) – If an int, return True if file “ckpt-{global_step}” is in the checkpoint directory. If “latest”, return True if “latest” is in the checkpoint directory.
- load(global_step='latest', ignored_parameter_prefixes=[], including_optimizer=True, including_replay_buffer=True, including_data_transformers=True, strict=True)[source]#
Load checkpoint.
- Parameters
global_step – the number of training steps which is used to specify the checkpoint to be loaded. If global_step is ‘latest’, the most recent checkpoint named ‘latest’ will be loaded.
ignored_parameter_prefixes (list[str]) – ignore the parameters whose name has one of these prefixes in the checkpoint.
including_optimizer (bool) – whether to load the optimizer checkpoint
including_replay_buffer (bool) – whether to load the replay buffer checkpoint.
including_data_transformers (bool) – whether to load the data transformer checkpoint.
strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s torch.nn.Module.state_dict function. If strict=True, will keep lists of missing and unexpected keys and raise an error when any of the lists is non-empty; if strict=False, missing/unexpected keys will be omitted and no error will be raised. (Default: True)
- Returns
the current step number for the loaded checkpoint. current_step_num is set to -1 if the specified checkpoint does not exist.
- Return type
current_step_num (int)
- save(global_step)[source]#
Save states of all modules to checkpoint
- Parameters
global_step (int) – the number of training steps corresponding to the current state to be saved. It will be appended to the name of the checkpoint as a suffix. This function will also save a copy of the latest checkpoint in a file named ‘latest’.
- enable_checkpoint(module, flag=True)[source]#
Enable/disable checkpoint for module.
- Parameters
module (torch.nn.Module) –
flag (bool) – True to enable checkpointing, False to disable.
- extract_sub_state_dict_from_checkpoint(checkpoint_prefix, checkpoint_path)[source]#
Extract a (sub-)state-dictionary from a checkpoint file. The state dictionary can be a sub-dictionary specified by the checkpoint_prefix.
- Parameters
checkpoint_prefix – the prefix to the sub-dictionary in the checkpoint to be loaded. It can be a multi-step path denoted by “A.B.C” (e.g. “alg._sub_alg1”). If prefix is ‘’, the full dictionary from the checkpoint file will be returned.
checkpoint_path (str) – the full path to the checkpoint file saved by ALF, e.g. “/path_to_experiment/train/algorithm/ckpt-100”.
- is_checkpoint_enabled(module)[source]#
Whether module will be checkpointed.
By default, a module used in Algorithm will be checkpointed. The checkpointing can be disabled by calling enable_checkpoint(module, False).
- Parameters
module (torch.nn.Module) – module in question
- Returns
True if the parameters of this module will be checkpointed
- Return type
bool
alf.utils.common#
Various functions used by different alf modules.
- class Periodically(body, period, name='periodically')[source]#
Bases: torch.nn.modules.module.Module
Periodically performs the operation defined in body.
- Parameters
body (Callable) – callable to be performed every time an internal counter is divisible by the period.
period (int) – inverse frequency with which to perform the operation.
name (str) – name of the object.
- Raises
TypeError – if body is not a callable.
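The counter-based behavior described above can be sketched without torch. The class below is hypothetical; in particular, incrementing the counter before the divisibility check is an assumption, so the exact step at which the first call fires in ALF may differ.

```python
class PeriodicCall:
    """Run ``body`` on every ``period``-th call (illustrative sketch)."""

    def __init__(self, body, period):
        if not callable(body):
            raise TypeError("body must be callable")
        self._body = body
        self._period = period
        self._counter = 0

    def __call__(self):
        # Assumption: the counter is incremented first, then tested.
        self._counter += 1
        if self._counter % self._period == 0:
            return self._body()
        return None


calls = []
tick = PeriodicCall(lambda: calls.append("ran"), period=3)
for _ in range(7):
    tick()
```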
- forward()[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool#
- class TargetUpdater(models, target_models, tau=1.0, period=1, init_copy=True, delayed_update=False)[source]#
Bases: torch.nn.modules.module.Module
Performs a soft update of the target model parameters.
For each weight \(w_s\) in the model, and its corresponding weight \(w_t\) in the target_model, a soft update is:
\[w_t = (1 - \tau) * w_t + \tau * w_s.\]
Note: we only perform soft updates for parameters and always copy buffers.
- Parameters
models (Network | list[Network] | Parameter | list[Parameter]) – the current model or parameter.
target_models (Network | list[Network] | Parameter | list[Parameter]) – the model or parameter to be updated.
tau (float) – A float scalar in \([0, 1]\). Default \(\tau=1.0\) means hard update.
period (int) – Step interval at which the target model is updated.
init_copy (bool) – If True, also copy models to target_models in the beginning.
delayed_update (bool) – if True, target_models is updated using recent_models every period steps. If tau is 1, the recent_models is models period steps before. If tau is not 1, recent_models is an exponential moving average of models with rate tau. The use of delayed_update may help to improve the stability of TD learning when a small period is used.
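The soft update rule \(w_t = (1 - \tau) w_t + \tau w_s\) can be sketched on plain lists of floats. The function name is hypothetical, and the sketch ignores period, init_copy and delayed_update.

```python
def soft_update(target, source, tau):
    """Return new target weights after one soft update step."""
    return [(1 - tau) * w_t + tau * w_s for w_t, w_s in zip(target, source)]


w_s = [1.0, 2.0]  # weights of the current model
w_t = [0.0, 0.0]  # weights of the target model

hard = soft_update(w_t, w_s, tau=1.0)   # tau = 1 copies the source weights
soft = soft_update(w_t, w_s, tau=0.25)  # tau < 1 moves only part of the way
```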
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward()[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool#
- active_action_target_entropy(active_action_portion=0.2, min_entropy=0.3)[source]#
Automatically compute target entropy given the action spec. Currently supports discrete actions only.
The general idea is that we assume \(Nk\) actions having uniform probs for a good policy. Thus the target entropy should be \(\log(Nk)\), where \(N\) is the total number of discrete actions and \(k\) is the active action portion.
TODO: incorporate this function into EntropyTargetAlgorithm if it proves to be effective.
- Parameters
active_action_portion (float) – a number in \((0, 1]\). Ideally, this value should be greater than 1/num_actions. If it’s not, it will be ignored.
min_entropy (float) – the minimum possible entropy. If the auto-computed entropy is smaller than this value, then it will be replaced.
- Returns
the target entropy for EntropyTargetAlgorithm.
- Return type
float
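A worked version of the \(\log(Nk)\) rule for a single discrete action is sketched below. Taking the maximum with min_entropy is one reading of the parameter descriptions above and is an assumption; the actual function takes its action count from the action spec, whereas here num_actions is passed explicitly.

```python
import math


def target_entropy_sketch(num_actions, active_action_portion=0.2, min_entropy=0.3):
    """Entropy of a uniform distribution over N * k actions, floored at min_entropy."""
    entropy = math.log(num_actions * active_action_portion)
    # Assumption: values below min_entropy are replaced by min_entropy.
    return max(entropy, min_entropy)


many_actions = target_entropy_sketch(10)  # log(10 * 0.2) = log(2)
few_actions = target_entropy_sketch(3)    # log(0.6) < 0, so min_entropy is used
```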
- add_method(cls)[source]#
A decorator for adding a method to a class (cls). Example usage:
class A:
    pass

@add_method(A)
def new_method(self):
    print('new method added')

# now new_method() is added to class A and is ready to be used
a = A()
a.new_method()
- as_list(x)[source]#
Convert x to a list.
It performs the following conversion:
None => []
list => x
tuple => list(x)
other => [x]
- Parameters
x (any) – the object to be converted
- Returns
- Return type
list
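The conversion table for as_list maps directly onto a few isinstance checks. The function below is a stand-alone sketch with a made-up name, not a copy of the ALF source.

```python
def as_list_sketch(x):
    """Follow the conversion table: None, list, tuple, other."""
    if x is None:
        return []
    if isinstance(x, list):
        return x  # returned as-is, not copied
    if isinstance(x, tuple):
        return list(x)
    return [x]


original = [1, 2]
same_list = as_list_sketch(original)
from_tuple = as_list_sketch((1, 2))
from_scalar = as_list_sketch("abc")
from_none = as_list_sketch(None)
```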
- call_stack()[source]#
Return a list of strings showing the current function call stacks for debugging.
- Return type
List[str]
- cast_transformer(observation, dtype=torch.float32)[source]#
Cast observation
- Parameters
observation (nested Tensor) – observation
dtype (Dtype) – The destination type.
- Returns
casted observation
- check_numerics(nested)[source]#
Assert all the tensors in nested are finite.
- Parameters
nested (nested Tensor) – nested Tensor to be checked.
- compute_summary_or_eval_interval(config, summary_or_eval_calls=100)[source]#
Automatically compute a summary or eval interval according to the config and the expected total number of summary or eval calls. This function can avoid manually computing the interval value when an expected number of calls is in mind.
Warning
This function might not work for algorithms that change the global counter themselves, e.g., LMAlgorithm.
- Parameters
config (TrainerConfig) – the configuration object for training
summary_or_eval_calls (int) – the expected number of summary or eval calls throughout the training process. This number can control the time consumed on summary or eval. Note that this number might not be exactly satisfied eventually, if the calculated interval has been rounded up.
- Returns
summary or eval interval
- Return type
int
- copy_gin_configs(root_dir, gin_files)[source]#
Copy gin config files to root_dir
- Parameters
root_dir (str) – directory path
gin_files (None|list[str]) – list of file paths
- create_ou_process(action_spec, ou_stddev, ou_damping)[source]#
Create nested zero-mean Ornstein-Uhlenbeck processes.
The temporal update equation is:
x_next = (1 - damping) * x + N(0, std_dev)
Note: if action_spec is nested, the returned nested OUProcess will not be checkpointed.
- Parameters
action_spec (nested BoundedTensorSpec) – action spec
ou_damping (float) – Damping rate in the above equation. We must have \(0 <= damping <= 1\).
ou_stddev (float) – Standard deviation of the Gaussian component.
- Returns
nested OUProcess with the same structure as action_spec.
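The update equation x_next = (1 - damping) * x + N(0, std_dev) can be simulated for a scalar in a few lines. The class name and the zero initial state are assumptions made for this sketch.

```python
import random


class ScalarOUProcess:
    """Zero-mean Ornstein-Uhlenbeck noise for one scalar (illustrative only)."""

    def __init__(self, damping, stddev, seed=None):
        self._damping = damping
        self._stddev = stddev
        self._x = 0.0  # assumed initial state
        self._rng = random.Random(seed)

    def step(self):
        noise = self._rng.gauss(0.0, self._stddev)
        self._x = (1 - self._damping) * self._x + noise
        return self._x


ou = ScalarOUProcess(damping=0.15, stddev=0.2, seed=0)
path = [ou.step() for _ in range(5)]

# With zero standard deviation the process stays at its initial value.
silent = ScalarOUProcess(damping=0.15, stddev=0.0, seed=0)
silent_value = silent.step()
```

A smaller damping keeps consecutive noise values more correlated, which is the usual reason for preferring this process over independent Gaussian noise.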
- detach(nests)[source]#
Detach nested Tensors or Distributions
- Parameters
nests (Any) – tensors or distributions to be detached
- Returns
detached Tensors/Distributions with same structure as nests
- class eval_context[source]#
Bases: object
A context manager that will automatically mark the _exe_mode flag as EXE_MODE_EVAL when entering a context and revert to the original _exe_mode when exiting the context.
- expand_dims_as(x, y, end=True)[source]#
Expand the shape of x with extra singular dimensions.
The result is broadcastable to the shape of y.
- Parameters
x (Tensor) – source tensor
y (Tensor) – target tensor. Only its shape will be used.
end (bool) – If True, the extra dimensions are at the end of x; otherwise they are at the beginning.
- Returns
x with extra singular dimensions.
- flattened_size(spec)[source]#
Return the size of the vector if spec.shape is flattened.
It’s the same as np.prod(spec.shape).
- Parameters
spec (alf.TensorSpec) – a TensorSpec object
- Returns
the size of flattened shape
- Return type
np.int64
- generate_alf_snapshot(alf_root, conf_file, dest_path)[source]#
Given a destination path, copy the local ALF root dir to the path. To save disk space, only *.py files will be copied.
This function can be used to generate a snapshot of the repo so that exactly the same code status will be recovered when later playing a trained model or launching a grid-search job in the waiting queue.
- Parameters
alf_root (str) – the parent path of the ‘alf’ module
conf_file (str) – the alf config file
dest_path (str) – the path to generate a snapshot of ALF repo
- get_action_spec()[source]#
Get the specs of the tensors expected by step(action) of the global environment.
- Returns
a spec that describes the shape and dtype of each tensor expected by step().
- Return type
nested TensorSpec
- get_alf_snapshot_env_vars(root_dir)[source]#
Given a root_dir, return modified env variable dict so that PYTHONPATH points to the ALF snapshot under this directory.
- get_all_parameters(obj)[source]#
Get all the parameters under obj and its descendants.
Note: This function assumes all the parameters can be reached through tuple, list, dict, set, nn.Module or the attributes of an object. If a parameter is held in a strange way, it will not be included by this function.
- Parameters
obj (object) – will look for parameters under this object.
- Returns
list of (path, Parameters)
- Return type
list
- get_conf_file(root_dir=None)[source]#
Get the configuration file.
If FLAGS.conf is not set, find alf_config.py or configured.gin under FLAGS.root_dir and return it. If there is no ‘conf’ flag defined, return None.
- Parameters
root_dir (str) – when None, FLAGS.root_dir is used to find the conf file.
- Returns
the name of the conf file. None if there is no conf file
- Return type
str
- get_gin_confg_strs()[source]#
Obtain both the operative and inoperative config strs from gin.
The operative configuration consists of all parameter values used by configurable functions that are actually called during execution of the current program, and the inoperative configuration consists of all parameters configured but not used by configurable functions. See gin.operative_config_str() and gin_utils.inoperative_config_str for more detail on how the config is generated.
- Returns
md_operative_config_str (str): a markdown-formatted operative str
md_inoperative_config_str (str): a markdown-formatted inoperative str
- Return type
tuple
- get_gin_file()[source]#
Get the gin configuration file.
If FLAGS.gin_file is not set, find gin files under FLAGS.root_dir and return them. If there is no ‘gin_file’ flag defined, return ‘’.
- Returns
the gin file(s)
- get_initial_policy_state(batch_size, policy_state_spec)[source]#
Return zero tensors as the initial policy states.
- Parameters
batch_size (int) – number of policy states created
policy_state_spec (nested structure) – each item is a tensor spec for a state
- Returns
each item is a tensor with the first dim equal to batch_size. The remaining dims are consistent with the corresponding state spec of policy_state_spec.
- Return type
state (nested structure)
- get_initial_time_step(env, first_env_id=0)[source]#
Return the initial time step.
- Parameters
env (AlfEnvironment) –
first_env_id (int) – the environment ID for the first sample in this batch.
- Returns
the init time step with actions as zero tensors.
- Return type
- get_observation_spec(field=None)[source]#
Get the spec of observation transformed by data transformers.
The data transformers are specified by TrainerConfig.data_transformer_ctor.
- Parameters
field (str) – a multi-step path denoted by “A.B.C”.
- Returns
a spec that describes the observation.
- Return type
nested TensorSpec
- get_raw_observation_spec(field=None)[source]#
Get the TensorSpec of observations provided by the global environment.
- Parameters
field (str) – a multi-step path denoted by “A.B.C”.
- Returns
a spec that describes the observation.
- Return type
nested TensorSpec
- get_reward_spec()[source]#
Get the specs of the reward tensors of the global environment.
- Returns
a spec that describes the shape and dtype of each reward tensor.
- Return type
nested TensorSpec
- get_states_shape()[source]#
Get the tensor shape of internal states of the agent provided by the global environment.
- Returns
0 if internal states are not part of the observation; otherwise a torch.Size. We don’t raise an error so this code can serve to check whether env has states input.
- get_unused_port(start, end=65536, n=1)[source]#
Get an unused port in the range [start, end) .
- Parameters
start (int) – port range start
end (int) – port range end
n (int) – get n consecutive unused ports
- Raises
socket.error – if no unused port is available
- get_vocab_size()[source]#
Get the vocabulary size of observations provided by the global environment.
- Returns
size of the environment’s/teacher’s vocabulary. Returns 0 if language is not part of observation. We don’t raise error so this code can serve to check whether the env has language input
- Return type
int
- image_scale_transformer(observation, fields=None, min=-1.0, max=1.0)[source]#
Scale image to min and max (0->min, 255->max).
- Parameters
observation (nested Tensor) – If observation is a nested structure, only namedtuple and dict are supported for now.
fields (list[str]) – the fields to be applied with the transformation. If None, then observation must be a Tensor with dtype uint8. A field str can be a multi-step path denoted by “A.B.C”.
min (float) – normalize minimum to this value
max (float) – normalize maximum to this value
- Returns
Transformed observation
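The mapping 0->min, 255->max is a linear rescaling of the uint8 range. The scalar sketch below shows the arithmetic only; the names lo and hi stand in for min and max to avoid shadowing Python built-ins, and the function is not part of ALF.

```python
def scale_pixel(value, lo=-1.0, hi=1.0):
    """Linearly map a pixel value in [0, 255] to [lo, hi]."""
    return lo + (value / 255.0) * (hi - lo)


darkest = scale_pixel(0)
brightest = scale_pixel(255)
middle = scale_pixel(127.5)
```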
- info(msg, *args)[source]#
Generate info message msg % args.
- Parameters
msg – str, the message to be logged.
*args – The args to be substituted into the msg.
- info_once(msg, *args)[source]#
Generate info message msg % args once.
- Parameters
msg – str, the message to be logged.
*args – The args to be substituted into the msg.
- is_eval()[source]#
Return a bool value indicating whether the current code belongs to evaluation or playing a learned model.
- is_inside_docker_container()[source]#
Return whether the current process is running inside a docker container.
See discussions at https://stackoverflow.com/questions/23513045/how-to-check-if-a-process-is-running-inside-docker-container
- is_pretrain()[source]#
Return a bool value indicating whether the current code belongs to pre-train. The code within a function that is decorated by mark_pretrain is flagged as pretrain. A code block that is within a pretrain_context is also flagged as pretrain.
- is_replay()[source]#
Return a bool value indicating whether the current code belongs to replaying. Replaying implies off-policy training.
Any code under train_from_replay_buffer() of any algorithm is classified as replaying. This phase starts from experience sampling from the replay buffer, all the way to the parameter update.
- is_repo_root(dir, module_name)[source]#
Given a directory, check if it is a valid repo root. Currently the way of checking is to see if there is a valid __init__.py under it.
- is_rollout()[source]#
Return a bool value indicating whether the current code belongs to unrolling. For on-policy algorithms, unrolling could be treated as part of training as it usually generates training info for calculating the loss.
Any code under unroll() of the root RL algorithm is classified as unrolling. This is the phase of collecting experiences for training.
- is_training(alg)[source]#
Return a bool value indicating whether the current code is in a training phase, for either an on-policy or an off-policy algorithm.
A training phase is defined as the rollout phase for an on-policy algorithm, or the replay phase for an off-policy algorithm.
Note
Currently this function returns False for the code under train_from_unroll().
- Parameters
alg (Algorithm) – the algorithm to be decided
- log_metrics(metrics, prefix='')[source]#
Log metrics through logging.
- Parameters
metrics (list[alf.metrics.StepMetric]) – list of metrics to be logged
prefix (str) – prefix to the log segment
- mark_eval(func)[source]#
A decorator that will automatically mark the _exe_mode flag when entering/exiting an evaluation/test function.
- Parameters
func (Callable) – a function
- mark_pretrain(func)[source]#
A decorator that will automatically mark the _exe_mode flag when entering/exiting a pretrain function.
- Parameters
func (Callable) – a function
- mark_replay(func)[source]#
A decorator that will automatically mark the _exe_mode flag when entering/exiting an experience replay function.
- Parameters
func (Callable) – a function
- mark_rollout(func)[source]#
A decorator that will automatically mark the _exe_mode flag when entering/exiting a rollout function.
- Parameters
func (Callable) – a function
- parse_conf_file(conf_file)[source]#
Parse config from file.
It also looks for FLAGS.gin_param and FLAGS.conf_param for extra configs.
Note: a global environment will be created (which can be obtained by alf.get_env()) and random seed will be initialized by this function using common.set_random_seed().
- Parameters
conf_file (str) – the full path to the config file
- class pretrain_context[source]#
Bases:
objectA context manager that will automatically mark the
_exe_modeflag asEXE_MODE_PRETRAINwhen entering a context and revert to the original_exe_modewhen exiting the context.
- read_conf_file(root_dir)[source]#
Read the content of the conf file.
- Parameters
root_dir (str) – alf log directory path
- Return type
str
- Returns
the content of the conf file as a str. None if the conf file is not specified through the command line and cannot be found in root_dir
- class replay_context[source]#
Bases:
objectA context manager that will automatically mark the
_exe_modeflag asEXE_MODE_REPLAYwhen entering a context and revert to the original_exe_modewhen exiting the context.
- reset_state_if_necessary(state, initial_state, reset_mask)[source]#
Reset state to initial state according to reset_mask.
- Parameters
state (nested Tensor) – the current batched states
initial_state (nested Tensor) – batched initial states
reset_mask (nested Tensor) – with shape=(batch_size,), dtype=torch.bool
- Returns
nested Tensor
- class rollout_context[source]#
Bases:
objectA context manager that will automatically mark the
_exe_modeflag asEXE_MODE_ROLLOUTwhen entering a context and revert to the original_exe_modewhen exiting the context.
- run_under_record_context(func, summary_dir, summary_interval, flush_secs, summarize_first_interval=True, summary_max_queue=10)[source]#
Run func under summary record context.
- Parameters
func (Callable) – the function to be executed.
summary_dir (str) – directory to store summary. A directory starting with ~/ will be expanded to $HOME/.
summary_interval (int) – how often to generate summary based on the global counter
flush_secs (int) – flush summary to disk every so many seconds
summarize_first_interval (bool) – whether to summarize every step of the first interval (default True). It might be better to turn this off for an easier post-processing of the curve.
summary_max_queue (int) – the largest number of summaries to keep in a queue; will flush once the queue gets bigger than this. Defaults to 10.
- set_exe_mode(mode)[source]#
Mark whether the current code belongs to unrolling or training. This flag might be used to change the behavior of some functions accordingly.
- Parameters
training (bool) – True for training, False for unrolling
- Returns
the old exe mode
- set_random_seed(seed)[source]#
Set a seed for deterministic behaviors.
Note: If someone runs an experiment with a pre-selected manual seed, he can definitely reproduce the results with the same seed; however, if he runs the experiment with seed=None and re-run the experiments using the seed previously returned from this function (e.g. the returned seed might be logged to Tensorboard), and if cudnn is used in the code, then there is no guarantee that the results will be reproduced with the recovered seed.
- Parameters
seed (int|None) – seed to be used. If None, a default seed based on pid and time will be used.
- Returns
The seed being used if seed is None.
- set_transformed_observation_spec(spec)[source]#
Set the spec of the observation transformed by data transformers.
- snapshot_repo_roots()[source]#
Return a dict of repo root dirs for snapshot. The paths should be defined by a special environment variable ALF_SNAPSHOT_REPO_ROOTS, in the following format:
export ALF_SNAPSHOT_REPO_ROOTS="<module_name1>=<repo_root1>:<module_name2>=<repo_root2>:..."
where pairs of “<module_name>=<repo_root>” are separated by “:”. Note that <repo_root> should be the parent dir of the module package dir.
- Returns
a dict of {module_name: repo_root}, excluding the alf repo itself.
- Return type
dict[str]
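The format of ALF_SNAPSHOT_REPO_ROOTS can be parsed with two splits. The sketch below works on a string argument rather than reading the environment, and the module names and paths in the example are invented for illustration.

```python
def parse_repo_roots(value):
    """Parse "<name1>=<root1>:<name2>=<root2>" into {name: root}."""
    roots = {}
    for pair in value.split(":"):
        if not pair:
            continue  # tolerate an empty variable or a trailing ':'
        module_name, repo_root = pair.split("=", 1)
        roots[module_name] = repo_root
    return roots


roots = parse_repo_roots("mod_a=/home/user/repo_a:mod_b=/home/user/repo_b")
empty = parse_repo_roots("")
```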
- summarize_gin_config()[source]#
Write the operative and inoperative gin config to Tensorboard summary.
- tuplify2d(x)[source]#
Convert x to a tuple of length two.
It performs the following conversion:
x => x if isinstance(x, tuple) and len(x) == 2
x => (x, x) if not isinstance(x, tuple)
- Parameters
x (any) – the object to be converted
- Returns
- Return type
tuple
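The two conversion rules of tuplify2d can be sketched as follows. The documentation does not say what happens for a tuple whose length is not two, so rejecting it with a ValueError here is an assumption, as is the function name.

```python
def tuplify2d_sketch(x):
    """Return x if it is a 2-tuple, otherwise duplicate it into (x, x)."""
    if isinstance(x, tuple):
        if len(x) != 2:
            # Assumption: tuples of any other length are treated as an error.
            raise ValueError("expected a tuple of length two")
        return x
    return (x, x)


kernel = tuplify2d_sketch(3)
stride = tuplify2d_sketch((1, 2))
```

This kind of helper is typically used for arguments such as a kernel size that may be given either as one number or as a (height, width) pair.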
- unzip_alf_snapshot(root_dir)[source]#
Restore an ALF snapshot from a job directory by unzipping the snapshot ‘tar.gz’ files.
- Parameters
root_dir (str) – the tensorboard job directory
- warning(msg, *args)[source]#
Generate warning message msg % args.
- Parameters
msg – str, the message to be logged.
*args – The args to be substituted into the msg.
- warning_once(msg, *args)[source]#
Generate warning message msg % args once.
Note that the current implementation resembles that of the log_every_n() function in logging but reduces the calling stack by one to ensure that multiple warning-once messages generated at different places can be displayed correctly.
- Parameters
msg – str, the message to be logged.
*args – The args to be substituted into the msg.
- write_config(root_dir)[source]#
Write config to a file under directory root_dir.
Configs from FLAGS.conf_param are also recorded.
- Parameters
root_dir (str) – directory path
- write_gin_configs(root_dir, gin_file)[source]#
Write a gin configuration to a file. The user can manually change the gin confs after loading a conf file into the code, or include a gin file in another gin file while only the latter might be copied to root_dir. So here we just dump the actually used gin conf string to a file.
- Parameters
root_dir (str) – directory path
gin_file (str) – a single file path for storing the gin configs. Only the basename of the path will be used.
alf.utils.conditional_ops#
Conditional operations.
- conditional_update(target, cond, func, *args, **kwargs)[source]#
Update target according to cond mask.
Compute result as an update of target based on cond. To be specific, result[row] is func(*args[row], **kwargs[row]) if cond[row] is True, otherwise result[row] will be target[row]. Note that target will not be changed.
If you simply want to do some conditional computation without actually returning any results, you can use conditional_update in the following way:
# func needs to return an empty tuple ()
conditional_update((), cond, func, *args, **kwargs)
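The row-wise semantics can be illustrated with plain Python lists standing in for tensors. This is a sketch of the documented behavior only; the function name `conditional_update_sketch` and the list-based representation are assumptions, and keyword arguments are omitted for brevity.

```python
def conditional_update_sketch(target, cond, func, *args):
    """Return a copy of ``target`` whose rows are replaced where ``cond`` holds."""
    selected = [i for i, flag in enumerate(cond) if flag]
    result = list(target)  # copy, so that target itself is not changed
    if not selected:
        return result
    # func only sees the selected rows of every argument.
    selected_args = [[arg[i] for i in selected] for arg in args]
    updates = func(*selected_args)
    for i, value in zip(selected, updates):
        result[i] = value
    return result


target = [0, 0, 0, 0]
cond = [True, False, True, False]
x = [1, 2, 3, 4]
result = conditional_update_sketch(
    target, cond, lambda rows: [10 * r for r in rows], x)
```

Only rows 0 and 2 are passed to `func`, which is the point of the operation: the computation is skipped for rows where the condition is false.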
- select_from_mask(data, mask)[source]#
Select the items from data based on mask.
data[i, ...] will be selected to form a new tensor if mask[i] is True or non-zero.
- Parameters
data (nested Tensor) – source tensor
mask (Tensor) – 1D Tensor. mask.shape[0] should be the same as data.shape[0].
- Returns
nested Tensor with the same structure as data
alf.utils.data_buffer#
Classes for storing data for sampling.
- class DataBuffer(data_spec, capacity, device='cpu', name='DataBuffer')[source]#
Bases:
alf.utils.data_buffer.RingBuffer
A simple circular buffer supporting random sampling. This buffer doesn’t preserve temporality as data from multiple environments will be arbitrarily stored.
Not multiprocessing safe.
- Parameters
data_spec (nested TensorSpec) – spec for the data item (without batch dimension) to be stored.
capacity (int) – capacity of the buffer.
device (str) – which device to store the data
name (str) – name of the buffer
- add_batch(batch)[source]#
Add a batch of items to the buffer.
Add batch_size items along the length of the underlying RingBuffer, whereas RingBuffer.enqueue only adds data of length 1. Truncates the data if batch_size > capacity.
- Parameters
batch (Tensor) – of shape [batch_size] + tensor_spec.shape
- property current_pos#
- property current_size#
- get_batch(batch_size)[source]#
Get batch_size random samples from the buffer.
- Parameters
batch_size (int) – batch size
- Returns
Tensor of shape [batch_size] + tensor_spec.shape
- get_batch_by_indices(indices)[source]#
Get the samples by indices.
index=0 corresponds to the earliest added sample in the DataBuffer.
- Parameters
indices (Tensor) – indices of the samples
- Returns
Tensor of shape [batch_size] + tensor_spec.shape, where batch_size is indices.shape[0]
- Return type
Tensor
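The circular-buffer behavior described for DataBuffer can be sketched with a small pure-Python class. `TinyDataBuffer` is a hypothetical stand-in, not the ALF class: it stores Python objects instead of tensors, and two details are assumptions, namely that truncation keeps the newest items of an oversized batch and that random sampling is done with replacement.

```python
import random


class TinyDataBuffer:
    """Illustrative circular buffer with random sampling."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._store = [None] * capacity
        self._current_pos = 0   # next slot to write
        self._current_size = 0  # number of valid items

    def add_batch(self, batch):
        batch = batch[-self._capacity:]  # assumption: keep the newest items
        for item in batch:
            self._store[self._current_pos] = item
            self._current_pos = (self._current_pos + 1) % self._capacity
        self._current_size = min(self._current_size + len(batch),
                                 self._capacity)

    def get_batch_by_indices(self, indices):
        # Index 0 refers to the earliest item that is still stored.
        start = (self._current_pos - self._current_size) % self._capacity
        return [self._store[(start + i) % self._capacity] for i in indices]

    def get_batch(self, batch_size):
        indices = [random.randrange(self._current_size)
                   for _ in range(batch_size)]
        return self.get_batch_by_indices(indices)


buf = TinyDataBuffer(capacity=4)
buf.add_batch([0, 1, 2])
buf.add_batch([3, 4, 5])  # overwrites the two oldest items, 0 and 1
earliest_two = buf.get_batch_by_indices([0, 1])
random_samples = buf.get_batch(8)
```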
- training: bool#
- class RingBuffer(data_spec, num_environments, max_length=1024, device='cpu', allow_multiprocess=False, name='RingBuffer')[source]#
Bases:
torch.nn.modules.module.Module
Batched Ring Buffer.
Multiprocessing safe, optionally via: allow_multiprocess flag, blocking modes to enqueue and dequeue, a stop event to terminate blocked processes, and putting buffer into shared memory.
This is the underlying implementation of ReplayBuffer and Queue.
Different from tf_agents.replay_buffers.tf_uniform_replay_buffer, this buffer allows users to specify the environment id when adding batch. Thus, multiple actors can store experience in the same buffer.
Once stop event is set, all blocking enqueue and dequeue calls that happen afterwards will be skipped, unless the operation already started.
Terminology: we use pos as in _current_pos to refer to the always increasing position of an element in the infinitely long buffer, and idx as the actual index of the element in the underlying store (_buffer). That means idx == pos % _max_length is always true, and one should use _buffer[idx] to retrieve the stored data.
- Parameters
data_spec (nested TensorSpec) – spec describing a single item that can be stored in this buffer.
num_environments (int) – number of environments or total batch size.
max_length (int) – The maximum number of items that can be stored for a single environment.
device (str) – A torch device to place the Variables and ops.
allow_multiprocess (bool) – if True, allows multiple processes to write and read the buffer asynchronously.
name (str) – name of the replay buffer.
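The pos/idx terminology can be made concrete with a single-environment toy example. The variable names below are illustrative and only mirror the documented relation idx == pos % max_length; they are not the attributes of the real class.

```python
max_length = 4
buffer = [None] * max_length
current_pos = 0  # ever-increasing position in the "infinitely long" buffer

for item in "abcdef":
    idx = current_pos % max_length  # actual index in the underlying store
    buffer[idx] = item
    current_pos += 1

# After six writes into four slots, positions 0 and 1 have been overwritten.
earliest_pos = max(current_pos - max_length, 0)
earliest_item = buffer[earliest_pos % max_length]
```

Here `earliest_pos` is 2 and the store holds `['e', 'f', 'c', 'd']`, so the earliest surviving item is `'c'`.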
- clear(env_ids=None)[source]#
Clear the buffer.
- Parameters
env_ids (Tensor) – optional list of environment ids to clear
- dequeue(env_ids=None, n=1, blocking=False)[source]#
Return earliest n steps and mark them removed in the buffer.
- Parameters
env_ids (Tensor) – If None, batch_size must be num_environments. If not None, dequeue from these environments. We assume there are no duplicate ids in env_id. result[i] will be from environment env_ids[i].
n (int) – Number of steps to dequeue.
blocking (bool) – If True, blocks if there is not enough data to dequeue.
- Returns
nested Tensors or None when blocking dequeue gets terminated by stop event. The shape of the Tensors is [batch_size, n, ...].
- Raises
AssertionError – when not enough data is present, in non-blocking mode.
- property device#
The device where the data is stored in.
- enqueue(batch, env_ids=None, blocking=False)[source]#
Add a batch of items to the buffer.
Note, when blocking == False, it always succeeds, overwriting oldest data if there is no free slot.
- Parameters
batch (Tensor) – of shape [batch_size] + tensor_spec.shape
env_ids (Tensor) – If None, batch_size must be num_environments. If not None, its shape should be [batch_size]. We assume there are no duplicate ids in env_id. batch[i] is generated by environment env_ids[i].
blocking (bool) – If True, blocks if there is no free slot to add data. If False, enqueue can overwrite oldest data.
- Returns
True on success, False only in blocking mode when queue is stopped.
- get_current_position()[source]#
Get the current position for each environment.
- Returns
with shape [num_environments].
- Return type
Tensor
- get_earliest_position(env_ids)[source]#
The earliest position that is still in the replay buffer.
- Parameters
env_ids (Tensor) – int64 Tensor of environment ids
- Returns
Tensor with the same shape as env_ids, where each entry is the earliest position that is still in the replay buffer for the corresponding environment.
- has_data(env_ids, n=1)[source]#
Check n steps of data available for env_ids.
- Parameters
env_ids (Tensor) – Assumed not None, properly checked by check_convert_env_ids().
n (int) – Number of time steps to check.
- Returns
bool
- has_space(env_ids)[source]#
Check free space for one batch of data for env_ids.
- Parameters
env_ids (Tensor) – Assumed not None, properly checked by check_convert_env_ids().
- Returns
bool
- property num_environments#
- remove_up_to(n, env_ids=None)[source]#
Mark as removed earliest up to n steps.
- Parameters
n (int) – max number of steps to mark removed from buffer.
- revive()[source]#
Clears the stop Event so blocking mode will start working again.
Only checked in blocking mode of dequeue and enqueue.
- stop()[source]#
Stop waiting processes from being blocked.
Only checked in blocking mode of dequeue and enqueue.
All blocking enqueue and dequeue calls that happen afterwards will be skipped (return None for dequeue or False for enqueue), unless the operation already started.
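The interplay of blocking calls with stop() and revive() can be sketched with threads and a condition variable. `TinyBlockingQueue` is a hypothetical single-process analogue built on the standard library; the real buffer works on tensors and, optionally, across processes.

```python
import threading
from collections import deque


class TinyBlockingQueue:
    """Illustrative queue whose blocking dequeue can be released by stop()."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._stopped = False

    def enqueue(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify_all()

    def dequeue(self, blocking=False):
        with self._cond:
            if blocking:
                # Wait until data arrives or stop() is called.
                self._cond.wait_for(
                    lambda: self._stopped or len(self._items) > 0)
                if self._stopped:
                    return None  # the blocking call is skipped
            assert self._items, "not enough data in non-blocking mode"
            return self._items.popleft()

    def stop(self):
        with self._cond:
            self._stopped = True
            self._cond.notify_all()

    def revive(self):
        with self._cond:
            self._stopped = False


queue = TinyBlockingQueue()
released = []
worker = threading.Thread(
    target=lambda: released.append(queue.dequeue(blocking=True)))
worker.start()
queue.stop()      # releases the waiting worker, which receives None
worker.join()

queue.revive()    # blocking mode works again
queue.enqueue("step")
item = queue.dequeue(blocking=True)
```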
- training: bool#
alf.utils.datagen#
Utilities for supervised learning algorithms
- class TestDataSet(input_dim=3, output_dim=1, size=1000, weight=None)[source]#
Bases:
Generic[torch.utils.data.dataset.T_co]
- get_classes(target, labels)[source]#
Helper function to subclass a dataloader, i.e. select only given classes from target dataset.
- Parameters
target (torch.utils.data.Dataset) – the dataset that should be filtered.
labels (list[int]) – list of labels to filter on.
- Returns
indices of examples with label in labels.
- Return type
list[int]
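The filtering step can be sketched as follows. The sketch assumes that indexing or iterating the dataset yields (input, label) pairs with integer labels; `get_classes_sketch` and the toy dataset are hypothetical and a plain list stands in for a torch dataset.

```python
def get_classes_sketch(target, labels):
    """Return the indices of the examples whose label is in ``labels``."""
    wanted = set(labels)
    return [i for i, (_, label) in enumerate(target) if label in wanted]


toy_dataset = [("img0", 3), ("img1", 7), ("img2", 3), ("img3", 1)]
label_indices = get_classes_sketch(toy_dataset, labels=[3, 1])
```

With a real dataset, indices of this kind are commonly passed to `torch.utils.data.Subset` to build the filtered dataset.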
- load_cifar10(label_idx=None, train_bs=100, test_bs=100, num_workers=0)[source]#
Loads the CIFAR-10 dataset.
- Parameters
label_idx (list[int]) – classes to be loaded from the dataset.
train_bs (int) – training batch size.
test_bs (int) – testing batch size.
num_workers (int) – number of processes to allocate for loading data.
- Returns
train_loader (torch.utils.data.DataLoader): training data loader.
test_loader (torch.utils.data.DataLoader): test data loader.
- Return type
tuple
- load_mnist(label_idx=None, train_bs=100, test_bs=100, num_workers=0)[source]#
Loads the MNIST dataset.
- Parameters
label_idx (list[int]) – class indices to load from the dataset.
train_bs (int) – training batch size.
test_bs (int) – testing batch size.
num_workers (int) – number of processes to allocate for loading data.
small_subset (bool) – load a small subset of 50 images for testing.
- Returns
train_loader (torch.utils.data.DataLoader): training data loader.
test_loader (torch.utils.data.DataLoader): test data loader.
- Return type
tuple
- load_wikitext103(train_bs, test_bs, max_vocab_size=32768)[source]#
Load WikiText103 data.
Note that all returned Tensors are always on the CPU.
- Parameters
train_bs (int) – training batch size
test_bs (int) – validation/test batch size
max_vocab_size (int) – maximal vocabulary size.
- Returns
torch.Tensor: train_data, int64 Tensor of shape [?, train_bs]
torch.Tensor: val_data, int64 Tensor of shape [?, test_bs]
torch.Tensor: test_data, int64 Tensor of shape [?, test_bs]
torchtext.vocab.Vocab: vocab
- Return type
tuple
- load_wikitext2(train_bs, test_bs)[source]#
Load WikiText2 data.
Note that all returned Tensors are always on the CPU.
- Parameters
train_bs (int) – training batch size
test_bs (int) – validation/test batch size
- Returns
torch.Tensor: train_data, int64 Tensor of shape [?, train_bs]
torch.Tensor: val_data, int64 Tensor of shape [?, test_bs]
torch.Tensor: test_data, int64 Tensor of shape [?, test_bs]
torchtext.vocab.Vocab: vocab
- Return type
tuple
alf.utils.dist_utils#
- AbsTransform#
alias of
alf.utils.dist_utils.get_invertible.<locals>.NewCls
- class AffineTransform(loc, scale, event_dim=0, *, cache_size=1)[source]#
Bases:
alf.utils.dist_utils.get_invertible.<locals>.NewCls
Overwrite PyTorch’s AffineTransform to provide a builder to be compatible with DistributionSpec.build_distribution().
- class AffineTransformedDistribution(base_dist, loc, scale)[source]#
Bases:
torch.distributions.transformed_distribution.TransformedDistribution
Transform via the pointwise affine mapping \(y = \text{loc} + \text{scale} \times x\).
The reason for not using td.TransformedDistribution is that we can implement entropy, mean, variance and stddev for AffineTransform.
- Parameters
loc (Tensor or float) – Location parameter.
scale (Tensor or float) – Scale parameter.
- entropy()[source]#
Returns entropy of distribution, batched over batch_shape.
- Returns
Tensor of shape batch_shape.
- property mean#
Returns the mean of the distribution.
- property stddev#
Returns the standard deviation of the distribution.
- property variance#
Returns the variance of the distribution.
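For a scalar affine map y = loc + scale * x, these statistics follow from those of the base distribution in closed form: the mean is shifted and scaled, the standard deviation is multiplied by |scale|, and the entropy grows by log|scale|. The sketch below checks this with a Normal base distribution using only the standard library; the helper names are hypothetical and this is not ALF's code.

```python
import math


def normal_entropy(sigma):
    """Differential entropy of a univariate Normal with std ``sigma``."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def affine_stats(base_mean, base_std, base_entropy, loc, scale):
    """Statistics of y = loc + scale * x given those of x."""
    mean = loc + scale * base_mean
    stddev = abs(scale) * base_std
    variance = stddev ** 2
    entropy = base_entropy + math.log(abs(scale))
    return mean, stddev, variance, entropy


# x ~ Normal(1, 2) mapped by y = -3 + 0.5 * x gives y ~ Normal(-2.5, 1).
mean, stddev, variance, entropy = affine_stats(
    1.0, 2.0, normal_entropy(2.0), loc=-3.0, scale=0.5)
```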
- class Beta(concentration1, concentration0, eps=None, validate_args=None)[source]#
Bases:
torch.distributions.beta.Beta
Beta distribution parameterized by concentration1 and concentration0.
Note: we need to wrap td.Beta so that self.concentration1 and self.concentration0 are the actual tensors passed in to construct the distribution. This is important in certain situations. For example, if you want to register a hook to process the gradient to concentration1 and concentration0, td.Beta.concentration0.register_hook() will not work because gradient will not be backpropped to td.Beta.concentration0 since it is sliced from td.Dirichlet.concentration and gradient will only be backpropped to td.Dirichlet.concentration instead of td.Beta.concentration0 or td.Beta.concentration1.
- Parameters
concentration1 (float or Tensor) – 1st concentration parameter of the distribution (often referred to as alpha)
concentration0 (float or Tensor) – 2nd concentration parameter of the distribution (often referred to as beta)
eps (float) – a very small value indicating the interval [eps, 1-eps] into which the sampled values will be clipped. This clipping can prevent NaN and Inf values in the gradients. If None, a small value defined by PyTorch will be used.
- property concentration0#
- property concentration1#
- property mode#
- rsample(sample_shape=())[source]#
We override the original rsample() in order to clamp the output to avoid NaN and Inf values in the gradients. See Pyro’s rsample() implementation in https://docs.pyro.ai/en/dev/_modules/pyro/distributions/affine_beta.html#AffineBeta.
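The effect of the clamping can be illustrated without autograd: a Beta sample that lands on 0 or 1 makes log(x) or log(1 - x) infinite, and clipping to [eps, 1 - eps] keeps both finite. The sketch uses `random.betavariate` from the standard library in place of a reparameterized sampler, and the default `eps=1e-6` is an arbitrary value chosen for the example.

```python
import math
import random


def clamped_beta_sample(alpha, beta, eps=1e-6, rng=random):
    """Draw a Beta(alpha, beta) sample and clip it to [eps, 1 - eps]."""
    x = rng.betavariate(alpha, beta)
    return min(max(x, eps), 1.0 - eps)


rng = random.Random(0)
# Small concentrations push samples towards the boundaries 0 and 1.
samples = [clamped_beta_sample(0.01, 0.01, rng=rng) for _ in range(1000)]
log_terms = [math.log(x) + math.log(1.0 - x) for x in samples]
```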
- class DiagMultivariateBeta(concentration1, concentration0)[source]#
Bases:
torch.distributions.independent.Independent
Create multivariate independent beta distribution.
- Parameters
concentration1 (float or Tensor) – 1st concentration parameter of the distribution (often referred to as alpha)
concentration0 (float or Tensor) – 2nd concentration parameter of the distribution (often referred to as beta)
- class DiagMultivariateCauchy(loc, scale)[source]#
Bases:
torch.distributions.independent.Independent
Create multivariate Cauchy distribution with diagonal scale matrix.
- Parameters
loc (Tensor) – median of the distribution. Note that Cauchy doesn’t have a mean (divergent).
scale (Tensor) – also known as “half width”. Should have the same shape as loc.
- property loc#
- property scale#
- class DiagMultivariateNormal(loc, scale)[source]#
Bases:
torch.distributions.independent.Independent
Create multivariate normal distribution with diagonal variance.
- Parameters
loc (Tensor) – mean of the distribution
scale (Tensor) – standard deviation. Should have the same shape as loc.
- property stddev#
Returns the standard deviation of the distribution.
- class DistributionSpec(builder, input_params_spec)[source]#
Bases:
object
- Parameters
builder (Callable) – the function which is used to build the distribution. The returned value of builder(input_params) is a Distribution with input parameter as input_params.
input_params_spec (nested TensorSpec) – the spec for the argument of builder.
- build_distribution(input_params)[source]#
Build a Distribution using input_params.
- Parameters
input_params (nested Tensor) – the parameters for building the distribution. It should match input_params_spec provided to __init__.
- Returns
- Return type
Distribution
- classmethod from_distribution(dist, from_dim=0)[source]#
Create a DistributionSpec from a Distribution.
- Parameters
dist – the Distribution from which the spec is extracted.
from_dim (int) – only use the dimensions from this. The reason for using from_dim>0 is that [0, from_dim) might be batch dimensions in some scenarios.
- Returns
- Return type
- ExpTransform#
alias of
alf.utils.dist_utils.get_invertible.<locals>.NewCls
- class OUProcess(initial_value, damping=0.15, stddev=0.2)[source]#
Bases:
torch.nn.modules.module.Module
A zero-mean Ornstein-Uhlenbeck process for generating noises.
The Ornstein-Uhlenbeck process is a process that generates temporally correlated noise via a random walk with damping. This process describes the velocity of a particle undergoing Brownian motion in the presence of friction. This can be useful for exploration in continuous action environments with momentum.
The temporal update equation is:
x_next = (1 - damping) * x + N(0, std_dev)
- Parameters
initial_value (Tensor) – Initial value of the process.
damping (float) – The rate at which the noise trajectory is damped towards the mean. We must have \(0 <= damping <= 1\), where a value of 0 gives an undamped random walk and a value of 1 gives uncorrelated Gaussian noise. Hence in most applications a small non-zero value is appropriate.
stddev (float) – Standard deviation of the Gaussian component.
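The update equation can be tried out on a scalar with the standard library. `ou_step` is a hypothetical helper that applies one step of x_next = (1 - damping) * x + N(0, stddev); the real class operates on tensors and keeps the state internally.

```python
import random


def ou_step(x, damping=0.15, stddev=0.2, rng=random):
    """One step of the temporal update equation on a scalar state."""
    return (1.0 - damping) * x + rng.gauss(0.0, stddev)


# With stddev=0 the Gaussian term vanishes and the state decays geometrically.
x = 1.0
decay = []
for _ in range(3):
    x = ou_step(x, damping=0.5, stddev=0.0)
    decay.append(x)

# With the default parameters consecutive values are temporally correlated.
rng = random.Random(0)
noise = [0.0]
for _ in range(100):
    noise.append(ou_step(noise[-1], rng=rng))
```

Setting `damping=1` removes the dependence on the previous value, so each step is an independent Gaussian draw, which matches the parameter description above.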
- forward()[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool#
- class OneHotCategoricalGumbelSoftmax(hard_sample=True, tau=1.0, **kwargs)[source]#
Bases:
torch.distributions.one_hot_categorical.OneHotCategorical
Create a reparameterizable td.OneHotCategorical distribution based on the Gumbel-softmax gradient estimator from Jang et al., "CATEGORICAL REPARAMETERIZATION WITH GUMBEL-SOFTMAX", 2017.
- Parameters
hard_sample (bool) – If False, the rsampled result will be a “soft” vector of Gumbel softmax distribution, which naturally supports gradient backprop. If True, argmax will be applied on top of it and then a straight-through gradient estimator is used.
tau (float) – the Gumbel-softmax temperature for rsample. A higher temperature leads to a more uniform sample.
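The sampling step of the Gumbel-softmax estimator can be sketched without autograd: add Gumbel noise to the logits, divide by the temperature, and apply a softmax. The function below is an illustration written with the standard library, not ALF's implementation; the name `gumbel_softmax_sample` and the clipping constant are assumptions.

```python
import math
import random


def gumbel_softmax_sample(logits, tau=1.0, hard=True, rng=random):
    """Draw one Gumbel-softmax sample as a list of floats."""
    scores = []
    for logit in logits:
        # Clip u away from 0 and 1 so that both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        scores.append((logit + gumbel) / tau)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft
    # In an autograd framework the straight-through variant is usually
    # written as: hard_one_hot - soft.detach() + soft.
    k = soft.index(max(soft))
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


rng = random.Random(0)
logits = [0.5, 1.5, -1.0]
soft_sample = gumbel_softmax_sample(logits, tau=0.5, hard=False, rng=rng)
hard_sample = gumbel_softmax_sample(logits, tau=0.5, hard=True, rng=rng)
```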
- has_rsample = True#
- property mode#
- class OneHotCategoricalStraightThrough(probs=None, logits=None, validate_args=None)[source]#
Bases:
torch.distributions.one_hot_categorical.OneHotCategoricalStraightThrough
Provide an additional property mode with gradient enabled.
- property mode#
- PowerTransform#
alias of
alf.utils.dist_utils.get_invertible.<locals>.NewCls
- SigmoidTransform#
alias of
alf.utils.dist_utils.get_invertible.<locals>.NewCls
- class Softclip(low, high, hinge_softness=1.0, cache_size=1)[source]#
Bases:
torch.distributions.transforms.Transform
Transform via the mapping defined in alf.math_ops.softclip(). Unlike SoftclipTF, this transform is symmetric regarding the lower and upper bound when squashing.
- Parameters
low (float) – the lower bound
high (float) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to high.
- bijective = True#
- codomain: torch.distributions.constraints.Constraint = Real()#
- domain: torch.distributions.constraints.Constraint = Real()#
- sign = 1#
- SoftclipTF(low, high, hinge_softness=1.0)[source]#
Create a Softclip transform by composing Softlower, Softupper, and Affine transforms, adapted from tensorflow. Mathematically,
clipped = softupper(softlower(x, low), high)
softclip(x) = (clipped - high) / (high - softupper(low, high)) * (high - low) + high
The second scaling step is because we will have softupper(low, high) < low due to distortion of softplus, so we need to shrink the interval slightly by (high - low) / (high - softupper(low, high)) to preserve the lower bound. Due to this rescaling, the bijector can be mildly asymmetric.
- Parameters
low (float|Tensor) – the lower bound
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to high.
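The composition can be checked numerically on scalars. The sketch below uses the formulas stated in this section for softlower, softupper and softclip with hinge_softness fixed to 1 (how hinge_softness enters the formulas is not stated here, so it is left out); the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softlower(x, low):
    return softplus(x - low) + low


def softupper(x, high):
    return -softplus(high - x) + high


def softclip_tf(x, low, high):
    clipped = softupper(softlower(x, low), high)
    # Rescale so that the lower limit is exactly `low`.
    return ((clipped - high) / (high - softupper(low, high)) * (high - low)
            + high)


low, high = -1.0, 2.0
at_far_left, at_zero, at_far_right = (
    softclip_tf(x, low, high) for x in (-50.0, 0.0, 50.0))
```

The far-left and far-right inputs map to values indistinguishable from `low` and `high`, while inputs in between stay strictly inside the interval.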
- Softlower(low, hinge_softness=1.0)[source]#
Create a Softlower transform by composing the Softplus and Affine transforms. Mathematically,
softlower(x, low) = softplus(x - low) + low.
- Parameters
low (float|Tensor) – the lower bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to identity.
- SoftmaxTransform#
alias of
alf.utils.dist_utils.get_invertible.<locals>.NewCls
- class Softplus(hinge_softness=1.0, cache_size=1)[source]#
Bases:
torch.distributions.transforms.Transform
Transform via the mapping \(\text{Softplus}(x) = \log(1 + \exp(x))\).
Code adapted from pyro and tensorflow.
- Parameters
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from 0 to identity.
- bijective = True#
- codomain: torch.distributions.constraints.Constraint = GreaterThan(lower_bound=0.0)#
- domain: torch.distributions.constraints.Constraint = Real()#
- log_abs_det_jacobian(x, y)[source]#
Computes the log det jacobian log |dy/dx| given input and output.
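For \(y = \log(1 + \exp(x))\) the derivative is the sigmoid of \(x\), so \(\log |dy/dx| = -\text{softplus}(-x)\). The sketch below verifies this identity against a finite difference for hinge_softness equal to 1; it is a numerical check of the math with hypothetical helper names, not a copy of the library code.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_log_abs_det_jacobian(x):
    # dy/dx = sigmoid(x) and log(sigmoid(x)) = -softplus(-x)
    return -softplus(-x)


x, h = 0.7, 1e-6
numeric = math.log((softplus(x + h) - softplus(x - h)) / (2.0 * h))
analytic = softplus_log_abs_det_jacobian(x)
```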
- sign = 1#
- class Softsign(cache_size=1)[source]#
Bases:
torch.distributions.transforms.Transform
- bijective = True#
- codomain: torch.distributions.constraints.Constraint = Interval(lower_bound=-1.0, upper_bound=1.0)#
- domain: torch.distributions.constraints.Constraint = Real()#
- log_abs_det_jacobian(x, y)[source]#
- \[\begin{split}\begin{array}{lll} y = \frac{x}{1+x} \rightarrow \frac{dy}{dx} = \frac{1}{(1+x)^2}, &\text{if} &x > 0\\ y = \frac{x}{1-x} \rightarrow \frac{dy}{dx} = \frac{1}{(1-x)^2}, &\text{else}&\\ \end{array}\end{split}\]
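Both branches of the derivative above collapse to \(1 / (1 + |x|)^2\), so the log-determinant is \(-2 \log(1 + |x|)\). A small finite-difference check, written as an illustration with hypothetical helper names:

```python
import math


def softsign(x):
    return x / (1.0 + abs(x))


def softsign_log_abs_det_jacobian(x):
    # dy/dx = 1 / (1 + |x|)^2 on both sides of zero
    return -2.0 * math.log1p(abs(x))


h = 1e-6
checks = []
for x in (-1.5, 0.8):
    numeric = math.log((softsign(x + h) - softsign(x - h)) / (2.0 * h))
    checks.append((numeric, softsign_log_abs_det_jacobian(x)))
```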
- sign = 1#
- Softupper(high, hinge_softness=1.0)[source]#
Create a Softupper transform by composing the Softplus and Affine transforms. Mathematically,
softupper(x, high) = -softplus(high - x) + high.
- Parameters
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from identity to high.
- class StableCauchy(loc, scale, validate_args=None)[source]#
Bases:
torch.distributions.cauchy.Cauchy
- rsample(sample_shape=torch.Size([]), clipping_value=0.49)[source]#
Overwrite PyTorch’s Cauchy rsample for a more stable result. Basically the sampled number is clipped to fall within a reasonable range.
For reference:
> np.tan(math.pi * -0.499)
-318.30883898554157
> np.tan(math.pi * -0.49)
-31.820515953773853
- Parameters
clipping_value (float) – suppose eps is sampled from (-0.5,0.5). It will be clipped to [-clipping_value, clipping_value] to avoid values with huge magnitudes.
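The clipping can be illustrated with inverse-CDF sampling of a Cauchy variable, loc + scale * tan(pi * eps) with eps uniform on (-0.5, 0.5). This is a scalar sketch using the standard library; `stable_cauchy_sample` is a hypothetical name and the code is not taken from ALF.

```python
import math
import random


def stable_cauchy_sample(loc, scale, clipping_value=0.49, rng=random):
    """Inverse-CDF Cauchy sample with the uniform noise clipped."""
    eps = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    eps = min(max(eps, -clipping_value), clipping_value)
    return loc + scale * math.tan(math.pi * eps)


rng = random.Random(0)
loc, scale = 1.0, 2.0
samples = [stable_cauchy_sample(loc, scale, rng=rng) for _ in range(10000)]
# Largest deviation from loc that the clipped sampler can produce.
max_deviation = scale * math.tan(math.pi * 0.49)
```

With the default clipping value the samples stay within roughly 31.8 scale units of `loc`, which agrees with the reference values listed above.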
- class StableTanh(cache_size=1)[source]#
Bases:
torch.distributions.transforms.Transform
Invertible transformation (bijector) that computes \(Y = tanh(X)\), therefore \(Y \in (-1, 1)\).
This can be achieved by an affine transform of the Sigmoid transformation, i.e., it is equivalent to applying a list of transformations sequentially:
transforms = [AffineTransform(loc=0, scale=2),
              SigmoidTransform(),
              AffineTransform(loc=-1, scale=2)]
However, using the StableTanh transformation directly is more numerically stable.
- bijective = True#
- codomain: torch.distributions.constraints.Constraint = Interval(lower_bound=-1.0, upper_bound=1.0)#
- domain: torch.distributions.constraints.Constraint = Real()#
- log_abs_det_jacobian(x, y)[source]#
Computes the log det jacobian log |dy/dx| given input and output.
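A common numerically stable form of this quantity for tanh is \(\log(1 - \tanh^2 x) = 2(\log 2 - x - \text{softplus}(-2x))\). Whether StableTanh uses exactly this expression is an assumption; the sketch only demonstrates why a direct evaluation of \(\log(1 - y^2)\) breaks down for large inputs while the rewritten form does not.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def tanh_log_abs_det_jacobian(x):
    # log(1 - tanh(x)^2), rewritten so that it never evaluates log(0)
    return 2.0 * (math.log(2.0) - x - softplus(-2.0 * x))


moderate = 1.0
naive = math.log(1.0 - math.tanh(moderate) ** 2)
stable = tanh_log_abs_det_jacobian(moderate)

# For x = 20, tanh(x) is so close to 1 that 1 - tanh(x)^2 typically
# underflows to zero in double precision; the stable form stays finite.
large = tanh_log_abs_det_jacobian(20.0)
```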
- sign = 1#
- calc_default_max_entropy(spec, fraction=0.8)[source]#
Calc default max entropy.
- Parameters
spec (TensorSpec) – action spec
fraction – this fraction of the theoretical entropy upper bound will be used as the max entropy
- Returns
A default max entropy for adjusting the entropy weight
- calc_default_target_entropy(spec, min_prob=0.1)[source]#
Calculate default target entropy.
- Parameters
spec (TensorSpec) – action spec
min_prob (float) – If continuous spec, we suppose the prob concentrates on a delta of min_prob * (M-m); if discrete spec, we uniformly distribute min_prob on all entries except the peak which has a probability of 1 - min_prob.
- Returns
target entropy
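One reading of the min_prob description leads to the two closed forms below: for a discrete spec, the entropy of a distribution with one peak of probability 1 - min_prob and the remainder spread evenly; for a continuous spec, the entropy of a uniform distribution on an interval of width min_prob * (M - m) per dimension. This is an interpretation of the docstring with hypothetical function names, and the library's exact formula may differ in details.

```python
import math


def discrete_target_entropy(num_actions, min_prob=0.1):
    """Entropy of a peaked categorical distribution."""
    peak = 1.0 - min_prob
    rest = min_prob / (num_actions - 1)
    return -(peak * math.log(peak)
             + (num_actions - 1) * rest * math.log(rest))


def continuous_target_entropy(minimum, maximum, min_prob=0.1):
    """Sum over dimensions of the entropy of a narrow uniform distribution."""
    return sum(math.log(min_prob * (M - m))
               for m, M in zip(minimum, maximum))


discrete_entropy = discrete_target_entropy(num_actions=4)
continuous_entropy = continuous_target_entropy([-1.0], [1.0])
```

For a single action dimension bounded by [-1, 1], the continuous value is log(0.2), which is negative, as is usual for differential entropy targets.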
- calc_default_target_entropy_quantized(spec, num_bins, ent_per_action_dim=-1.0)[source]#
Calc default target entropy for quantized continuous action.
- Parameters
spec (TensorSpec) – action spec
num_bins – number of quantization bins used to represent the continuous action
ent_per_action_dim (int) – desired entropy per action dimension for the non-quantized continuous action; default value is -1.0 as suggested by the SAC paper.
- Returns
target entropy for quantized representation
- calc_uniform_log_prob(spec)[source]#
Given an action spec, calculate the uniform log prob.
- Parameters
spec (BoundedTensorSpec) – action spec must be a bounded spec
- Returns
The uniform log probability
- compute_entropy(distributions)[source]#
Computes total entropy of nested distribution.
- Parameters
distributions – A possibly batched tuple of distributions.
- Returns
entropy
- compute_log_probability(distributions, actions)[source]#
Computes log probability of actions given distribution.
- Parameters
distributions – A possibly batched tuple of distributions.
actions – A possibly batched action tuple.
- Returns
the log probability summed over actions in the batch.
- Return type
Tensor
- distributions_to_params(nests)[source]#
Convert distributions to their parameters, and keep tensors unchanged. Only returns parameters that have Tensor values.
- Parameters
nests (nested Distribution and Tensor) – Each Distribution will be converted to a dictionary of its Tensor parameters.
- Returns
Each leaf is a Tensor or a dict corresponding to one distribution, with keys as parameter name and values as tensors containing parameter values.
- Return type
nested Tensor/Distribution
- entropy_with_fallback(distributions, return_sum=True)[source]#
Computes total entropy of nested distribution. If entropy() of a distribution is not implemented, this function will fall back to using sampling to calculate the entropy. It returns two values: (entropy, entropy_for_gradient).
There are two situations:
entropy() is implemented and it’s the same as entropy_for_gradient.
entropy() is not implemented. We use sampling to calculate entropy. The unbiased estimator for entropy is \(-\log(p(x))\). However, the gradient of \(-\log(p(x))\) is not an unbiased estimator of the gradient of entropy. So we also calculate a value whose gradient is an unbiased estimator of the gradient of entropy. See estimated_entropy() for detail.
Examples:
ent, ent_for_grad = entropy_with_fallback(dist)
alf.summary.scalar("entropy", ent)
ent_for_grad.backward()
- Parameters
distributions (nested Distribution) – A possibly batched tuple of distributions.
return_sum (bool) – if True, return the total entropy. If not True, return the entropy for each distribution in the nest.
- Returns
entropy
entropy_for_gradient: You should use entropy in situations where its value is needed, and entropy_for_gradient where you need to calculate the gradient of entropy.
- Return type
tuple
- epsilon_greedy_sample(nested_distributions, eps=0.1)[source]#
Generate greedy sample that maximizes the probability.
- Parameters
nested_distributions (nested Distribution) – distribution to sample from
eps (float) – a floating value in \([0,1]\), representing the chance of action sampling instead of taking argmax. This can help prevent a dead loop in some deterministic environment like Breakout.
- Returns
- Return type
(nested) Tensor
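For a single categorical distribution the rule can be sketched as follows: with probability eps draw a sample from the distribution, otherwise return the most probable action. The function name and the list-of-probabilities representation are assumptions for illustration; the real function works on nested torch distributions.

```python
import random


def epsilon_greedy_sample_sketch(probs, eps=0.1, rng=random):
    """Return an action index for a categorical distribution."""
    if rng.random() < eps:
        # Exploration: sample according to the distribution itself.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Exploitation: take the argmax.
    return max(range(len(probs)), key=probs.__getitem__)


rng = random.Random(0)
probs = [0.2, 0.5, 0.3]
greedy_actions = [epsilon_greedy_sample_sketch(probs, eps=0.0, rng=rng)
                  for _ in range(20)]
sampled_actions = [epsilon_greedy_sample_sketch(probs, eps=1.0, rng=rng)
                   for _ in range(20)]
```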
- estimated_entropy(dist, num_samples=1, check_numerics=False)[source]#
Estimate entropy by sampling.
Use sampling to calculate entropy. The unbiased estimator for entropy is \(-\log(p(x))\) where \(x\) is an unbiased sample of \(p\). However, the gradient of \(-\log(p(x))\) is not an unbiased estimator of the gradient of entropy. So we also calculate a value whose gradient is an unbiased estimator of the gradient of entropy. See notes/subtleties_of_estimating_entropy.py for detail.
- Parameters
dist (torch.distributions.Distribution) – concerned distribution
num_samples (int) – number of random samples used for estimating entropy.
check_numerics (bool) – If true, find NaN / Inf values. For debugging only.
- Returns
entropy
entropy_for_gradient: for calculating gradient.
- Return type
tuple
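The value part of the estimator, the average of \(-\log p(x)\) over samples, can be compared against a known closed form. The sketch below does this for a univariate Normal using the standard library; it does not cover the gradient surrogate, and the helper names are hypothetical.

```python
import math
import random


def normal_log_prob(x, mu, sigma):
    return (-0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))


def estimated_entropy_sketch(sample_fn, log_prob_fn, num_samples):
    """Monte Carlo estimate of the entropy: the mean of -log p(x)."""
    total = sum(-log_prob_fn(sample_fn()) for _ in range(num_samples))
    return total / num_samples


rng = random.Random(0)
mu, sigma = 0.0, 2.0
estimate = estimated_entropy_sketch(
    lambda: rng.gauss(mu, sigma),
    lambda x: normal_log_prob(x, mu, sigma),
    num_samples=20000)
exact = 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)
```

With a single sample, the default of num_samples, the estimate is unbiased but noisy; averaging over more samples reduces the variance.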
- extract_distribution_parameters(dist)[source]#
Extract the input parameters of a distribution.
- Parameters
dist (Distribution) – distribution from which to extract parameters
- Returns
the nest of the input parameter of the distribution
- extract_spec(nests, from_dim=1)[source]#
Extract TensorSpec or DistributionSpec for each element of a nested structure. It assumes that the first dimension of each element is the batch size.
- Parameters
nests (nested structure) – each leaf node of the nested structure is a Tensor or Distribution of the same batch size.
from_dim (int) – ignore dimensions before this when constructing the spec.
- Returns
each leaf node of the returned nested spec is the corresponding spec (excluding batch size) of the element of nest.
- Return type
nest
- get_base_dist(dist)[source]#
Get the base distribution.
- Parameters
dist (td.Distribution) –
- Returns
The base distribution if dist is td.Independent or td.TransformedDistribution, and dist if it is td.Normal.
- Raises
NotImplementedError – if dist or its base distribution is not td.Normal, td.Independent or td.TransformedDistribution.
- get_invertible(cls)[source]#
A helper function to turn on the cache mechanism for transformation. This is useful as some transformations (say \(g\)) may not be able to provide an accurate inversion, so the difference between \(x\) and \(g^{-1}(g(x))\) can be large. This could lead to unstable training in practice. For a torch transformation \(y=g(x)\), when cache_size is set to one, the latest value for \((x, y)\) is cached and will be used later for future computations. E.g. for inversion, a call to \(g^{-1}(y)\) will return \(x\), solving the inversion error issue mentioned above. Note that in the case of having a chain of transformations (\(G\)), all the element transformations need to turn on the cache to ensure the composite transformation \(G\) satisfies: \(x=G^{-1}(G(x))\).
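The benefit of a size-one cache can be seen with a toy tanh transform on Python floats. `CachedTanh` is a hypothetical class that mimics the idea only: the last (x, y) pair is remembered, and the inverse returns the cached x when it is handed the very same y object.

```python
import math


class CachedTanh:
    """Toy transform y = tanh(x) with a cache holding the latest (x, y)."""

    def __init__(self):
        self._cached_x_y = (None, None)

    def __call__(self, x):
        y = math.tanh(x)
        self._cached_x_y = (x, y)
        return y

    def inverse(self, y):
        x_old, y_old = self._cached_x_y
        if y is y_old:
            # Cache hit: return the exact input instead of recomputing it.
            return x_old
        return math.atanh(y)


g = CachedTanh()
x = 20.0
y = g(x)
# tanh(20.0) typically rounds to 1.0 in double precision, where atanh is
# undefined, so an uncached inversion could not recover x.
recovered = g.inverse(y)
```

The identity check on `y` also shows why operations such as cloning the output defeat the cache: a copy is a different object, so the inverse falls back to the inexact computation.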
- get_mode(dist)[source]#
Get the mode of the distribution. Note that if dist is a transformed distribution, the result may not be the actual mode of dist.
- Parameters
dist (td.Distribution) –
- Returns
The mode of the distribution. If dist is a transformed distribution, the result is calculated by transforming the mode of its base distribution and may not be the actual mode for dist.
- Raises
NotImplementedError – if dist or its base distribution is not td.Categorical, td.Normal, td.Independent or td.TransformedDistribution.
- get_rmode(dist)[source]#
Get the mode of the distribution in a way that supports backpropagation. Note that if dist is a transformed distribution, the result may not be the actual mode of dist.
- Parameters
dist (td.Distribution) –
- Returns
The mode of the distribution. If dist is a transformed distribution, the result is calculated by transforming the mode of its base distribution and may not be the actual mode for dist.
- Raises
NotImplementedError – if dist or its base distribution is not td.Normal, StableCauchy, Beta, TruncatedDistribution, td.Independent or td.TransformedDistribution.
- params_to_distributions(nests, nest_spec)[source]#
Convert distribution parameters to Distribution, keep tensors unchanged.
- Parameters
nests – a nested Tensor and dictionary of tensor parameters of Distribution. Typically, nest is obtained using distributions_to_params().
nest_spec (nested DistributionSpec and TensorSpec) – The distribution params will be converted to Distribution according to the corresponding DistributionSpec in nest_spec.
- Returns
- Return type
nested Distribution or Tensor
- rsample_action_distribution(nested_distributions, return_log_prob=False)[source]#
Sample actions from distributions with reparameterization-based sampling.
It uses Distribution.rsample() to do the sampling to enable backpropagation.
- Parameters
nested_distributions (nested Distribution) – action distributions.
return_log_prob (bool) – whether to compute and return the log probability of the sampled actions, in addition to the sampled actions. In some cases, it is useful to compute the log probability immediately after the actions are sampled, as some subsequent operations might make the cache mechanism (if turned on) invalid. Some example scenarios include 1) additional sampling operation applied on nested_distributions, 2) some operations applied to the actions sampled from nested_distributions (e.g., cloning). This could cause numerical issues if we want to compute the log probability for actions sampled at an early stage, especially for actions that are close to action bounds. For more details on PyTorch Transform, its cache mechanism, and its impacts on RL algorithms, please check https://alf.readthedocs.io/en/latest/notes/pytorch_notes.html#transform-bijector.
- Returns
rsampled actions if return_log_prob is False
rsampled actions and log_prob if return_log_prob is True
- sample_action_distribution(nested_distributions, return_log_prob=False)[source]#
Sample actions from distributions with conventional sampling without enabling backpropagation.
- Parameters
nested_distributions (nested Distribution) – action distributions.
return_log_prob (bool) – whether to compute and return the log probability of the sampled actions, in addition to the sampled actions. In some cases, it is useful to compute the log probability immediately after the actions are sampled, as some subsequent operations might make the cache mechanism (if turned on) invalid. Some example scenarios include 1) additional sampling operation applied on nested_distributions, 2) some operations applied to the actions sampled from nested_distributions (e.g., cloning). This could cause numerical issues if we want to compute the log probability for actions sampled at an early stage, especially for actions that are close to action bounds. For more details on PyTorch Transform, its cache mechanism, and its impacts on RL algorithms, please check https://alf.readthedocs.io/en/latest/notes/pytorch_notes.html#transform-bijector.
- Returns
sampled actions if return_log_prob is False
sampled actions and log_prob if return_log_prob is True
- to_distribution_param_spec(nests)[source]#
Convert the
DistributionSpecsin nests to their parameter specs.- Parameters
nests (nested DistributionSpec of TensorSpec) – Each
DistributionSpecwill be converted to a dictionary of the spec of its inputTensorparameters.- Returns
Each leaf is a
TensorSpecor adictcorresponding to one distribution, with keys as parameter name and values asTensorSpecsfor the parameters.- Return type
nested TensorSpec
alf.utils.distributed#
- data_distributed(method)[source]#
This decorator makes a target method of a module capable of being data distributed via DDP.
This is to provide a simple and transparent way to enable DDP for specific code logics.
When the method is wrapped by @data_distributed, the outputs (tensors) of this method will have gradient synchronization hooks attached to them. Later, when those outputs are used in backward() to compute gradients, the hooks will be called to synchronize across all processes. As a result, the corresponding parameters receive not only the gradients from this process, but also the gradients from the other processes. Note that each process will be TRAPPED at the call to the backward() that involves those output tensors, until all processes have finished back propagation and have their gradients synced.
Example usage:
    class A(nn.Module):
        # ...
        @data_distributed
        def compute_something(self, input):
            return self._network1(input), self._network2(input)
        # ...
In the above code, after applying the decorator, the method compute_something will be made data distributed if the following conditions are met:
- Multiple processes within the same process group create instances of A and call compute_something() individually.
- All such A instances have self._ddp_activated_rank set to the correct rank of the GPU device that belongs to them.
Otherwise the method compute_something() will behave normally.
- data_distributed_when(cond=None)[source]#
This is @data_distributed with an extra condition.
The condition is a function that returns True or False given the wrapped module as the input. If the condition evaluates to False, DDP will not be activated and the original method will be called.
- make_ddp_performer(module, method, ddp_rank, find_unused_parameters=False)[source]#
Creates a DDP wrapped MethodPerformer.
This function is an alf.configurable and is used in the @data_distributed series of decorators. Override this in your configuration with
alf.config('make_ddp_performer', find_unused_parameters=True)
to enable find_unused_parameters. This asks DDP to ignore parameters that are not used for computing the output of forward() when waiting for the synchronization of gradients and parameters upon backward(). Normally you do not need to worry about this. It is useful for algorithms such as PPG where some of the model parameters do NOT ALWAYS contribute to the network output.
alf.utils.distributions#
- class CauchyITS[source]#
Bases:
alf.utils.distributions.InverseTransformSamplingCauchy distribution.
\[p(x) = 1 / (pi * (1 + x*x))\]
- class InverseTransformSampling[source]#
Bases:
objectInterface for defining inverse transform sampling.
- class NormalITS[source]#
Bases:
alf.utils.distributions.InverseTransformSamplingNormal distribution.
\[p(x) = 1/sqrt(2*pi) * exp(-x^2/2)\]
- class T2Cdf_[source]#
Bases:
torch.autograd.function.Function- static backward(ctx, grad_output)[source]#
Defines a formula for differentiating the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as forward() returned, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, x)[source]#
Performs the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
The context can be used to store tensors that can be then retrieved during the backward pass.
- class T2ITS[source]#
Bases:
alf.utils.distributions.InverseTransformSamplingStudent’s t-distribution with DOF 2.
\[p(x) = 1 / (2 * (1 + x*x) ** 1.5)\]
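As a rough illustration of inverse transform sampling for this density, the sketch below uses the closed-form cdf \(P(x) = 1/2 + x / (2\sqrt{1 + x^2})\), whose derivative is the density above, together with its inverse. The function names are hypothetical and are not the methods of T2ITS; the clamping of the uniform draw is likewise an assumption made to avoid dividing by zero.

```python
import math
import random


def t2_pdf(x):
    """Density of Student's t-distribution with 2 degrees of freedom."""
    return 1.0 / (2.0 * (1.0 + x * x) ** 1.5)


def t2_cdf(x):
    """Closed-form cdf of the density above."""
    return 0.5 + x / (2.0 * math.sqrt(1.0 + x * x))


def t2_icdf(u):
    """Inverse of t2_cdf for u strictly inside (0, 1)."""
    v = 2.0 * u - 1.0
    return v / math.sqrt(1.0 - v * v)


def sample_t2(rng):
    """Draw one sample by pushing a uniform draw through the inverse cdf."""
    # Clamp away from 0 and 1 so that the inverse cdf stays finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return t2_icdf(u)


rng = random.Random(0)
samples = [sample_t2(rng) for _ in range(10000)]
```

Because the distribution is symmetric around zero, roughly half of the samples should be negative.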
- class TruncatedCauchy(loc, scale, lower_bound, upper_bound)[source]#
Bases:
alf.utils.distributions.TruncatedDistributionTruncated Cauchy distribution.
The truncated Cauchy distribution \(q(x)\) is defined by 4 parameters: location \(\mu\), scale parameter \(s\), lower bound \(l\) and upper bound \(u\).
\[q(x) = \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s})\]where \(p\) and \(P\) are the pdf and cdf of the standard Cauchy distribution respectively.
- Parameters
loc – the location parameter
scale – the scale parameter
lower_bound – the lower bound
upper_bound – the upper bound
its – the standard distribution to be used.
- class TruncatedDistribution(loc, scale, lower_bound, upper_bound, its)[source]#
Bases:
torch.distributions.distribution.DistributionThe base class of truncated distributions.
A truncated distribution \(q(x)\) is defined by a standard base distribution \(p(x)\), location \(\mu\), scale parameter \(s\), lower bound \(l\) and upper bound \(u\):
\[q(x) = \begin{cases} \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s}) & \text{if } l \le x \le u \\ 0 & \text{otherwise} \end{cases}\]where \(P\) is the cdf of \(p\).
- Parameters
loc (Tensor) – the location parameter. Its shape is batch_shape + event_shape.
scale (Tensor) – the scale parameter. Its shape is batch_shape + event_shape.
lower_bound (Tensor) – the lower bound. Its shape is event_shape.
upper_bound (Tensor) – the upper bound. Its shape is event_shape.
its (InverseTransformSampling) – the standard distribution to be used.
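The definition above can be checked numerically with a small scalar sketch that uses the standard normal as the base distribution. It assumes that the cdf in the normalizer is evaluated at the standardized bounds \((u-\mu)/s\) and \((l-\mu)/s\), which is what makes the density integrate to one. All function names here are hypothetical and are not part of the TruncatedDistribution API.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_pdf(x, loc, scale, lower, upper,
                  pdf=std_normal_pdf, cdf=std_normal_cdf):
    """Density of the base distribution restricted to [lower, upper]."""
    if x < lower or x > upper:
        return 0.0
    # Probability mass of the untruncated distribution inside the bounds
    # (assumption: the cdf is evaluated at the standardized bounds).
    mass = cdf((upper - loc) / scale) - cdf((lower - loc) / scale)
    return pdf((x - loc) / scale) / (scale * mass)


def integrate(f, a, b, steps=2000):
    """Plain trapezoidal rule, used only to sanity-check the normalization."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


total_mass = integrate(lambda x: truncated_pdf(x, 0.3, 0.5, -1.0, 1.0), -1.0, 1.0)
```

Swapping in the pdf and cdf of another standard distribution (for example Cauchy) gives the corresponding truncated density.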
- arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}#
- has_rsample = True#
- property loc#
Location parameter of this distribution.
- log_prob(value)[source]#
The log of the probability density evaluated at
value.- Parameters
value (
Tensor) – its shape should besample_shape + batch_shape + event_shape- Returns
Tensor of shape
sample_shape + batch_shape
- property lower_bound#
Lower bound of this distribution.
- property mode#
Mode of this distribution.
- rsample(sample_shape=torch.Size([]))[source]#
Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched.
- Parameters
sample_shape (
Size) – sample shape- Returns
Tensor of shape
sample_shape + batch_shape + event_shape
- property scale#
Scale parameter of this distribution.
- property upper_bound#
Upper bound of this distribution.
- class TruncatedNormal(loc, scale, lower_bound, upper_bound)[source]#
Bases:
alf.utils.distributions.TruncatedDistributionTruncated normal distribution.
The truncated normal distribution \(q(x)\) is defined by 4 parameters: location \(\mu\), scale parameters \(s\), lower bound \(l\) and upper bound \(u\).
\[q(x) = \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s})\]where \(p\) and \(P\) are the pdf and cdf of the standard normal distribution respectively.
- Parameters
loc – the location parameter
scale – the scale parameter
lower_bound – the lower bound
upper_bound – the upper bound
its – the standard distribution to be used.
- class TruncatedT2(loc, scale, lower_bound, upper_bound)[source]#
Bases:
alf.utils.distributions.TruncatedDistributionTruncated Student’s t distribution with degree of freedom 2.
The truncated Student’s t distribution \(q(x)\) is defined by 4 parameters: location \(\mu\), scale parameter \(s\), lower bound \(l\) and upper bound \(u\).
\[q(x) = \frac{1}{s (P(u) - P(l))}p(\frac{x-\mu}{s})\]where \(p(x)=1 / (2 (1 + x^2)^{1.5})\) and \(P\) is the cdf of \(p(x)\).
- Parameters
loc – the location parameter
scale – the scale parameter
lower_bound – the lower bound
upper_bound – the upper bound
its – the standard distribution to be used.
- t2cdf()#
alf.utils.external_configurables#
Make various external gin-configurable objects.
alf.utils.gin_utils#
- inoperative_config_str(max_line_length=80, continuation_indent=4)[source]#
Retrieve the “inoperative” configuration as a config string.
- Parameters
max_line_length (int) – A (soft) constraint on the maximum length of a line in the formatted string.
continuation_indent (int) – The indentation for continued lines.
- Returns
- A config string capturing all parameter values configured but not
used by the current program (override by explicit call).
alf.utils.git_utils#
Git utilities.
alf.utils.lean_function#
- lean_function(func)[source]#
Wrap func to save memory for backward.
The returned function performs the same computation as func, but saves memory by discarding intermediate results. It calculates the gradient by recomputing func using the same input during backward.
Note: There are several requirements for func:
- All the Tensor inputs to func must be explicitly listed as arguments of func. For example, a tuple of Tensors as an argument is not allowed. Using Tensors from outside of func (e.g., tensors from class member variables) is not allowed either unless func is an nn.Module. On the other hand, if func is a module, its parameters should not be passed as arguments, as they are automatically taken care of.
- If func is not a Network, its return value must be a Tensor or a tuple of Tensors. If it is a Network, its return value (output and state) must be a nest of Tensors.
- func must be deterministic so that repeated evaluation with the same input gives the same output.
It is the responsibility of the user of this function to make sure that func satisfies these requirements. lean_function will not report an error if func does not satisfy these requirements, and the error will be silently ignored.
Note: PyTorch also has a function with similar functionality. See https://pytorch.org/docs/stable/checkpoint.html for details.
lean_function has several advantages over PyTorch’s implementation:
- Keyword arguments are supported.
- Both torch.autograd.grad and torch.autograd.backward are supported.
Examples:
Apply to simple function:
    def myfunc(x, w, b, scale=1.0):
        return torch.sigmoid(scale * (x @ w) + b)

    lean_myfunc = lean_function(myfunc)
    y = lean_myfunc(x, w, b)
Apply to nn.Module:
    module = alf.layers.FC(3, 5, activation=torch.relu_)
    lean_func = lean_function(module)
    y = lean_func(x)
Apply to a network:
    net = alf.nn.Sequential(
        alf.layers.FC(3, 5, activation=torch.relu_),
        alf.layers.FC(5, 1, activation=torch.sigmoid))
    lean_func = lean_function(net)
    y = lean_func(x)
- Parameters
func (
Callable) – function or module to be wrapped.- Return type
Callable- Returns
the wrapped function or module. In the case of
funcbeing ann.Module, all the original attributes and methods can still be accessed in the same way through the wrapped module.
alf.utils.losses#
Various function/classes related to loss computation.
- class AsymmetricSimSiamLoss(proj_net=None, pred_net=None, input_size=None, proj_hidden_size=256, pred_hidden_size=128, output_size=256, proj_last_use_bn=False, eps=1e-05, fixed_weight_norm=False, lr=None, debug_summaries=True, name='SimSiamLoss')[source]#
Bases:
torch.nn.modules.module.ModuleThe siamese loss proposed in:
Chen Xinlei et al. “Exploring Simple Siamese Representation Learning” CVPR 2021
The loss is
1 - cosine(pred(proj(x)), detach(proj(y))), where x is the predicted representation, y is the target representation, and proj and pred are computed using proj_net and pred_net, respectively.
- Parameters
proj_net (
Optional[Network]) – if not provided, a default MLP with two hidden layers and output size asoutput_sizewill be created.pred_net (
Optional[Network]) – if not provided, a default MLP with one hidden layer will be created.input_size (
Optional[int]) – input size ofproj_netproj_hidden_size (
int) – the size of the hidden layers of proj_net. Only useful ifproj_netis not provided.pred_hidden_size (
int) – the size of the hidden layer of pred_net. Only useful ifpred_netis not provided.proj_last_use_bn (
bool) – whether to use batch norm for the output layer of proj_net. Only useful ifproj_netis not providedeps (
float) – theepsfor callingF.normalize()when calculating the normalized vector in order to calculate cosine.fixed_weight_norm (
bool) – whether to fix the norm of the weight parameter of the FC layers.lr (
Optional[float]) – learning rate. If None, the default learning rate will be used.debug_summaries (
bool) – whether to write debug summariesname (
str) – name of this loss
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- forward(pred, target)[source]#
Calculate the loss.
- Parameters
pred (
Tensor) – predicted representation of shape [B, T, …]target (
Tensor) – target representation of shape [B, T, …]
- Return type
Tensor- Returns
loss of shape [B, T]
- training: bool#
- class BipartiteMatchingLoss(reduction='mean', name='BipartiteMatchingLoss')[source]#
Bases:
objectBipartite matching loss.
This order-invariant loss can be used to evaluate the matching between a predicted set and a target set. The idea is that for every forward, an optimal one-to-one mapping assignment from the predicted set to the target set is first found using some efficient bipartite graph matching algorithm, and the optimal loss is minimized.
Mathematically, suppose there are \(N\) objects in either set, \(L(x,y)\) is the matching loss between any \((x,y)\) object pair, and \(\mathcal{G}_N\) is the permutation space. The forward loss to be minimized is:
\[\min_{g\in\mathcal{G}_N}\sum_{n=1}^N L(x_n(\theta),y_{g(n)})\]where \(\theta\) denotes the model parameters.
In practice, to find the optimal assignment, we simply use
scipy.optimize.linear_sum_assignment.- References::
End-to-End Object Detection with Transformers, Carion et al.
https://github.com/facebookresearch/detr/blob/main/models/matcher.py
- Parameters
reduction (
str) – ‘sum’, ‘mean’ or ‘none’. This is how to reduce the matching loss. For the former two, the loss shape is[B], while for the ‘none’, the loss shape is[B,N].
- forward(matching_cost_mat, cost_mat=None)[source]#
Compute the optimal matching loss.
- Parameters
matching_cost_mat (Tensor) – the cost matrix used to determine the optimal matching. Its shape should be [B,N,N].
cost_mat (Optional[Tensor]) – the cost matrix used to compute the optimal loss once the optimal matching is found. According to the DETR paper, this cost matrix might be different from the one used for matching. If None, the matching cost matrix is used.
- Returns
the optimal loss. If reduction is ‘mean’ or ‘sum’, its shape is
[B], otherwise its shape is[B,N].the optimal matching given the cost matrix. Its shape is
[B,N], where the value of n-th entry is its mapped index in the target set.
- Return type
tuple
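To make the objective concrete, the following sketch finds the optimal assignment for one small cost matrix by enumerating all permutations. This brute-force search is only a stand-in for illustration: the docstring states that the class itself relies on scipy.optimize.linear_sum_assignment, and the helper name below is hypothetical.

```python
from itertools import permutations


def brute_force_matching(cost):
    """Return (minimal total cost, matching) for a square cost matrix.

    matching[n] is the index in the target set assigned to the n-th
    predicted object. The search is O(N!) and only suitable for tiny N.
    """
    n = len(cost)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(cost[i][perm[i]] for i in range(n))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, list(best_perm)


cost = [[4.0, 1.0, 3.0],
        [2.0, 0.0, 5.0],
        [3.0, 2.0, 2.0]]
loss, matching = brute_force_matching(cost)
```

Here the summed cost corresponds to the 'sum' reduction for a single batch element; dividing by N would give the 'mean' reduction.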
- class DiscreteRegressionLoss(transform=None, inverse_after_mean=False)[source]#
Bases:
alf.utils.losses._DiscreteRegressionLossBaseA loss for predicting the distribution of a scalar.
The target is assumed to be in the range [-(n-1)//2, n//2], where n=logits.shape[-1]. The logits are used to calculate the probabilities of being one of the n values. If a target value y is not an integer, it is treated as having probability mass of \(y- \lfloor y \rfloor\) at \(\lfloor y \rfloor + 1\) and probability mass of \(1 + \lfloor y \rfloor - y\) at \(\lfloor y \rfloor\). Then cross entropy loss is applied.
More specifically, the logits passed to calc_loss represent the following: P = softmax(logits) and P[i] means the probability that the (transformed) target is equal to i - (n-1)//2.
- Note:
DiscreteRegressionLoss(SqrtLinearTransform(0.001), inverse_after_mean=True) is the loss used by the MuZero paper.
- Parameters
transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
inverse_after_mean – when calculating the expected prediction, whether to do the inverse transformation after calculating the expectation in the transformed space. Note that using inverse_after_mean=True will make the expectation biased in general. This is because \(f^{-1}(E(x)) \le E(f^{-1}(x))\) (Jensen inequality) if \(f^{-1}\) is convex. In our case, \(f^{-1}\) is convex for \(x \ge 0\).
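The target encoding described above can be sketched for a single scalar as follows. This is one reading of the docstring written with plain Python lists, not the library's calc_loss; the helper names are hypothetical.

```python
import math


def two_hot_target(y, n):
    """Spread a scalar target y over its two neighbouring integer bins.

    Bin i stands for the value i - (n - 1) // 2.
    """
    offset = (n - 1) // 2
    lo = math.floor(y)
    frac = y - lo
    probs = [0.0] * n
    probs[lo + offset] += 1.0 - frac
    if frac > 0:
        probs[lo + 1 + offset] += frac
    return probs


def cross_entropy(logits, probs):
    """Cross entropy between softmax(logits) and a target distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(p * (l - log_z) for p, l in zip(probs, logits))


target = two_hot_target(1.25, 7)
loss = cross_entropy([0.0] * 7, target)
```

With n = 7 the bins cover the values -3 to 3, so a target of 1.25 puts mass 0.75 on the bin for 1 and 0.25 on the bin for 2, and the expectation of the encoded distribution recovers 1.25.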
- calc_expectation(logits)[source]#
Calculate the expected prediction in the untransformed domain from logits.
- Parameters
logits – raw model prediction
- initialize_bias(bias)[source]#
Initialize the bias of the last FC layer for the prediction properly.
This function sets the bias so that the initial distribution of the prediction in the original domain of target is approximately Cauchy: \(p(x) \propto \frac{1}{1+x^2}\)
- Parameters
bias (
Tensor) – the bias parameter to be initialized.
- class MeanSquaredLoss(batch_dims=1, debug_summaries=True, name='MSELoss')[source]#
Bases:
objectMean squared loss.
For a prediction and target pair (x,y), the loss is
((x - y) ** 2).mean().- Parameters
batch_dims (
int) – the first so many dims of prediction and target are treated as batch dimension. The mean is performed on the rest of the dimensions.
- class OrderedDiscreteRegressionLoss(transform=None, inverse_after_mean=False)[source]#
Bases:
alf.utils.losses._DiscreteRegressionLossBaseA loss for predicting the distribution of a scalar.
The target is assumed to be in the range [-(n-1)//2, n//2], where n=logits.shape[-1]. The logits are used to calculate the probabilities of being greater than or equal to each of these n values. If a target value y is not an integer, it is treated as having probability mass of \(y- \lfloor y \rfloor\) at \(\lfloor y \rfloor + 1\) and probability mass of \(1 + \lfloor y \rfloor - y\) at \(\lfloor y \rfloor\). Then binary cross entropy loss is applied.
More specifically, the logits passed to calc_loss represent the following: P = sigmoid(logits) and P[i] means the probability that the (transformed) target is greater than or equal to i - (n-1)//2.
- Parameters
transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
inverse_after_mean – when calculating the expected prediction, whether to do the inverse transformation after calculating the expectation in the transformed space. Note that using inverse_after_mean=True will make the expectation biased in general. This is because \(f^{-1}(E(x)) \le E(f^{-1}(x))\) (Jensen inequality) if \(f^{-1}\) is convex. In our case, \(f^{-1}\) is convex for \(x \ge 0\).
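For comparison with the softmax-based loss, the ordered encoding can be sketched for one scalar target as below. The target for bin i is the probability that the two-point distribution described above is at least i - (n-1)//2. This is an interpretation of the docstring with hypothetical helper names, and the actual calc_loss may differ in details such as reduction.

```python
import math


def ordered_target(y, n):
    """Per-bin probability that the target is >= i - (n - 1) // 2."""
    offset = (n - 1) // 2
    lo = math.floor(y)
    frac = y - lo
    target = []
    for i in range(n):
        v = i - offset
        if v <= lo:
            target.append(1.0)   # both mass points are >= v
        elif v == lo + 1:
            target.append(frac)  # only the upper mass point is >= v
        else:
            target.append(0.0)
    return target


def binary_cross_entropy_with_logits(logits, target):
    """Numerically stable sum of per-bin binary cross entropies."""
    return sum(
        max(l, 0.0) - l * t + math.log1p(math.exp(-abs(l)))
        for l, t in zip(logits, target))


target = ordered_target(1.25, 7)
loss = binary_cross_entropy_with_logits([0.0] * 7, target)
```

With all-zero logits every bin predicts probability 0.5, so the loss is n times log 2 regardless of the target.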
- calc_expectation(logits)[source]#
Calculate the expected prediction in the untransformed domain from logits.
- Parameters
logits – raw model prediction
- initialize_bias(bias)[source]#
Initialize the bias of the last FC layer for the prediction properly.
This function sets the bias so that the initial distribution of the prediction in the original domain of target is approximately Cauchy: \(p(x) \propto \frac{1}{1+x^2}\)
- Parameters
bias (
Tensor) – the bias parameter to be initialized.
- class QuantileRegressionLoss(transform=None, inverse_after_mean=False, delta=0.0)[source]#
Bases:
alf.utils.losses.ScalarPredictionLossMulti-quantile Huber loss
The loss for simultaneous multiple quantile regression. The number of quantiles n is quantiles.shape[-1]. quantiles[..., k] is the quantile value estimation for quantile \((k + 0.5) / n\). For each prediction, there can be one or multiple target values.
This loss is described in the following paper:
Dabney et al. Distributional Reinforcement Learning with Quantile Regression
- Parameters
transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
inverse_after_mean (bool) – when calculating the expected prediction, whether to do the inverse transformation after calculating the expectation in the transformed space. Note that using inverse_after_mean=True will make the expectation biased in general. This is because \(f^{-1}(E(x)) \le E(f^{-1}(x))\) (Jensen inequality) if \(f^{-1}\) is convex. In our case, \(f^{-1}\) is convex for \(x \ge 0\).
delta (float) – the smoothness parameter for the Huber loss (larger means smoother). Note that the quantile estimation with delta > 0 is biased. You should use a small value for delta if you want the quantile estimation to be less biased (so that the mean of the quantiles will be close to the mean of the samples).
- class ScalarPredictionLoss[source]#
Bases:
object- calc_expectation(pred)[source]#
Calculate the expected prediction in the untransformed domain from pred.
- initialize_bias(bias)[source]#
Initialize the bias of the last FC layer for the prediction properly.
This function can be passed to FC as bias_initializer.
For some losses (e.g. OrderedDiscreteRegressionLoss), initializing the bias to zero can lead to very bad initial predictions. So we provide an interface for doing loss-specific initializations. Note that the weight of the last FC should be initialized to zero in general.
- Parameters
bias (
Tensor) – the bias parameter to be initialized.
- class SquareLoss(transform=None)[source]#
Bases:
alf.utils.losses.ScalarPredictionLossSquare loss for predicting scalar target.
- Parameters
transform (Optional[InvertibleTransform]) – the transformation applied to target. If it is provided, the regression target will be transformed.
- element_wise_huber_loss(x, y)[source]#
Elementwise Huber loss.
- Parameters
x (Tensor) – label
y (Tensor) – prediction
- Returns
loss (Tensor)
- element_wise_squared_loss(x, y)[source]#
Elementwise squared loss.
- Parameters
x (Tensor) – label
y (Tensor) – prediction
- Returns
loss (Tensor)
- huber_function(x, delta=1.0)[source]#
Huber function.
- Parameters
x (Tensor) – difference between the observed and predicted values
delta (float) – the threshold at which to change between delta-scaled L1 and L2 loss; must be positive. Default value is 1.0.
- Returns
Huber function (Tensor)
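A scalar sketch of the standard Huber function, which matches the parameter description above (quadratic inside the threshold, delta-scaled linear outside). It is written for a single float rather than a Tensor, and it assumes the library follows the usual definition.

```python
def huber(x, delta=1.0):
    """Standard Huber function of a scalar difference x."""
    a = abs(x)
    if a <= delta:
        # L2 region: quadratic near zero.
        return 0.5 * x * x
    # L1 region: linear with slope delta, shifted to join continuously.
    return delta * (a - 0.5 * delta)
```

The two branches meet at |x| = delta, where both evaluate to 0.5 * delta ** 2, so the function and its first derivative are continuous there.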
- multi_quantile_huber_loss(quantiles, target, delta=0.1)[source]#
Multi-quantile Huber loss
The loss for simultaneous multiple quantile regression. The number of quantiles n is quantiles.shape[-1]. quantiles[..., k] is the quantile value estimation for quantile \((k + 0.5) / n\). For each prediction, there can be one or multiple target values.
This loss is described in the following paper:
Dabney et al. Distributional Reinforcement Learning with Quantile Regression
- Parameters
quantiles (Tensor) – batch_shape + [num_quantiles,]
target (Tensor) – batch_shape or batch_shape + [num_targets,]
delta (float) – the smoothness parameter for the Huber loss (larger means smoother). Note that the quantile estimation with delta > 0 is biased. You should use a small value for delta if you want the quantile estimation to be less biased (so that the mean of the quantiles will be close to the mean of the samples).
- Return type
Tensor- Returns
loss of batch_shape
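The following is a minimal sketch of a quantile Huber loss in the spirit of Dabney et al., for one prediction given as plain Python lists. The averaging over quantiles and targets and the division of the quadratic branch by delta are assumptions; the library's reduction and scaling may differ.

```python
def quantile_huber_loss(quantiles, targets, delta=0.1):
    """Average quantile Huber loss for one prediction.

    quantiles[k] estimates the quantile at level (k + 0.5) / n.
    """
    n = len(quantiles)
    total = 0.0
    for k, q in enumerate(quantiles):
        tau = (k + 0.5) / n
        for t in targets:
            u = t - q
            # Asymmetric weight: under- and over-estimation cost differently.
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            a = abs(u)
            if delta > 0 and a < delta:
                h = 0.5 * u * u / delta  # smooth region around zero
            else:
                h = a - 0.5 * delta      # linear region (plain |u| if delta == 0)
            total += weight * h
    return total / (n * len(targets))
```

With delta = 0 this reduces to the ordinary quantile (pinball) loss, which is minimized by the true quantiles of the targets.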
alf.utils.math_ops#
Various math ops.
- class Log1pTransform(alpha=20)[source]#
Bases:
alf.utils.math_ops.InvertibleTransformImplementing the following transformation:
\[y=\alpha sign(x)\log(1+|x|)\]- Parameters
alpha (
float) – \(\alpha\) in the above formula
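A scalar sketch of this transform and its inverse, derived directly from the formula above. The function names are hypothetical and do not correspond to the methods of Log1pTransform.

```python
import math


def log1p_transform(x, alpha=20.0):
    """y = alpha * sign(x) * log(1 + |x|)."""
    return alpha * math.copysign(math.log1p(abs(x)), x)


def log1p_inverse(y, alpha=20.0):
    """x = sign(y) * (exp(|y| / alpha) - 1)."""
    return math.copysign(math.expm1(abs(y) / alpha), y)
```

Using log1p and expm1 keeps the round trip accurate for inputs close to zero.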
- class Softsign[source]#
Bases:
torch.autograd.function.FunctionSoftsign function.
Applies element-wise, the function \(\text{SoftSign}(x) = \frac{x}{1 + |x|}\)
Compared to
Softsign_, this uses more memory but is faster and has higher precision for backward.- static backward(ctx, grad_output)[source]#
Defines a formula for differentiating the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as forward() returned, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, input)[source]#
Performs the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
The context can be used to store tensors that can be then retrieved during the backward pass.
- class Softsign_[source]#
Bases:
torch.autograd.function.FunctionInplace version of softsign function.
Applies element-wise inplace, the function \(\text{SoftSign}(x) = \frac{x}{1 + |x|}\)
The current pytorch implementation of softsign is inefficient for backward because it relies on automatic differentiation and does not have an inplace version. Hence we provide a more efficient implementation.
Reference: PyTorch: Defining New Autograd Functions
- static backward(ctx, grad_output)[source]#
Defines a formula for differentiating the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by as many outputs as forward() returned, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, input)[source]#
Performs the operation.
This function is to be overridden by all subclasses.
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
The context can be used to store tensors that can be then retrieved during the backward pass.
- class Sqrt1pTransform(*args, **kwargs)[source]#
Bases:
alf.utils.math_ops.InvertibleTransformThe transformation used by MuZero with epsilon = 0.
\[y=sign(x) (\sqrt{|x| +1} - 1) = x / (\sqrt{|x|+1} + 1)\]The second form has better numerical precision for small x.
- class SqrtLinearTransform(eps=0.001)[source]#
Bases:
alf.utils.math_ops.InvertibleTransformThe transformation used by MuZero.
\[y=sign(x) (\sqrt{|x| +1} - 1) + \epsilon x\]- Parameters
eps (
float) – \(\epsilon\) in the above formula
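The transform and its closed-form inverse can be sketched for scalars as follows. The inverse is obtained by substituting \(s = \sqrt{|x|+1}\) and solving the resulting quadratic \(\epsilon s^2 + s - (1 + \epsilon + |y|) = 0\). The function names are hypothetical and are not the methods of SqrtLinearTransform.

```python
import math


def sqrt_linear(x, eps=0.001):
    """y = sign(x) * (sqrt(|x| + 1) - 1) + eps * x."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x) + eps * x


def sqrt_linear_inverse(y, eps=0.001):
    """Invert sqrt_linear by solving a quadratic in s = sqrt(|x| + 1)."""
    a = abs(y)
    s = (math.sqrt(1.0 + 4.0 * eps * (a + 1.0 + eps)) - 1.0) / (2.0 * eps)
    return math.copysign(s * s - 1.0, y)


# Averaging in the transformed space and then inverting underestimates
# the true mean of non-negative values (Jensen inequality).
values = [0.0, 100.0]
mean_transformed = sum(sqrt_linear(v) for v in values) / len(values)
biased_mean = sqrt_linear_inverse(mean_transformed)
```

For the two values above the true mean is 50, while inverting the mean of the transformed values gives a noticeably smaller number, which illustrates the bias that the losses in alf.utils.losses attribute to inverse_after_mean=True.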
- add_ignore_empty(x, y)[source]#
Add two Tensors which may be None or ().
If x or y is None, they are assumed to be zero and the other tensor is returned.
- Parameters
x (Tensor|None|()) –
y (Tensor|None|()) –
- Returns
x + y
- add_n(inputs)[source]#
Calculate the sum of n tensors.
- Parameters
inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
- Returns
the element-wise sum of all the tensors in
inputs.- Return type
Tensor
- argmin(x)[source]#
Deterministic argmin.
Different from torch.argmin, which may have an undetermined result if there are multiple elements equal to the min, this argmin is guaranteed to return the index of the first element equal to the min in each row.
- Parameters
x (Tensor) – only support rank-2 tensor
- Returns
rank-1 int64 Tensor representing the column of the first element in each row equal to the minimum of the row.
- binary_neg_entropy(p)[source]#
Negative entropy for binary outcome.
- Parameters
p (Tensor) – the probability of one outcome; 1-p is then the probability of the other outcome
- Returns
Tensor with the same shape as p
- clipped_exp(value, clip_value_min=-20, clip_value_max=2)[source]#
Clip value to the range [clip_value_min, clip_value_max] then compute exponential
- Parameters
value (Tensor) – input tensor.
clip_value_min (float) – The minimum value to clip by.
clip_value_max (float) – The maximum value to clip by.
- identity(x)[source]#
PyTorch doesn’t have an identity activation. This can be used as a placeholder.
- max_n(inputs)[source]#
Calculate the maximum of n tensors.
- Parameters
inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
- Returns
the element-wise maximum of all the tensors in
inputs.- Return type
Tensor
- min_n(inputs)[source]#
Calculate the minimum of n tensors.
- Parameters
inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
- Returns
the element-wise minimum of all the tensors in
inputs.- Return type
Tensor
- mul_n(inputs)[source]#
Calculate the product of n tensors.
- Parameters
inputs (iterable[Tensor]) – an iterable of tensors. It requires that all tensor shapes can be broadcast to the same shape.
- Returns
the element-wise multiplication of all the tensors in
inputs.- Return type
Tensor
- normalize_min_max(x)[source]#
Normalize the min and max of each sample x[i] to 0 and 1.
normalize x to [0, 1] as suggested in Appendix G. of MuZero paper.
- Parameters
x (
Tensor) – a batch of samples- Returns
same shape as x
- Return type
Tensor
- shuffle(values)[source]#
Shuffle a nest.
Shuffle all the tensors in
valuesby a same random order.- Parameters
values (nested Tensor) – nested Tensor to be shuffled. All the tensors need to have the same batch size (i.e. shape[0]).
- Returns
shuffled value along dimension 0.
- softclip(x, low, high, hinge_softness=1.0)[source]#
Softly bound x in between [low, high]. Unlike softclip_tf, this transform is symmetric regarding the lower and upper bound when squashing. The softclip function can be defined in several forms:
\[\begin{split}\begin{array}{lll} &\ln(\frac{e^{l-x}+1}{e^{x-h}+1}) + x & (1)\\ =&\ln(\frac{e^{x-l}+1}{e^{x-h}+1}) + l & (2)\\ =&\ln(\frac{e^{l-x}+1}{e^{h-x}+1}) + h & (3)\\ \end{array}\end{split}\]
- Parameters
x (Tensor) – input
low (float|Tensor) – the lower bound
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from
lowtohigh. Default to 1.
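The equivalence of the three forms can be checked numerically with a scalar sketch. Each logarithm is written as a softplus, for example \(\ln(e^{l-x}+1) = \text{softplus}(l-x)\). The sketch fixes hinge_softness to 1 because the way that parameter enters the formula is not stated here, and the helper names are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def softclip_form1(x, low, high):
    return softplus(low - x) - softplus(x - high) + x


def softclip_form2(x, low, high):
    return softplus(x - low) - softplus(x - high) + low


def softclip_form3(x, low, high):
    return softplus(low - x) - softplus(high - x) + high
```

Form (2) makes the lower bound visible, since the softplus difference is positive, and form (3) does the same for the upper bound.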
- softclip_tf(x, low, high, hinge_softness=1.0)[source]#
Softly bound x in between [low, high], namely,
    clipped = softupper(softlower(x, low), high)
    softclip(x) = (clipped - high) / (high - softupper(low, high)) * (high - low) + high
The second scaling step is needed because softupper(low, high) < low due to the distortion of softplus, so we need to shrink the interval slightly by (high - low) / (high - softupper(low, high)) to preserve the lower bound. Due to this rescaling, the bijector can be mildly asymmetric.
- Parameters
x (Tensor) – input
low (float|Tensor) – the lower bound
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from
lowtohigh. Default to 1.
- softlower(x, low, hinge_softness=1.0)[source]#
Softly lower bound
xbylow, namely,softlower(x, low) = softplus(x - low) + low- Parameters
x (Tensor) – input
low (float|Tensor) – the lower bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from low to identity. Defaults to 1.
- Returns
Tensor
- softsign()#
- softsign_()#
- softupper(x, high, hinge_softness=1.0)[source]#
Softly upper bound x by high, namely,
    softupper(x, high) = -softplus(high - x) + high
- Parameters
x (Tensor) – input
high (float|Tensor) – the upper bound
hinge_softness (float) – this positive parameter changes the transition slope. A higher softness results in a smoother transition from identity to high. Defaults to 1.
- Returns
Tensor
- sum_to_leftmost(value, dim)[source]#
Sum out value.ndim-dim many rightmost dimensions of a given tensor.
- Parameters
value (Tensor) – A tensor of .ndim at least dim.
dim (int) – The number of leftmost dims to remain.
- Returns
The result tensor whose ndim is min(dim, value.dim).
- swish(x)[source]#
Swish activation.
This is suggested in arXiv:1710.05941
- Parameters
x (Tensor) – input
- Returns
Tensor
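For reference, the activation proposed in arXiv:1710.05941 is x * sigmoid(beta * x). The scalar sketch below assumes the common beta=1 form; whether alf's swish exposes or fixes beta is not stated here, so the beta argument is purely illustrative.

```python
import math


def swish(x, beta=1.0):
    """Scalar Swish: x * sigmoid(beta * x). `beta` is an illustrative extra."""
    # Note: math.exp overflows for very large negative beta * x (below about -709).
    return x / (1.0 + math.exp(-beta * x))


samples = [swish(v) for v in (-1.0, 0.0, 1.0)]
```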
- weighted_reduce_mean(x, weight, dim=())[source]#
Weighted mean.
- Parameters
x (Tensor) – values for calculating the mean
weight (Tensor) – weight for x. Should have the same shape as x.
dim (int | tuple[int]) – The dimensions to reduce. If None, reduces all dimensions; an empty tuple (the default in the signature) likewise reduces over all elements. Must be in the range [-rank(x), rank(x)).
- Returns
the weighted mean across axis
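As a rough illustration of a weighted mean over all elements, the sketch below works on flat Python lists rather than tensors. The eps guard against a zero total weight is a hypothetical addition for the example, not something the docstring states.

```python
def weighted_mean(values, weights, eps=1e-10):
    """sum(w * x) / sum(w) over flat sequences of equal length."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    # Hypothetical guard: avoid dividing by zero when all weights are 0.
    total_weight = max(sum(weights), eps)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


result = weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
```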
alf.utils.normalizers#
- class AdaptiveNormalizer(tensor_spec, speed=8.0, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='AdaptiveNormalizer')[source]#
Bases: alf.utils.normalizers.Normalizer
This normalizer gives higher weight to more recent samples for calculating mean and variance. Roughly speaking, the weight for each sample at time t is proportional to (t/T)^(speed-1), where T is the current time step. See docs/streaming_averaging_amd_sampling.py for detail.
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
speed (float) – speed of updating mean and variance.
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
- training: bool#
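The weighting rule "proportional to (t/T)^(speed-1)" can be made concrete with an offline computation. This is only a sketch of the stated weights on a stored list of samples; the actual normalizer is streaming, and the function names here are made up for illustration.

```python
def adaptive_weights(num_steps, speed):
    """Normalized weights proportional to (t / T) ** (speed - 1), t = 1..T."""
    raw = [(t / num_steps) ** (speed - 1) for t in range(1, num_steps + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean(samples, weights):
    return sum(w * x for w, x in zip(weights, samples))


samples = [0.0, 0.0, 0.0, 10.0]
uniform_mean = weighted_mean(samples, adaptive_weights(4, speed=1.0))
recent_mean = weighted_mean(samples, adaptive_weights(4, speed=8.0))
```

With speed=1 every sample gets the same weight; with a larger speed the mean is pulled toward the most recent sample.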
- class EMNormalizer(tensor_spec, update_rate=0.001, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='EMNormalizer')[source]#
Bases: alf.utils.normalizers.Normalizer
Exponential moving normalizer: the normalization assigns exponentially decayed weights to history samples.
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
update_rate (float) – the update rate
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
- training: bool#
- class Normalizer(tensor_spec, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, max_dims_to_summarize=10, name='Normalizer')[source]#
Bases: torch.nn.modules.module.Module
Create a base normalizer using a first-moment and a second-moment averager.
Given weights \(w_i\) with \(\sum_i w_i = 1\) and samples \(x_i, i = 1 \cdots n\), let
\[\begin{split}\begin{array}{lll} m & = \sum_i w_i * x_i \; & \mbox{(first moment)} \\ m2 & = \sum_i w_i * x_i^2 \; & \mbox{(second moment)} \end{array}\end{split}\]
then
\[\begin{split}\begin{array}{ll} var & = \sum_i w_i * (x_i - m)^2 \\ & = \sum_i w_i * (x_i^2 + m^2 - 2*x_i*m) \\ & = m2 + m^2 - 2m^2 \\ & = m2 - m^2 \end{array}\end{split}\]
which is the same result as in the case \(w_1=w_2=...=w_n=(1/n)\).
NOTE: tf_agents’ normalizer maintains a running average of variance which is not correct mathematically, because the estimated variance contains early components that don’t measure all the current samples.
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – True if debug summaries should be created.
max_dims_to_summarize (int) – when debug_summaries=True, the max number of dims of the normalizer’s statistics that will be summarized. Note that a large number could potentially dump a lot of TB plots, consume much disk space, and slow down training. Default: 10.
name (str) –
- forward(input)[source]#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- normalize(tensor, clip_value=-1.0)[source]#
Normalize a tensor with mean and variance
- Parameters
tensor (nested Tensor) – each leaf can have arbitrary outer dims with shape [B1, B2,…] + tensor_spec.shape.
clip_value (float) – if positive, normalized values will be clipped to [-clip_value, clip_value].
- Returns
normalized tensor
- training: bool#
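The identity var = m2 - m^2 and the role of zero_mean, unit_std and clip_value can be illustrated on scalars. The sketch below is an assumption-laden reading of the parameter descriptions (epsilon added to the standard deviation, clipping applied last); the real class operates on nested tensors and maintains its moments with averagers.

```python
import math


def moments(samples, weights):
    """First and second moments for weights that sum to 1."""
    m = sum(w * x for w, x in zip(weights, samples))
    m2 = sum(w * x * x for w, x in zip(weights, samples))
    return m, m2


def normalize(x, m, m2, zero_mean=True, unit_std=False,
              variance_epsilon=1e-10, clip_value=-1.0):
    # Guard against tiny negative values caused by floating-point error.
    var = max(m2 - m * m, 0.0)
    if zero_mean:
        x = x - m
    if not unit_std:
        # Assumption: epsilon is added to the std, as the docstring words it.
        x = x / (math.sqrt(var) + variance_epsilon)
    if clip_value > 0:
        x = max(-clip_value, min(clip_value, x))
    return x


samples = [1.0, 2.0, 3.0, 4.0]
weights = [0.1, 0.2, 0.3, 0.4]
m, m2 = moments(samples, weights)
direct_var = sum(w * (x - m) ** 2 for w, x in zip(weights, samples))
```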
- class ScalarAdaptiveNormalizer(speed=8.0, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='ScalarAdaptiveNormalizer')[source]#
Bases: alf.utils.normalizers.AdaptiveNormalizer
This normalizer gives higher weight to more recent samples for calculating mean and variance. Roughly speaking, the weight for each sample at time t is proportional to (t/T)^(speed-1), where T is the current time step. See docs/streaming_averaging_amd_sampling.py for detail.
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
speed (float) – speed of updating mean and variance.
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
- training: bool#
- class ScalarEMNormalizer(update_rate=0.001, auto_update=True, variance_epsilon=1e-10, zero_mean=True, unit_std=False, debug_summaries=False, name='ScalarEMNormalizer')[source]#
Bases: alf.utils.normalizers.EMNormalizer
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
update_rate (float) – the update rate
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
- training: bool#
- class ScalarWindowNormalizer(window_size=1000, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='ScalarWindowNormalizer')[source]#
Bases: alf.utils.normalizers.WindowNormalizer
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
window_size (int) – the size of the recent window
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
- training: bool#
- class WindowNormalizer(tensor_spec, window_size=1000, auto_update=True, zero_mean=True, unit_std=False, variance_epsilon=1e-10, debug_summaries=False, name='WindowNormalizer')[source]#
Bases: alf.utils.normalizers.Normalizer
Normalization according to a recent window of samples.
- Parameters
tensor_spec (nested TensorSpec) – specs of the mean of tensors to be normalized.
window_size (int) – the size of the recent window
auto_update (bool) – If True, automatically update mean and variance for each call to normalize(). Otherwise, the user needs to call update()
zero_mean (bool) – whether to make the normalized value be zero-mean
unit_std (bool) – whether to assume a unit std or not when normalizing. If True, then the mean is simply subtracted from the rewards.
variance_epsilon (float) – a small value added to std for normalizing
debug_summaries (bool) – whether to generate debug summaries
name (str) –
- training: bool#
alf.utils.per_process_context#
- class PerProcessContext[source]#
Bases: object
A singleton that maintains the per-process runtime properties.
It is used mainly in multi-process distributed training mode, where properties such as the rank of the process and the total number of processes can be accessed via this interface.
Construct the singleton instance.
This initializes the singleton and default values are assigned to the properties.
- property ddp_rank#
- property is_distributed#
- property num_processes#
- property paras_queue: multiprocessing.context.BaseContext.Queue#
- Return type
Queue
alf.utils.plot_tb_curves#
- class CurvesPlotter(mean_curves, y_clipping=None, x_range=None, y_range=None, x_ticks=None, x_label=None, y_label=None, x_scaled_and_aligned=False, figsize=(4, 4), dpi=100, linestyle='-', linewidth=2, std_alpha=0.2, colors=None, markers=None, bg_color='white', grid_color='#e6e5e3', plot_mean_only=False, legend_kwargs={'loc': 'best'}, title=None)[source]#
Bases: object
Plot several MeanCurve objects in a figure. The curve colors will form a cycle over 10 default colors. The user should make sure that the MeanCurve objects to plot are meaningful to be compared in one figure.
For each MeanCurve, its y field will be plotted as the mean, its min_y and max_y will be plotted by a shaded area around y, and its x determines the x-axis range.
- Parameters
mean_curves (MeanCurve|list[MeanCurve]) – each MeanCurve should correspond to a different method.
x_range (tuple[float]) – a tuple of (min_x, max_x) for showing on the figure. If None, then (0, 1) will be used. This argument is only used when x_scaled_and_aligned==True.
y_range (tuple[float]) – a tuple of (min_y, max_y) for showing on the figure. If None, then it will be decided according to the y values. Note that this range won’t change y data; it’s only used by matplotlib for drawing y limits.
x_ticks (list[float]) – x ticks shown along x axis
y_clipping (tuple[float]) – the y values will be clipped to this range if not None. Because of smoothing in MeanCurveReader and/or std region, the input y values might be out of this range.
x_label (str) – shown besides x-axis
y_label (str) – shown besides y-axis
x_scaled_and_aligned (bool) – If True, the x axes of all MeanCurve will be scaled and aligned so that the lower and upper \(x\) bounds of all curves will be x_range, and each curve’s \(x\) axis will be proportionally scaled. If False, the \(x\) axis will be plotted according to \(x\) of each MeanCurve as it is. Note that this process only involves \(x\) scaling and no interpolation of \(y\) values will ever be performed. For example, we have three MeanCurves to be plotted in a figure:
    mean_curve1 x: (0, 100)
    mean_curve2 x: (20, 80)
    mean_curve3 x: (100, 200)
with x_range==(0,1). Then in the plotted figure, the \(x\) range (not x-ticks which can be specified differently!) will be
    mean_curve1 x: (0, 0.5)
    mean_curve2 x: (0.1, 0.4)
    mean_curve3 x: (0.5, 1)
figsize (tuple[int]) – a tuple of ints determining the size of the figure in inches. A larger figure size will allow for longer texts, more axes or more ticklabels to be shown.
dpi (int) – Dots per inch. How many pixels each inch contains. A figsize of (w,h) consists of w*h*dpi**2 pixels.
linestyle (str|list[str]) – the line style to plot. Possible values: ‘-’ (‘solid’), ‘--’ (‘dashed’), ‘-.’ (‘dashdot’), and ‘:’ (‘dotted’). If a string, then all curves will have the same style; otherwise each option will apply to the corresponding curve.
linewidth (int) – the thickness of lines to plot. Default: 2.
std_alpha (float) – the transparency value for plotting shaded area around a curve.
bg_color (str) – the background color of the figure
grid_color (str) – color of the dashed grid lines
plot_mean_only (bool) – Whether only plot the mean curve without shaded regions.
legend_kwargs (dict) – kwargs for plotting the legend. If None, then no legend will be plotted.
title (str) – title of the figure
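The x_scaled_and_aligned example can be reproduced with a few lines of arithmetic. The helper below is a hypothetical reconstruction that matches the documented numbers (a single affine map from the union of all curve ranges onto x_range); the plotter's internal code may differ.

```python
def scale_and_align(curve_ranges, x_range=(0.0, 1.0)):
    """Map each (min_x, max_x) pair onto x_range using one shared affine map."""
    lo = min(r[0] for r in curve_ranges)
    hi = max(r[1] for r in curve_ranges)
    out_lo, out_hi = x_range

    def scale(x):
        return out_lo + (x - lo) / (hi - lo) * (out_hi - out_lo)

    return [(scale(a), scale(b)) for a, b in curve_ranges]


scaled = scale_and_align([(0, 100), (20, 80), (100, 200)], x_range=(0.0, 1.0))
```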
- plot(output_path, dpi=200, transparent=False, close_fig=True)[source]#
Plot curves and save the figure to disk.
- Parameters
output_path (str) – the output file path
dpi (int) – dpi for the figure. A higher value results in higher resolution.
transparent (bool) – If True, then the figure has a transparent background.
close_fig (bool) – whether to close/release this figure after plotting. If False, the user has to close it manually.
- class EnvironmentStepsReturnReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#
Bases: alf.utils.plot_tb_curves.MeanCurveReader
Create a mean curve reader that reads AverageReturn values.
- Parameters
event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) – we support merging curves that have different \(x\) into a MeanCurve. For example, if there are three curves:
    curve1 x: (1, 9)
    curve2 x: (0, 10)
    curve3 x: (0, 8)
then the merged MeanCurve will have \((1, 8)\) as the final \(x\) range. Each curve’s new \(y\) values will be interpolated w.r.t. this common \(x\) range appropriately given their original \(y=f(x)\) curve. The common \(x\) range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the three options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
- Returns
a mean curve structure.
- Return type
- property x_label#
- property y_label#
- class EnvironmentStepsSuccessReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#
Bases: alf.utils.plot_tb_curves.MeanCurveReader
Create a mean curve reader that reads Success rates.
- Parameters
event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) – we support merging curves that have different \(x\) into a MeanCurve. For example, if there are three curves:
    curve1 x: (1, 9)
    curve2 x: (0, 10)
    curve3 x: (0, 8)
then the merged MeanCurve will have \((1, 8)\) as the final \(x\) range. Each curve’s new \(y\) values will be interpolated w.r.t. this common \(x\) range appropriately given their original \(y=f(x)\) curve. The common \(x\) range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the three options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
- Returns
a mean curve structure.
- Return type
- property x_label#
- property y_label#
- class IterationsReturnReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#
Bases: alf.utils.plot_tb_curves.MeanCurveReader
Create a mean curve reader that reads AverageReturn values.
- Parameters
event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) – we support merging curves that have different \(x\) into a MeanCurve. For example, if there are three curves:
    curve1 x: (1, 9)
    curve2 x: (0, 10)
    curve3 x: (0, 8)
then the merged MeanCurve will have \((1, 8)\) as the final \(x\) range. Each curve’s new \(y\) values will be interpolated w.r.t. this common \(x\) range appropriately given their original \(y=f(x)\) curve. The common \(x\) range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the three options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
- Returns
a mean curve structure.
- Return type
- property x_label#
- property y_label#
- class IterationsSuccessReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#
Bases: alf.utils.plot_tb_curves.MeanCurveReader
Create a mean curve reader that reads Success rates.
- Parameters
event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) – we support merging curves that have different \(x\) into a MeanCurve. For example, if there are three curves:
    curve1 x: (1, 9)
    curve2 x: (0, 10)
    curve3 x: (0, 8)
then the merged MeanCurve will have \((1, 8)\) as the final \(x\) range. Each curve’s new \(y\) values will be interpolated w.r.t. this common \(x\) range appropriately given their original \(y=f(x)\) curve. The common \(x\) range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the three options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
- Returns
a mean curve structure.
- Return type
- property x_label#
- property y_label#
- class MeanCurve(x=None, y=None, min_y=None, max_y=None, ay=None, min_ay=None, max_ay=None, name=None)[source]#
Bases: alf.utils.plot_tb_curves.MeanCurve
Create new instance of MeanCurve(x, y, min_y, max_y, ay, min_ay, max_ay, name)
- classmethod from_curves(x, ys, interval_mode='std', name='MeanCurve')[source]#
Compute various curve statistics from a set of individual curves ys and a common x, and create a class instance.
- Parameters
x (np.array) – x steps
ys (list[np.array]) – a list of curves
interval_mode (str) – mode for computing error margin around the mean y curve. Should be one of the three options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
name (str) –
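To give a feel for what the interval modes compute, here is a small sketch of the “std” and “minmax” bands over a set of curves sharing the same x. It is a guess at the semantics: the use of the population standard deviation and the function name mean_with_band are assumptions, and the “CI_X” mode is left out.

```python
import statistics


def mean_with_band(ys, interval_mode="std"):
    """Return (mean, lower, upper) per x step for equally long curves."""
    columns = list(zip(*ys))  # one tuple of y values per x step
    mean = [statistics.fmean(c) for c in columns]
    if interval_mode == "std":
        # Assumption: population std; the library may use a different estimator.
        std = [statistics.pstdev(c) for c in columns]
        lower = [m - s for m, s in zip(mean, std)]
        upper = [m + s for m, s in zip(mean, std)]
    elif interval_mode == "minmax":
        lower = [min(c) for c in columns]
        upper = [max(c) for c in columns]
    else:
        raise ValueError(f"unsupported interval_mode in this sketch: {interval_mode}")
    return mean, lower, upper


ys = [[0.0, 2.0], [2.0, 2.0], [4.0, 8.0]]
mean, std_lower, std_upper = mean_with_band(ys, "std")
_, mm_lower, mm_upper = mean_with_band(ys, "minmax")
```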
- class MeanCurveGroupReader(mean_curve_readers, task_performance_ranges=None, name='MeanCurveGroupReader')[source]#
Bases: object
Group several MeanCurveReader results. A MeanCurveGroupReader is suitable for one method on multiple tasks, each task with multiple runs. To aggregate across tasks, each task must be provided with a performance range \((y_0, y_1)\) that will be used to normalize performance for that task as \(\frac{y - y_0}{y_1 - y_0}\). If the ranges are not provided, no normalization will be done.
The aggregation is simply averaging the statistics of individual MeanCurve.
- Parameters
mean_curve_readers (list[MeanCurveReader]) – a list of MeanCurveReader of multiple tasks for one method. It’s the user’s responsibility to ensure that it’s meaningful to group these task event files and show their mean and variance.
task_performance_ranges (list[tuple(float)]) – a list of tuples, where each tuple is a pair of floats used for normalizing the corresponding task. If None, no normalization will be performed.
name (str) – name of the method
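The per-task normalization \(\frac{y - y_0}{y_1 - y_0}\) followed by averaging can be sketched as below. The function names are hypothetical and the sketch only handles the mean curve on a shared x grid, not the full set of MeanCurve statistics.

```python
def normalize_curve(ys, performance_range):
    """Apply (y - y0) / (y1 - y0) to every point of one task's curve."""
    y0, y1 = performance_range
    return [(y - y0) / (y1 - y0) for y in ys]


def aggregate_tasks(task_curves, task_performance_ranges=None):
    """Average curves across tasks, optionally normalizing each task first."""
    if task_performance_ranges is not None:
        task_curves = [
            normalize_curve(ys, r)
            for ys, r in zip(task_curves, task_performance_ranges)
        ]
    return [sum(col) / len(col) for col in zip(*task_curves)]


curves = [[0.0, 50.0, 100.0], [10.0, 15.0, 20.0]]
normalized_mean = aggregate_tasks(curves, [(0.0, 100.0), (10.0, 20.0)])
raw_mean = aggregate_tasks(curves)
```

Without the ranges, the task with the larger return scale dominates the average, which is why a range per task is needed for a meaningful aggregate.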
- property name#
- property x_label#
- property y_label#
- class MeanCurveReader(event_file, x_steps=None, name='MeanCurveReader', smoothing=None, interval_mode='std')[source]#
Bases: object
Read and compute a MeanCurve from one or multiple TB event files. A MeanCurveReader is suitable for one method on one task with multiple runs.
- Parameters
event_file (str|list[str]) – a string or a list of strings where each should point to a valid TB dir, e.g., ending with “eval/” or “train/”. The curves of these files will be averaged. It’s the user’s responsibility to ensure that it’s meaningful to group these event files and show their mean and variance.
x_steps (list[int]) – we support merging curves that have different \(x\) into a MeanCurve. For example, if there are three curves:
    curve1 x: (1, 9)
    curve2 x: (0, 10)
    curve3 x: (0, 8)
then the merged MeanCurve will have \((1, 8)\) as the final \(x\) range. Each curve’s new \(y\) values will be interpolated w.r.t. this common \(x\) range appropriately given their original \(y=f(x)\) curve. The common \(x\) range will be automatically determined as in the example if this argument x_steps==None. Alternatively, the user can specify a pre-defined list of integers for interpolation.
name (str) – name of the mean curve.
smoothing (int | float) – if None, no smoothing is applied; if int, it’s the window width of a Savitzky-Golay filter; if float, it’s the smoothing weight of a running average (higher -> smoother).
interval_mode (str) – should be one of the three options: 1) “std”, 2) “minmax”, and 3) “CI_X”. The last one means confidence interval, where ‘X’ should be one of (80,85,90,95,99) indicating the confidence level (percentage).
- Returns
a mean curve structure.
- Return type
- property name#
- property x_label#
- property y_label#
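The x_steps example for MeanCurveReader, where curves covering (1, 9), (0, 10) and (0, 8) are merged onto (1, 8), amounts to intersecting the ranges and then resampling each curve. The sketch below assumes plain linear interpolation, which the docstring does not specify; both helper names are invented for the example.

```python
def common_x_range(curve_xs):
    """Intersection of the x ranges of several curves (each xs sorted ascending)."""
    return max(xs[0] for xs in curve_xs), min(xs[-1] for xs in curve_xs)


def interpolate(x, xs, ys):
    """Linearly interpolate the piecewise-linear curve (xs, ys) at x."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the curve's range")


x_lo, x_hi = common_x_range([[1, 5, 9], [0, 10], [0, 4, 8]])
y_at_5 = interpolate(5, [0, 10], [0.0, 100.0])
```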
- ema_smooth(scalars, weight=0.6, speed=64.0, adaptive=False, mode='forward')[source]#
EMA smoothing, following TB’s official implementation: https://github.com/tensorflow/tensorboard/blob/master/tensorboard/components/vz_line_chart2/line-chart.ts#L695
For adaptive EMA, the incoming weight decreases as the time increases.
- Parameters
scalars (list[float]) – an array of floats to be smoothed, where the array index represents incoming time steps.
weight (float) – the weight of history. The history is updated as history * weight + scalar * (1 - weight). Only useful when adaptive=False.
speed (int) – an integer number specifying the adaptive weight. Only useful when adaptive=True. A higher speed means a smaller average window.
adaptive (bool) – whether to use adaptive weighting or not. If True, then later scalars will have smaller incoming weights (proportional to the inverse of array index).
mode (str) – “forward” | “both”. For “forward” mode, the moving average goes from the array beginning to end. For “both” mode, the moving average has an additional backward pass, and the final smoothed value is an average of forward and backward passes.
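A non-adaptive version of this smoothing can be sketched as follows. It uses the documented update history * weight + scalar * (1 - weight) together with the bias correction found in TensorBoard's smoothing; whether alf applies the same correction is an assumption, and the adaptive branch is omitted. The names ema_forward and ema_both are illustrative.

```python
def ema_forward(scalars, weight=0.6):
    """Debiased EMA from the beginning to the end of the array (weight < 1)."""
    last = 0.0
    smoothed = []
    for n, value in enumerate(scalars, start=1):
        last = last * weight + (1.0 - weight) * value
        # Bias correction compensates for initializing the history at 0.
        smoothed.append(last / (1.0 - weight ** n))
    return smoothed


def ema_both(scalars, weight=0.6):
    """Average of a forward pass and a backward pass, as in mode "both"."""
    forward = ema_forward(scalars, weight)
    backward = ema_forward(scalars[::-1], weight)[::-1]
    return [(f + b) / 2.0 for f, b in zip(forward, backward)]


forward_only = ema_forward([0.0, 1.0], weight=0.5)
both_passes = ema_both([0.0, 1.0], weight=0.5)
```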
alf.utils.pretty_print#
- class PrettyPrinter(indent=1, width=80, depth=None, stream=None, *, compact=False, sort_dicts=True)[source]#
Bases: pprint.PrettyPrinter
Copied from https://stackoverflow.com/questions/30062384/pretty-print-namedtuple
Handle pretty printing operations onto a stream using a set of configured parameters.
- indent
Number of spaces to indent for each level of nesting.
- width
Attempted maximum number of columns in the output.
- depth
The maximum depth to print out nested structures.
- stream
The desired output stream. If omitted (or false), the standard output stream available at construction will be used.
- compact
If true, several items will be combined in one line.
- sort_dicts
If true, dict keys are sorted.
alf.utils.process_coordinator#
Coordinate asynchronous training process termination on request.
- class Coordinator[source]#
Bases: object
A coordinator for processes.
This class implements a simple mechanism to coordinate the termination of a set of processes.
    with coord.stop_on_exception():
        while not coord.should_stop():
            ...do some work...
Create a new Coordinator.
- clear_stop()[source]#
Clears the stop flag. After this is called, calls to should_stop() will return False.
- join(processes=None, stop_grace_period_secs=120, ignore_live_processes=False)[source]#
Wait for processes to terminate.
This call blocks until a set of processes have terminated. The set of processes is the union of the processes passed in the processes argument and the list of processes that registered with the coordinator by calling Coordinator.register_process().
After the processes stop, if an exc_info was passed to request_stop, that exception is re-raised.
Grace period handling: When request_stop() is called, processes are given ‘stop_grace_period_secs’ seconds to terminate. If any of them is still alive after that period expires, a RuntimeError is raised. Note that if an exc_info was passed to request_stop() then it is raised instead of that RuntimeError.
- Parameters
processes – The started processes to join in addition to the registered processes.
stop_grace_period_secs – Number of seconds given to processes to stop after request_stop() has been called.
ignore_live_processes – If False, raises an error if any of the processes are still alive after stop_grace_period_secs.
- Raises
RuntimeError – If any process is still alive after request_stop() is called and the grace period expires.
- property joined#
- raise_requested_exception()[source]#
If an exception has been passed to request_stop, this raises it.
- register_process(process)[source]#
Register a process to join.
- Parameters
process – A python.multiprocessing.Process to join.
- request_stop(ex=None)[source]#
Request that the processes stop.
After this is called, calls to should_stop() will return True.
Note: If an exception is being passed in, it must be in the context of handling the exception (i.e. try: ... except Exception as ex: ...) and not a newly created one.
- Parameters
ex (Exception or exc_info tuple) – Optional Exception, or Python exc_info tuple as returned by sys.exc_info(). If this is the first call to request_stop() the corresponding exception is recorded and re-raised from join().
- stop_on_exception()[source]#
Context manager to request stop when an Exception is raised.
Code that uses a coordinator must catch exceptions and pass them to the request_stop() method to stop the other processes managed by the coordinator. This context handler simplifies the exception handling. Use it as follows:
    with coord.stop_on_exception():
        # Any exception raised in the body of the with
        # clause is reported to the coordinator before terminating
        # the execution of the body.
        ...body...
This is completely equivalent to the slightly longer code:
    try:
        ...body...
    except:
        coord.request_stop(sys.exc_info())
- Yields
nothing.
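The stop-flag pattern described for the coordinator can be tried out without any processes. MiniCoordinator below is a hypothetical stand-in written for this example, not the alf class: it only models should_stop(), request_stop(), clear_stop() and stop_on_exception(), and it catches Exception rather than using a bare except.

```python
import contextlib
import sys
import threading


class MiniCoordinator:
    """Toy stand-in that mimics the stop flag of a coordinator."""

    def __init__(self):
        self._stop_event = threading.Event()
        self._exc_info = None

    def should_stop(self):
        return self._stop_event.is_set()

    def request_stop(self, ex=None):
        # Only the first request records its exception info.
        if not self._stop_event.is_set():
            self._exc_info = ex
            self._stop_event.set()

    def clear_stop(self):
        self._exc_info = None
        self._stop_event.clear()

    @contextlib.contextmanager
    def stop_on_exception(self):
        try:
            yield
        except Exception:
            # Report the failure instead of letting it propagate.
            self.request_stop(sys.exc_info())


coord = MiniCoordinator()
steps = 0
with coord.stop_on_exception():
    while not coord.should_stop():
        steps += 1
        if steps == 3:
            raise RuntimeError("worker failed")
```

After the with block, the exception has been absorbed by the context manager and the stop flag is set, so any other loop polling should_stop() would exit.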
- class Process(coord, target=None, args=(), kwargs={})[source]#
Bases: multiprocessing.context.Process
A coordinated process class to execute acting loops.
Creates a process, running target in a loop, managed by coordinator.
- Parameters
coord (Coordinator) – coordinator used to manage this new process.
target (callable) – to be invoked by run() in a loop, until coordinator tells the process to stop.
args (list) – optional arguments for target callable.
kwargs (dict) – optional keyword arguments for target callable.
alf.utils.schedulers#
Schedulers.
- class CyclicalScheduler(progress_type, base_lr, bound_lr, half_cycle_size, switch_mode='step')[source]#
Bases: alf.utils.schedulers.Scheduler
The cyclical scheduler where the value changes cyclically between two bounds. Reference:
Leslie N. Smith Cyclical Learning Rates for Training Neural Networks, 2017 (https://arxiv.org/pdf/1506.01186.pdf)
This implementation generalizes the original methods in two ways: 1) the initial value can start from either the lower-bound (as in the original method), or upper bound; 2) apart from the linear switching between the bounds, we also support step mode of switching.
In terms of applications, beyond the standard case of using a cyclical learning rate to improve the learning behavior during NN training, this scheduler is also useful in other cases. One example is in reinforcement learning, where we sometimes want to update the parameters of different modules at different paces. For example, in TD3, we want to update the policy every other update. In this case, we can use a CyclicalScheduler with step switching mode to achieve this. Similar cases also appear in Dreamer.
- Parameters
progress_type (str) – one of “percent”, “iterations”, “env_steps”
base_lr (float) – the base learning rate, representing the starting value.
bound_lr (float) – the value of the learning rate on the other bound. The value of bound_lr could be either larger or smaller than the value of base_lr.
half_cycle_size (int|float) – the length of half a cycle. Its actual length is based on the progress_type. For example, if in “iterations” mode, it means the lr value will reach the opposite bound every half_cycle_size iterations.
switch_mode (str) – the way to switch from one bound to the other. Currently supports the following modes:
- step: directly jump from one bound to the other every half cycle
- linear: linearly move from one bound to the other every half cycle
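One plausible reading of the two switch modes is sketched below as a pure function of the progress value. The exact boundary behavior of the real scheduler (for instance, which bound is returned exactly at a switch point) is an assumption here, and cyclical_value is a made-up name.

```python
def cyclical_value(progress, base_lr, bound_lr, half_cycle_size, switch_mode="step"):
    """Value of a cyclical schedule at `progress` (sketch, not the library code)."""
    half_cycles, offset = divmod(progress, half_cycle_size)
    # Even half cycles go base -> bound, odd half cycles go bound -> base.
    if int(half_cycles) % 2 == 0:
        start, end = base_lr, bound_lr
    else:
        start, end = bound_lr, base_lr
    if switch_mode == "step":
        return start
    if switch_mode == "linear":
        return start + (end - start) * offset / half_cycle_size
    raise ValueError(f"unknown switch_mode: {switch_mode}")


# "Update every other iteration": a 1/0 gate that flips on each iteration.
gate = [cyclical_value(i, 1.0, 0.0, 1, "step") for i in range(4)]
triangle = [cyclical_value(p, 0.1, 1.0, 10, "linear") for p in (0, 5, 10, 15, 20)]
```

The gate sequence alternates between 1 and 0, which is the kind of pattern the TD3 remark refers to; the linear mode traces a triangle wave between the two bounds.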
- class ExponentialScheduler(progress_type, initial_value, decay_rate, decay_time)[source]#
Bases:
alf.utils.schedulers.Scheduler
The value is exponentially decayed based on the progress.
The value is calculated as initial_value * decay_rate**(progress/decay_time).
- Parameters
progress_type (str) – one of “percent”, “iterations”, “env_steps”
initial_value (float) – initial value
decay_rate (float) – the factor by which the value is multiplied every decay_time units of progress
decay_time (float) – the amount of progress over which the value is multiplied by decay_rate
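As a quick check of the formula, the following plain-Python sketch evaluates it directly. exponential_value is a hypothetical stand-alone function, not part of ALF.

```python
def exponential_value(progress, initial_value, decay_rate, decay_time):
    """Hypothetical helper mirroring the documented formula."""
    return initial_value * decay_rate ** (progress / decay_time)


# With decay_rate=0.5 and decay_time=100, the value halves every 100 units of progress.
values = [exponential_value(p, 1.0, 0.5, 100) for p in (0, 100, 200)]
```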
- class LinearScheduler(progress_type, schedule)[source]#
Bases:
alf.utils.schedulers.Scheduler
The value is linearly changed in each defined region of progress.
- Parameters
progress_type (str) – one of “percent”, “iterations”, “env_steps”
schedule (list[tuple]) – each tuple is a pair of (progress, value), which means that if the current progress is between progress[i-1] and progress[i], a linear interpolation between value[i-1] and value[i] will be used. progress[0] must be 0. If the current progress is greater than progress[-1], value[-1] will be used.
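The interpolation rule for schedule can be sketched in plain Python as follows. linear_value is a hypothetical helper written only from the description above; it is not ALF's code.

```python
def linear_value(progress, schedule):
    """Hypothetical helper: piecewise-linear lookup in [(progress, value), ...]."""
    # Past the last breakpoint, the last value is used.
    if progress >= schedule[-1][0]:
        return schedule[-1][1]
    # Otherwise interpolate inside the segment that contains `progress`.
    for (p0, v0), (p1, v1) in zip(schedule, schedule[1:]):
        if p0 <= progress < p1:
            return v0 + (v1 - v0) * (progress - p0) / (p1 - p0)
    raise ValueError("progress must be >= schedule[0][0]")


schedule = [(0, 0.0), (10, 1.0), (20, 0.5)]
samples = [linear_value(p, schedule) for p in (0, 5, 15, 30)]
```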
- class Scheduler(progress_type)[source]#
Bases:
object
Base class of all schedulers.
A scheduler is used to generate manually defined values based on the training progress.
A subclass should call progress() to get the current training progress and use it to calculate the scheduled value. There are four types of training progress:
“percent”: the percentage of training completed.
“iterations”: the number of training iterations.
“env_steps”: the number of environment steps.
“global_counter”: the value from alf.summary.get_global_counter()
- Parameters
progress_type (str) – one of “percent”, “iterations”, “env_steps”
- class StepScheduler(progress_type, schedule, warm_up_period=0, start=0)[source]#
Bases:
alf.utils.schedulers.Scheduler
There is one value for each defined region of training progress.
- Parameters
progress_type (str) – one of “percent”, “iterations”, “env_steps”
schedule (list[tuple]) – each tuple is a pair of (progress, value). The scheduled result will be the value of the smallest progress that is greater than the current training progress.
warm_up_period (Number) – linearly increase the output value from 0 to the first value (i.e. schedule[0][1]) for a duration of warm_up_period starting from start. The value before start will be 0.
start (Number) – see warm_up_period
alf.utils.sl_utils#
Supervised learning utilities.
- auc_score(inliers, outliers)[source]#
Computes the AUROC score w.r.t network outputs on two distinct datasets. Typically, one dataset is the main training/testing set, while the second dataset represents a set of unseen outliers.
- Parameters
inliers (torch.tensor) – set of predictions on inlier data
outliers (torch.tensor) – set of predictions on outlier data
- Returns
AUROC score (float)
- classification_loss(output, target)[source]#
Computes the cross entropy loss with respect to a batch of predictions and targets.
- Parameters
output (Tensor) – predictions of shape [B, D] or [B, N, D].
target (Tensor) – targets of shape [B], [B, 1], [B, N], or [B, N, 1].
- Returns
LossInfo containing the computed cross entropy loss and the average accuracy.
- predict_dataset(model, testset)[source]#
Computes predictions for an input dataset.
- Parameters
model (Callable) – model with which to compute predictions.
testset (torch.utils.data.DataLoader) – dataset for which to compute predictions.
- Returns
model_outputs (torch.tensor): a tensor of shape [N, S, D], where N is the number of predictors, S is the number of data points, and D is the output dimensionality.
alf.utils.spec_utils#
Collection of spec utility functions.
- clip_to_spec(value, spec)[source]#
Clips value to a given bounded tensor spec.
- Parameters
value (Tensor) – value to be clipped.
spec (BoundedTensorSpec) – spec containing min and max values for clipping.
- Returns
clipped_value (Tensor): value clipped to be compatible with spec.
- is_same_spec(spec1, spec2)[source]#
Whether two nested specs are the same.
- Parameters
spec1 (nested TensorSpec) – the first spec
spec2 (nested TensorSpec) – the second spec
- Returns
bool
- scale_to_spec(tensor, spec)[source]#
Shapes and scales a batch into the given spec bounds.
- Parameters
tensor – A tensor with values in the range of [-1, 1].
spec (BoundedTensorSpec) – the spec to use for scaling the input tensor.
- Returns
A batch scaled to the given spec bounds.
- spec_means_and_magnitudes(spec)[source]#
Get the center and magnitude of the ranges for the input spec.
- Parameters
spec (BoundedTensorSpec) – the spec used to compute mean and magnitudes.
- Returns
spec_means (Tensor): the mean value of the spec bound.
spec_magnitudes (Tensor): the magnitude of the spec bound.
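The relationship between the spec bounds, the means and magnitudes, and the scaling of values in [-1, 1] can be sketched with plain Python lists. This is an illustration only: both helpers are hypothetical, and it assumes the mean is the midpoint of the bounds and the magnitude is half of their range, which this page does not state explicitly.

```python
def means_and_magnitudes(minimum, maximum):
    """Hypothetical helper: per-dimension center and half-range of the bounds."""
    means = [(lo + hi) / 2.0 for lo, hi in zip(minimum, maximum)]
    magnitudes = [(hi - lo) / 2.0 for lo, hi in zip(minimum, maximum)]
    return means, magnitudes


def scale_to_bounds(values, minimum, maximum):
    """Hypothetical helper: map values in [-1, 1] onto [minimum, maximum]."""
    means, magnitudes = means_and_magnitudes(minimum, maximum)
    return [m + s * v for v, m, s in zip(values, means, magnitudes)]


minimum, maximum = [0.0, -2.0], [10.0, 2.0]
scaled = scale_to_bounds([-1.0, 0.5], minimum, maximum)
```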
- zeros_from_spec(nested_spec, batch_size)[source]#
Create nested zero Tensors or Distributions.
A zero tensor with shape[0]=batch_size is created for each TensorSpec, and a distribution with all parameters as zero tensors is created for each DistributionSpec.
- Parameters
nested_spec (nested TensorSpec or DistributionSpec) –
batch_size (int|tuple|list) – batch size/shape added as the first dimension to the shapes in TensorSpec
- Returns
nested Tensor or Distribution
alf.utils.summary_utils#
Utility functions for generating summaries.
- add_mean_hist_summary(name, value)[source]#
Generate mean and histogram summary of value.
- Parameters
name (str) – name of the summary
value (Tensor) – tensor to be summarized
- add_mean_summary(name, value)[source]#
Generate mean summary of value.
- Parameters
name (str) – name of the summary
value (Tensor) – tensor to be summarized
- add_nested_summaries(prefix, data)[source]#
Add summary of a nest of data.
- Parameters
prefix (str) – the prefix of the names of the summaries
data (dict or namedtuple) – data to be summarized
- histogram_continuous(name, data, bucket_min=None, bucket_max=None, bucket_count=30, step=None)[source]#
Histogram for continuous data.
- Parameters
name (str) – name for this summary
data (Tensor) – a Tensor of any shape.
bucket_min (float|None) – the bucket min value; if None, data.min() will be used
bucket_max (float|None) – the bucket max value; if None, data.max() will be used
bucket_count (int) – positive int. The output will have this many buckets.
step (None|Tensor) – step value for this summary. Defaults to alf.summary.get_global_counter()
- histogram_discrete(name, data, bucket_min, bucket_max, step=None)[source]#
Histogram for discrete data.
- Parameters
name (str) – name for this summary
data (Tensor) – a Tensor of integers of any shape.
bucket_min (int) – the bucket min value
bucket_max (int) – the bucket max value. The bucket count is calculated as bucket_max - bucket_min + 1, and the output will have this many buckets.
step (None|Tensor) – step value for this summary. Defaults to alf.summary.get_global_counter()
- class record_time(tag)[source]#
Bases:
object
A context manager for recording time.
It records the average time spent under the context between two summaries.
Example:
    with record_time("time/calc"):
        long_function()
Create a context object for recording time.
By default, record_time will call cuda.synchronize() before entering and after leaving the context to measure the time accurately. This behavior can be disabled by setting the environment variable ALF_RECORD_TIME_SYNC to 0 if you suspect that synchronization slows down your code. See https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution.
- Parameters
tag (str) – the summary tag for the time.
- safe_mean_hist_summary(name, value, mask=None)[source]#
Generate mean and histogram summary of value.
It skips the summary if value is empty.
- Parameters
name (str) – name of the summary
value (Tensor) – tensor to be summarized
mask (bool Tensor) – optional mask to indicate which elements of value to use. Its shape needs to be the same as that of value.
- safe_mean_summary(name, value, mask=None)[source]#
Generate mean summary of value.
It skips the summary if value is empty.
- Parameters
name (str) – name of the summary
value (Tensor) – tensor to be summarized
mask (bool Tensor) – optional mask to indicate which elements of value to use. Its shape needs to be the same as that of value.
- summarize_action(actions, action_specs, name='action')[source]#
Generate histogram summaries for actions.
Actions whose rank is more than 1 will be skipped.
- Parameters
actions (nested Tensor) – actions to be summarized
action_specs (nested TensorSpec) – spec for the actions
name (str) – name of the summary
- summarize_distribution(name, distributions)[source]#
Generate summary for distributions.
Currently the following types of distributions are supported:
Normal, StableCauchy, Beta: mean and std of each dimension will be summarized
The above distributions wrapped by Independent and TransformedDistribution: the base distribution is summarized
Tensor: each dimension dist[…, a] will be summarized
Note that unsupported distributions will be ignored (no error reported).
- Parameters
name (str) – name of the summary
distributions (nested td.distribution.Distribution) – distributions to be summarized.
- summarize_distribution_gradient(name, distribution, batch_dims=1, clone=False)[source]#
Summarize the gradient of the parameters of distribution during backward.
- Parameters
name (str) – name of the summary
distribution (nested Distribution) – distribution of which the gradient is to be summarized.
batch_dims (int) – the first batch_dims dimensions are treated as batch dimensions
clone (bool) – If True, distribution will first be cloned. This is useful if distribution is used in multiple places and you only want to summarize the gradient from one place. If False, the gradient will be the sum of all gradients backpropagated to distribution.
- Returns
distribution or the cloned distribution. The cloned distribution should be used for the downstream calculations.
- summarize_gradients(name_and_params, with_histogram=True)[source]#
Add summaries for gradients.
- Parameters
name_and_params (list[(str, Parameter)]) – a list of (name, Parameter) tuples.
with_histogram (bool) – If True, generate histogram.
- summarize_loss(loss_info)[source]#
Add summary about loss_info.
- Parameters
loss_info (LossInfo) – loss_info.extra must be a namedtuple
- summarize_per_category_loss(loss_info, summarize_count=False, label_names=None)[source]#
Add summary about each category of the unaggregated loss_info.loss of shape (T, B) or (B,) by partitioning it according to loss_info.batch_label, which has the same shape as loss_info.loss. It also creates a summary of the number of samples encountered for each category.
- Parameters
loss_info (LossInfo) – do per-category summarization if loss_info.batch_label is present, and skip otherwise
summarize_count (bool) – whether to summarize the number of samples for each category as well
label_names (Optional[List[str]]) – the names of each category to be used in the tensorboard summary. The category number will be used if label_names is None.
- summarize_tensor_gradients(name, tensor, batch_dims=1, clone=False)[source]#
Summarize the gradient of tensor during backward.
- Parameters
name (str) – name of the summary
tensor (nested Tensor) – tensor of which the gradient is to be summarized.
batch_dims (int) – the first batch_dims dimensions are treated as batch dimensions
clone (bool) – If True, tensor will first be cloned. This is useful if tensor is used in multiple places and you only want to summarize the gradient from one place. If False, the gradient will be the sum of all gradients backpropagated to tensor.
- Returns
tensor or the cloned tensor. The cloned tensor should be used for the downstream calculations.
alf.utils.tensor_utils#
Collection of tensor utility functions.
- class BatchSquash(batch_dims)[source]#
Bases:
object
Facilitates flattening and unflattening batch dims of a tensor. Copied from tf_agents.
Exposes a pair of matched flatten and unflatten methods. After flattening only 1 batch dimension will be left. This facilitates evaluating networks that expect inputs to have only 1 batch dimension.
Create two tied ops to flatten and unflatten the front dimensions.
- Parameters
batch_dims (int) – Number of batch dimensions the flatten/unflatten ops should handle.
- Raises
ValueError – if batch dims is negative.
- append_coordinate(im)[source]#
For the image, we append coordinates as two channels. The image is assumed to be channel-first. The coordinates will range from -1 to 1 evenly.
- Parameters
im (Tensor) – an image of shape [B,C,H,W].
- Returns
an output image of shape [B,C+2,H,W], where the extra 2 channels are an xy meshgrid from -1 to 1.
- Return type
torch.Tensor
- clip_by_global_norm(tensors, clip_norm, use_norm=None, in_place=False)[source]#
Clips values of multiple tensors by the ratio of clip_norm to the global norm.
Adapted from TF’s version.
Given a nest of tensors tensors and a clipping norm threshold clip_norm, this function clips the tensors (in place only if in_place is True) and returns the global norm (global_norm) of all tensors in tensors. Optionally, if you’ve already computed the global norm for tensors, you can specify it with use_norm.
To perform the clipping, each tensor is set to:
    tensor * clip_norm / max(global_norm, clip_norm)
where:
    global_norm = sqrt(sum([l2norm(t)**2 for t in tensors]))
If clip_norm > global_norm, then the entries in tensors remain as they are; otherwise they are all shrunk by the global ratio.
Any entries of tensors that are of type None are ignored.
- Parameters
tensors (nested Tensor) – a nest of tensors to be clipped
clip_norm (float or Tensor) – a positive floating scalar
use_norm (float or Tensor) – the global norm to use. If None, global_norm() will be used to compute the norm.
in_place (bool) – If True, then the input tensors will be changed. Tensors that require grads cannot be modified in place; on the other hand, if you are clipping the gradients held by an optimizer, then doing this in place will probably be easier.
- Returns
tensors (nested Tensor): the clipped tensors
global_norm (Tensor): a scalar tensor representing the global norm. If use_norm is provided, it will be returned instead.
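The clipping rule can be checked with a small plain-Python sketch that uses flat lists of floats in place of tensors. The _sketch functions are hypothetical illustrations of the formulas above, not ALF's implementation, and they handle only a flat list of "tensors" rather than an arbitrary nest.

```python
import math


def global_norm_sketch(tensors):
    """sqrt of the sum of squared entries over all non-None tensors."""
    return math.sqrt(sum(x * x for t in tensors if t is not None for x in t))


def clip_by_global_norm_sketch(tensors, clip_norm, use_norm=None):
    """Scale every tensor by clip_norm / max(global_norm, clip_norm)."""
    norm = global_norm_sketch(tensors) if use_norm is None else use_norm
    scale = clip_norm / max(norm, clip_norm)
    # None entries are passed through untouched.
    clipped = [None if t is None else [x * scale for x in t] for t in tensors]
    return clipped, norm


grads = [[3.0, 0.0], None, [0.0, 4.0]]  # global norm is 5
clipped, norm = clip_by_global_norm_sketch(grads, clip_norm=1.0)
unchanged, _ = clip_by_global_norm_sketch(grads, clip_norm=10.0)
```

After clipping with clip_norm=1.0 the global norm of clipped is 1 (up to rounding), while a clip_norm larger than the global norm leaves the values as they are.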
- clip_by_norms(tensors, clip_norm, in_place=False)[source]#
Clips each tensor in a nest to a maximum L2-norm.
Given a tensor and a maximum clip value clip_norm, this function normalizes the tensor so that its L2-norm is less than or equal to clip_norm.
To perform the clipping, each tensor is set to:
    tensor * clip_norm / max(l2norm(tensor), clip_norm)
- Parameters
tensors (nested Tensor) – a nest of tensors
clip_norm (float or Tensor) – a positive scalar
in_place (bool) – If True, then the input tensors will be changed. Tensors that require grads cannot be modified in place; on the other hand, if you are clipping the gradients held by an optimizer, then doing this in place will probably be easier.
- Returns
the clipped tensors
- cov(data, rowvar=False)[source]#
Estimate a covariance matrix given data.
- Parameters
data (tensor) – a 1-D or 2-D tensor containing multiple observations of multiple dimensions. Which axis indexes dimensions and which indexes observations is determined by rowvar.
rowvar (bool) – If True, then each row represents a dimension, with observations in the columns. Otherwise, each column represents a dimension while the rows contain observations.
- Returns
The covariance matrix
- explained_variance(ypred, y, valid_mask=None, dim=None)[source]#
Computes fraction of variance that ypred explains about y.
Adapted from baselines.ppo2 explained_variance()
Interpretation:
ev=0: might as well have predicted zero
ev=1: perfect prediction
ev<0: worse than just predicting zero
- Parameters
ypred (Tensor) – prediction for y
y (Tensor) – target
valid_mask (Tensor) – an optional
dim (None|int) – the dimension to reduce. If not provided, the explained variance is calculated for all dimensions.
- Returns
1 - Var[y-ypred] / Var[y]
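The returned quantity and its interpretation can be reproduced with a plain-Python sketch over lists of floats. explained_variance_sketch is a hypothetical helper that ignores valid_mask and dim, and it assumes Var[y] is non-zero.

```python
def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def explained_variance_sketch(ypred, y):
    """1 - Var[y - ypred] / Var[y]; assumes Var[y] > 0."""
    residual = [b - a for a, b in zip(ypred, y)]
    return 1.0 - variance(residual) / variance(y)


y = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance_sketch(y, y)              # ev = 1
zero_pred = explained_variance_sketch([0.0] * 4, y)    # ev = 0
worse = explained_variance_sketch([-v for v in y], y)  # ev < 0
```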
- global_norm(tensors)[source]#
Computes the global norm of a nest of tensors.
Adapted from TF’s version.
Given a nest of tensors tensors, this function returns the global norm of all tensors in tensors. The global norm is computed as:
    global_norm = sqrt(sum([l2norm(t)**2 for t in tensors]))
Any entries in tensors that are of type None are ignored.
- Parameters
tensors (nested Tensor) – a nest of tensors
- Returns
norm (Tensor): a scalar tensor
- reverse_cumprod(x, dim)[source]#
Perform cumprod in a reverse order along the dimension specified by dim.
- Parameters
x (Tensor) – the tensor to compute the reverse cumprod on
dim (int) – the value indicating the dimension along which to calculate the reverse cumprod
- Returns
the reverse cumprod tensor. It has the same shape as x.
- reverse_cumsum(x, dim)[source]#
Perform cumsum in a reverse order along the dimension specified by dim.
- Parameters
x (Tensor) – the tensor to compute the reverse cumsum on
dim (int) – the value indicating the dimension along which to calculate the reverse cumsum
- Returns
the reverse cumsumed tensor. It has the same shape as x.
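For a 1-D input, "reverse order" means that element i of the result accumulates x[i:] rather than x[:i+1]. The plain-Python sketch below shows this for both reverse_cumsum and reverse_cumprod using lists; the _sketch helpers are hypothetical and handle only one dimension.

```python
import operator
from itertools import accumulate


def reverse_cumsum_sketch(xs):
    """out[i] = sum(xs[i:]): flip, accumulate, flip back."""
    return list(accumulate(reversed(xs)))[::-1]


def reverse_cumprod_sketch(xs):
    """out[i] = product of xs[i:]."""
    return list(accumulate(reversed(xs), operator.mul))[::-1]


sums = reverse_cumsum_sketch([1, 2, 3])
prods = reverse_cumprod_sketch([1, 2, 3])
```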
- scale_gradient(tensor, scale, clone_input=True)[source]#
Scales the gradient of tensor for the backward pass.
- Parameters
tensor (Tensor) – a tensor which requires gradient.
scale – a scalar factor to be multiplied to the gradient of tensor.
clone_input (bool) – If True, clone the input tensor before applying gradient scaling. This option is useful when there are multiple computational branches originating from tensor and we want to apply gradient scaling to some of them without impacting the rest. If False, apply gradient scaling to the input tensor directly.
- Returns
The (cloned) tensor with gradient scaling hook registered.
- spatial_broadcast(z, im_shape)[source]#
Broadcasting an embedding across the image spatial domain. The image shape is assumed to be channel-first.
- Parameters
z (Tensor) – embedding of shape [...,D] to be broadcast spatially
im_shape (Tuple[int]) – a tuple of ints where the last two are height and width.
- Returns
a broadcast image of shape [...,D,H,W], where D is the input embedding size and [H,W] are the input height and width.
- Return type
torch.Tensor
- tensor_extend(x, y)[source]#
Extending tensor x with a new slice y. y.shape should be the same as x.shape[1:].
- Parameters
x (Tensor) – tensor to be extended
y (Tensor) – the tensor which will be appended to x
- Returns
the extended tensor. Its shape is (x.shape[0]+1, *x.shape[1:])
- Return type
Tensor
- tensor_extend_new_dim(x, dim, n)[source]#
Extending the tensor along a new dimension with n replicas.
- Parameters
x (Tensor) – tensor to be extended
dim (int) – the value indicating the position of the newly inserted dimension
n (int) – the number of replicas along dim
- Returns
the extended tensor. Its shape is (*x.shape[0:dim], n, *x.shape[dim:])
- Return type
Tensor
- tensor_extend_zero(x, dim=0)[source]#
Extending tensor with zeros along an axis.
- Parameters
x (Tensor) – tensor to be extended
dim (int) – the axis to extend zeros
- Returns
the extended tensor. Its shape is (*x.shape[:dim], x.shape[dim]+1, *x.shape[dim+1:])
- Return type
Tensor
- tensor_prepend(x, y)[source]#
Prepending tensor x with y.
y.shape should be the same as x.shape[1:].
- Parameters
x (Tensor) – tensor to be prepended to
y (Tensor) – the tensor which will be prepended to x
- Returns
the prepended tensor. Its shape is (x.shape[0]+1, *x.shape[1:])
- Return type
Tensor
alf.utils.value_ops#
Various functions related to calculating values.
- action_importance_ratio(action_distribution, rollout_action_distribution, action, clipping_mode, scope, importance_ratio_clipping, log_prob_clipping, check_numerics, debug_summaries, rollout_log_prob=None)[source]#
Ratio for importance sampling, used in PPO loss and vtrace loss.
Caller has to save alf.summary.scope() and pass scope to this function.
- Parameters
action_distribution (nested td.distribution) – Distribution over actions under target policy.
rollout_action_distribution (nested td.distribution) – distribution over actions from behavior policy, used to sample actions for the rollout.
action (nested tensor) – possibly batched action tuple taken during rollout.
clipping_mode (str) –
mode for clipping the importance ratio:
'double_sided': clips the range of importance ratio into [1-importance_ratio_clipping, 1+importance_ratio_clipping], which is used by PPOLoss.
'capping': clips the range of importance ratio into min(1+importance_ratio_clipping, importance_ratio), which is used by VTraceLoss, where c_bar or rho_bar = 1+importance_ratio_clipping.
scope (name scope manager) – returned by alf.summary.scope(), set outside.
importance_ratio_clipping (float) – Epsilon in clipped, surrogate PPO objective. See the cited paper for more detail.
log_prob_clipping (float) – If >0, clipping log probs to the range (-log_prob_clipping, log_prob_clipping) to prevent inf / NaN values.
check_numerics (bool) – If true, adds checks to help find NaN/Inf values. For debugging only.
debug_summaries (bool) – If true, output summary metrics to tensorboard.
rollout_log_prob (nested tensor) – the log probability of the action
- Returns
importance_ratio (Tensor), importance_ratio_clipped (Tensor).
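The two clipping modes can be compared on scalar log probabilities with the plain-Python sketch below. importance_ratio_sketch is a hypothetical helper: it assumes the ratio is exp(log_prob - rollout_log_prob), and it omits log_prob_clipping, summaries, and numeric checks.

```python
import math


def importance_ratio_sketch(log_prob, rollout_log_prob, clipping_mode,
                            importance_ratio_clipping):
    """Return (ratio, clipped ratio) for scalar log probabilities."""
    eps = importance_ratio_clipping
    # Assumption: ratio of target to behavior probability of the taken action.
    ratio = math.exp(log_prob - rollout_log_prob)
    if clipping_mode == "double_sided":
        clipped = min(max(ratio, 1 - eps), 1 + eps)
    elif clipping_mode == "capping":
        clipped = min(1 + eps, ratio)
    else:
        raise ValueError("unknown clipping_mode: %s" % clipping_mode)
    return ratio, clipped


# Target policy is half as likely as the behavior policy to take the action.
low = math.log(0.25) - math.log(0.5)
_, two_sided = importance_ratio_sketch(low, 0.0, "double_sided", 0.2)
_, capped = importance_ratio_sketch(low, 0.0, "capping", 0.2)
```

In this example the double-sided mode raises the ratio to the lower limit 1 - 0.2, while capping only bounds the ratio from above and therefore leaves it near 0.5.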
- discounted_return(rewards, values, step_types, discounts, time_major=True)[source]#
Computes discounted return for the first T-1 steps.
The difference between this function and the one in tf_agents.utils.value_ops is that the accumulated_discounted_reward is replaced by value for is_last steps in this function.
\[Q_t = \sum_{t'=t}^{T} \gamma^{t'-t} r_{t'} + \gamma^{T-t+1} \cdot final\_value.\]
Define abbreviations:
B: batch size representing number of trajectories
T: number of steps per trajectory
- Parameters
rewards (Tensor) – shape is [T, B] (or [T]) representing rewards.
values (Tensor) – shape is [T, B] (or [T]) when representing values, [T, B, n_quantiles] or [T, n_quantiles] when representing quantiles of value distributions.
step_types (Tensor) – shape is [T, B] (or [T]) representing step types.
discounts (Tensor) – shape is [T, B] (or [T]) representing discounts.
time_major (bool) – Whether input tensors are time major. False means input tensors have shape [B, T].
- Returns
A tensor with shape [T-1, B] (or [T-1]) representing the discounted returns. Shape is [B, T-1] when time_major is false.
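The backward recursion behind the discounted return can be sketched for a single trajectory with plain Python lists. This is an illustration under stated assumptions, not ALF's code: discounted_return_sketch and the boolean list is_last (standing in for step_types) are hypothetical, and the sketch assumes that the reward and discount stored at step t+1 belong to the transition from step t, as in the one-step return documented on this page.

```python
def discounted_return_sketch(rewards, values, is_last, discounts):
    """Discounted returns for the first T-1 steps of one trajectory."""
    T = len(rewards)
    rets = [0.0] * T
    # Bootstrap from the value of the final step.
    rets[-1] = values[-1]
    for t in reversed(range(T - 1)):
        if is_last[t]:
            # At episode ends the accumulated return is replaced by the value.
            rets[t] = values[t]
        else:
            rets[t] = rewards[t + 1] + discounts[t + 1] * rets[t + 1]
    return rets[:-1]


rewards = [0.0, 1.0, 1.0, 1.0]
values = [0.0, 0.0, 0.0, 10.0]
discounts = [0.5] * 4
returns = discounted_return_sketch(rewards, values, [False] * 4, discounts)
```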
- generalized_advantage_estimation(rewards, values, step_types, discounts, td_lambda=1.0, time_major=True)[source]#
Computes generalized advantage estimation (GAE) for the first T-1 steps.
For theory, see “High-Dimensional Continuous Control Using Generalized Advantage Estimation” by John Schulman, Philipp Moritz et al. See https://arxiv.org/abs/1506.02438 for full paper.
The difference between this function and the one in tf_agents.utils.value_ops is that the accumulated_td is reset to 0 for is_last steps in this function.
Define abbreviations:
B: batch size representing number of trajectories
T: number of steps per trajectory
- Parameters
rewards (Tensor) – shape is [T, B] (or [T]) representing rewards.
values (Tensor) – shape is [T,B] (or [T]) representing values.
step_types (Tensor) – shape is [T,B] (or [T]) representing step types.
discounts (Tensor) – shape is [T, B] (or [T]) representing discounts.
td_lambda (float) – a scalar in [0, 1]. It’s used for variance reduction in temporal difference.
time_major (bool) – Whether input tensors are time major. False means input tensors have shape [B, T].
- Returns
A tensor with shape [T-1, B] representing advantages. Shape is [B, T-1] when time_major is false.
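The GAE recursion can be sketched for a single trajectory in plain Python. As with any such sketch, the details are assumptions rather than ALF's implementation: gae_sketch and the boolean list is_last are hypothetical, and the reward and discount at step t+1 are assumed to belong to the transition from step t.

```python
def gae_sketch(rewards, values, is_last, discounts, td_lambda=1.0):
    """Generalized advantage estimates for the first T-1 steps."""
    T = len(rewards)
    advs = [0.0] * T
    for t in reversed(range(T - 1)):
        if is_last[t]:
            # The accumulated TD error is reset at episode ends.
            advs[t] = 0.0
        else:
            delta = rewards[t + 1] + discounts[t + 1] * values[t + 1] - values[t]
            advs[t] = delta + discounts[t + 1] * td_lambda * advs[t + 1]
    return advs[:-1]


rewards = [0.0, 1.0, 1.0]
values = [1.0, 2.0, 3.0]
discounts = [0.5, 0.5, 0.5]
no_last = [False, False, False]
one_step = gae_sketch(rewards, values, no_last, discounts, td_lambda=0.0)
blended = gae_sketch(rewards, values, no_last, discounts, td_lambda=0.5)
full = gae_sketch(rewards, values, no_last, discounts, td_lambda=1.0)
```

With td_lambda=0 each advantage is just the one-step TD error, and with td_lambda=1 it equals the discounted return minus the value, which shows the bias/variance trade-off that td_lambda controls.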
- one_step_discounted_return(rewards, values, step_types, discounts)[source]#
Calculate the one step discounted return for the first T-1 steps.
return = next_reward + next_discount * next_value if the current step is not the last step; otherwise return = current_discount * current_value.
Note: input tensors must be time major.
- Parameters
rewards (Tensor) – shape is [T, B] (or [T]) representing rewards.
values (Tensor) – shape is [T, B] (or [T]) when representing values, [T, B, n_quantiles] or [T, n_quantiles] when representing quantiles of value distributions.
step_types (Tensor) – shape is [T, B] (or [T]) representing step types.
discounts (Tensor) – shape is [T, B] (or [T]) representing discounts.
- Returns
A tensor with shape [T-1, B] (or [T-1]) representing the discounted returns.
alf.utils.video_recorder#
- class VideoRecorder(env, frame_max_width=2560, frames_per_sec=None, last_step_repeats=0, append_blank_frames=0, **kwargs)[source]#
Bases:
gym.wrappers.monitoring.video_recorder.VideoRecorder
A video recorder that renders frames and encodes them into a video file. Besides rendering frames, it also supports plotting prediction info. Each algorithm is responsible for adding rendered Image instances to its pred info in order to be recorded here. See the docstring in alf.summary.render for more details.
- Parameters
env (Gym.env) –
frame_max_width (int) – the max width of a video frame. Scale if the original width is bigger than this.
frames_per_sec (fps) – if None, use fps from the env
last_step_repeats (int) – the number of times to repeat the last frame of each episode.
append_blank_frames (int) – If >0, will append such number of blank frames at the end of the episode in the rendered video file. A negative value has the same effects as 0 and no blank frames will be appended.
- cache_frame_and_pred_info(frame, pred_info=None)[source]#
Cache the input frame and pred_info for video generation later.
- Parameters
frame (np.array) – the environmental frame.
pred_info (None|nest) – prediction step info for displaying: any Image instance in the info nest will be recorded.
- capture_frame(pred_info=None, is_last_step=False)[source]#
Render self.env and add the resulting frame to the video. Also plot Image instances extracted from the prediction info of policy_step.
- Parameters
pred_info (None|nest) – prediction step info for displaying: any Image instance in the info nest will be recorded.
is_last_step (bool) – whether the current time step is the last step of the episode, either due to game over or time limits.
alf.utils.visualizer#
Various functions related to visualizations of networks etc.
- critic_network_visualizer(net, observation, action_upper_left, action_upper_right, action_lower_left, H=20, W=20, batch_size=None)[source]#
Generate a batched network response image within the rectangular range of actions (referred to as the probing region) specified by action_upper_left, action_upper_right, and action_lower_left, as shown below:
    action_upper_left -----> action_upper_right
            |
            v
    action_lower_left ------ action_lower_right
where action_lower_right is computed from the three provided points as follows, because of the rectangular assumption:
    action_lower_right = (action_upper_right + action_lower_left - action_upper_left)
Example usage:
    # assume a case where the dimensionality of action is 4
    # the action for the upper-left point of the probing region
    action_upper_left = torch.Tensor([1, -1, 0, 0])
    # the action for the upper-right point of the probing region
    action_upper_right = torch.Tensor([1, 1, 0, 0])
    # the action for the lower-left point of the probing region
    action_lower_left = torch.Tensor([-1, -1, 0, 0])

    # define a network function
    def net_func(net_input):
        critics, _ = self._critic_networks(
            net_input)  # [B, replicas * reward_dim]
        critics = critics.reshape(  # [B, replicas, reward_dim]
            -1, self._num_critic_replicas, *self._reward_spec.shape)
        critics = critics.min(dim=1)[0]
        return critics

    img = critic_network_visualizer(net_func, inputs.observation,
                                    action_upper_left, action_upper_right,
                                    action_lower_left, 20, 20)
    # visualize the first response image in the batch
    data = img[0, ...].squeeze(0)
    data = data.cpu().numpy()
    import alf.summary.render as render
    val_img = render.render_heatmap(name="val_img", data=data)
- Parameters
net (Callable) – a callable that is called as net((observation, actions))
observation (Tensor) – [B, …]
action_upper_left (tensor) – tensor representing the upper-left point of the probing region, with the shape of [action_dim]
action_upper_right (tensor) – a tensor representing the upper-right point of the probing region, with the shape of [action_dim]
action_lower_left (tensor) – a tensor representing the lower-left point of the probing region, with the shape of [action_dim]
H (int) – number of samples to be used for creating the visualization along the direction of action_lower_left - action_upper_left.
W (int) – number of samples to be used for creating the visualization along the direction of action_upper_right - action_upper_left. The total number of samples is H * W.
batch_size (int) – the batch size of the input observation. If None, it will be inferred from the input observation.
- Returns
The network response image of the shape [B, K, H, W], where K denotes the dimensionality of the network output for the non-batch dimension.
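One way to picture the probing region is as an H x W grid of actions spanned by the three corner points. The plain-Python sketch below builds such a grid with lists. It is only an illustration: probing_grid is a hypothetical helper, and it assumes evenly spaced samples that include both end points along each direction, which may differ from how critic_network_visualizer samples internally.

```python
def probing_grid(upper_left, upper_right, lower_left, H, W):
    """Return an H x W nested list of action vectors (requires H, W >= 2)."""
    grid = []
    for i in range(H):
        v = i / (H - 1)  # fraction along lower_left - upper_left
        row = []
        for j in range(W):
            u = j / (W - 1)  # fraction along upper_right - upper_left
            row.append([
                ul + v * (ll - ul) + u * (ur - ul)
                for ul, ur, ll in zip(upper_left, upper_right, lower_left)
            ])
        grid.append(row)
    return grid


# Corner points taken from the example usage above.
upper_left = [1, -1, 0, 0]
upper_right = [1, 1, 0, 0]
lower_left = [-1, -1, 0, 0]
grid = probing_grid(upper_left, upper_right, lower_left, H=3, W=3)
```

The last grid point grid[2][2] coincides with action_upper_right + action_lower_left - action_upper_left, which is the action_lower_right corner described above.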