alf.metrics

alf.metrics.metric

A few metrics. Code adapted from https://github.com/tensorflow/agents/blob/master/tf_agents/metrics/tf_metric.py

class StepMetric(name, dtype, prefix='Metrics')

Bases: torch.nn.modules.module.Module

Defines the interface for metrics.

call(*args, **kwargs)

Accumulates statistics for the metric.

Parameters
  • *args, **kwargs – A mini-batch of inputs to the Metric.

forward(*args, **kwargs)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

gen_summaries(train_step=None, step_metrics=(), other_steps={})

Generates summaries against train_step and all step_metrics.

Parameters
  • train_step – (Optional) Step counter for training iterations. If None, no metric is generated against the global step.

  • step_metrics – (Optional) Iterable of step metrics to generate summaries against.

  • other_steps (Dict[str, int]) – A dictionary of steps to generate summaries against.

reset()

Resets the values being tracked by the metric.

result()

Computes and returns a final value for the metric.

training: bool
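To make the call()/reset()/result() contract concrete, the sketch below implements a running-mean metric without any framework. ToyStepMetric is hypothetical (not part of ALF) and keeps plain Python floats rather than torch tensors; its __call__ simply delegates to call(), an assumption standing in for invoking the Module instance.

```python
class ToyStepMetric:
    """Hypothetical stand-in for a StepMetric subclass (not ALF code)."""

    def __init__(self, name, prefix="Metrics"):
        self.name = name
        self.prefix = prefix
        self.reset()

    def call(self, values):
        # Accumulate statistics from one mini-batch of inputs.
        self._total += sum(values)
        self._count += len(values)
        return values  # hand the inputs back so calls can be chained

    def reset(self):
        # Clear everything tracked so far.
        self._total = 0.0
        self._count = 0

    def result(self):
        # Final value: mean of all inputs seen since the last reset().
        return self._total / self._count if self._count else 0.0

    def __call__(self, *args, **kwargs):
        # Assumed stand-in for invoking the Module instance; delegates to call().
        return self.call(*args, **kwargs)


metric = ToyStepMetric("MeanValue")
metric([1.0, 2.0, 3.0])
metric([6.0])
print(metric.result())  # 3.0
metric.reset()
print(metric.result())  # 0.0
```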

alf.metrics.metrics

A set of metrics, converted to PyTorch from the TF version at https://github.com/tensorflow/agents/blob/master/tf_agents/metrics/tf_metrics.py

class AverageDiscountedReturnMetric(example_time_step, name='AverageDiscountedReturn', prefix='Metrics', dtype=torch.float32, discount=0.99, reward_transformer=None, buffer_size=10)

Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric

Metric for computing the average discounted episodic return.

It is calculated according to the following formula:

\[\begin{array}{ll} R &= \frac{1}{L} \left(r_1 + (1+\gamma) r_2 + (1+\gamma+\gamma^2) r_3 + \cdots\right) \\ &= \frac{1}{L}\sum_{l=1}^L \sum_{k=0}^{l-1} \gamma^k r_l, \end{array}\]

where \(\gamma\) is the reward discount, and \(r_1\) denotes the reward due to the first action, which is received at the second time step. \(L\) equals the episode length minus 1.

Note that if the episode does not end because of a time limit, the discounted return calculated from the formula above is unbiased. If it ends because of a time limit, the result is a biased estimate whose expectation is lower than the ground truth (when rewards are non-negative).
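Exchanging the order of summation, with \(k = l - t\), shows that this \(R\) is the average, over the \(L\) start steps \(t\), of the discounted return from each step: \(\frac{1}{L}\sum_{t=1}^{L}\sum_{l=t}^{L}\gamma^{l-t} r_l = \frac{1}{L}\sum_{l=1}^{L} r_l \sum_{k=0}^{l-1}\gamma^k\). The plain-Python sketch below checks this identity for one episode; both function names are hypothetical, and this is not ALF's batched tensor implementation.

```python
def average_discounted_return(rewards, gamma):
    """Closed form: (1/L) * sum_l sum_{k<l} gamma^k * r_l, rewards = [r_1, ..., r_L]."""
    L = len(rewards)
    total = 0.0
    for l, r in enumerate(rewards, start=1):
        weight = sum(gamma ** k for k in range(l))  # 1 + gamma + ... + gamma^(l-1)
        total += weight * r
    return total / L


def mean_of_discounted_returns(rewards, gamma):
    """Average over start steps t = 1..L of sum_{l>=t} gamma^(l-t) * r_l."""
    L = len(rewards)
    returns = [
        sum(gamma ** (l - t) * rewards[l - 1] for l in range(t, L + 1))
        for t in range(1, L + 1)
    ]
    return sum(returns) / L


rewards = [1.0, 0.0, 2.0]
print(average_discounted_return(rewards, 0.5))   # 1.5
print(mean_of_discounted_returns(rewards, 0.5))  # 1.5
```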

Parameters
  • discount (float) – the discount factor for calculating the discounted return

  • reward_transformer (Callable) – if provided, will calculate the discounted return using the transformed reward. It will be called as transformed_reward = reward_transformer(original_reward).

  • reward_clip (tuple) – in the format (min, max); used to optionally plot the return based on clipped rewards when the environment does not clip them.

training: bool

class AverageEnvInfoMetric(example_time_step, name='AverageEnvInfoMetric', prefix='Metrics', dtype=torch.float32, fields=None, buffer_size=10)

Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric

Metric for computing average quantities contained in the environment info. An example of env info (which can be a nest) has to be provided when constructing an instance in order to initialize the accumulator and buffer with the same nested structure.

Parameters

fields (Optional[List[str]]) – a list of fields to include in the average env info metric. If None, all fields will be included.
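A plain-Python sketch of how fields narrows the averaged quantities, assuming the env info is a flat dict (in ALF it may be a nest). The helper name is hypothetical; num_env_frames is the field mentioned under EnvironmentSteps, and the other field names are made up for illustration.

```python
def select_env_info_fields(env_info, fields=None):
    """Hypothetical helper: keep only the requested top-level env-info fields.

    fields=None keeps every field, matching the documented default.
    An unknown field name raises KeyError here; ALF's behavior in that
    case is not specified above.
    """
    if fields is None:
        return dict(env_info)
    return {f: env_info[f] for f in fields}


env_info = {"num_env_frames": 4.0, "success": 1.0, "distance": 0.3}
print(select_env_info_fields(env_info, ["success"]))  # {'success': 1.0}
print(sorted(select_env_info_fields(env_info)))       # ['distance', 'num_env_frames', 'success']
```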

training: bool

class AverageEpisodeLengthMetric(example_time_step, name='AverageEpisodeLength', prefix='Metrics', dtype=torch.float32, buffer_size=10)

Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric

Metric for computing the average episode length.

Parameters
  • name (str) – the name of the metric

  • prefix (str) – a prefix indicating the category of the metric

  • dtype (torch.dtype) – dtype of metric values. Should be floating types in order to be averaged.

  • buffer_size (int) – number of episodes the metric value will be averaged across

  • example_time_step (nest) – an example of the time step where the metric values are extracted from. If None, a zero scalar is used as the example metric value.

training: bool

class AverageEpisodicAggregationMetric(name='AverageEpisodicAggregationMetric', prefix='Metrics', dtype=torch.float32, buffer_size=10, example_time_step=None)

Bases: alf.metrics.metric.StepMetric

A base metric to aggregate quantities over an episode. It supports accumulating a nest of scalar values.

NOTE: normally this class and its subclasses report metrics by summing values over the whole episode. However, there are two special treatments:

  1. If _extract_metric_values() returns a nested structure in which a dictionary or namedtuple has a field with the postfix “@step”, the corresponding value is averaged instead of summed over the whole episode length, so that a per-step average value is reported.

  2. If a field has the postfix “@max”, the aggregated value is the maximum (instead of the sum) of the step values across the episode.

This class supports partial aggregation: if at any step the extracted metric value is not finite (inf or nan), that step’s value is skipped during aggregation. If a field is skipped for an entire episode, its accumulated value won’t be pushed into the metric buffer.
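The default sum, the “@step” average, the “@max” maximum, and the skipping of non-finite values can be sketched for a single episode in plain Python. This is a hypothetical illustration, not ALF's implementation: it assumes a flat dict of per-step scalars and, as a further assumption, divides “@step” fields by the number of finite steps rather than by the full episode length.

```python
import math


def aggregate_episode(step_values):
    """Aggregate per-step metric dicts over one episode (hypothetical sketch)."""
    result = {}
    for field in step_values[0]:
        # Partial aggregation: drop inf/nan step values for this field.
        finite = [v[field] for v in step_values if math.isfinite(v[field])]
        if not finite:
            continue  # skipped for the entire episode: nothing is reported
        if field.endswith("@step"):
            result[field] = sum(finite) / len(finite)  # per-step average
        elif field.endswith("@max"):
            result[field] = max(finite)  # maximum over the episode
        else:
            result[field] = sum(finite)  # default: sum over the episode
    return result


episode = [
    {"reward": 1.0, "speed@step": 2.0, "height@max": 0.5, "bad": float("nan")},
    {"reward": 0.5, "speed@step": 4.0, "height@max": 1.5, "bad": float("inf")},
    {"reward": 2.0, "speed@step": float("nan"), "height@max": 1.0, "bad": float("nan")},
]
print(aggregate_episode(episode))
# {'reward': 3.5, 'speed@step': 3.0, 'height@max': 1.5}
```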

Parameters
  • name (str) – the name of the metric

  • prefix (str) – a prefix indicating the category of the metric

  • dtype (torch.dtype) – dtype of metric values. Should be floating types in order to be averaged.

  • buffer_size (int) – number of episodes the metric value will be averaged across

  • example_time_step (nest) – an example of the time step where the metric values are extracted from. If None, a zero scalar is used as the example metric value.

call(time_step)

Accumulates values from the time step. The values are defined by the subclass’s _extract_metric_values(). Values at the first time step of each episode are ignored.

Parameters

time_step (alf.data_structures.TimeStep) – a batched time step

Returns

The arguments, for easy chaining.

latest()

Return the value added most recently.

reset()

Resets the values being tracked by the metric.

result()

Computes and returns a final value for the metric.

training: bool

class AverageReturnMetric(example_time_step, name='AverageReturn', prefix='Metrics', dtype=torch.float32, buffer_size=10)

Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric

Metric for computing the average return.

Parameters
  • name (str) – the name of the metric

  • prefix (str) – a prefix indicating the category of the metric

  • dtype (torch.dtype) – dtype of metric values. Should be floating types in order to be averaged.

  • buffer_size (int) – number of episodes the metric value will be averaged across

  • example_time_step (nest) – an example of the time step where the metric values are extracted from. If None, a zero scalar is used as the example metric value.

training: bool

class AverageRewardMetric(example_time_step, name='AverageReward', prefix='Metrics', buffer_size=10)

Bases: alf.metrics.metrics.AverageDiscountedReturnMetric

Metric for computing the average reward per time step for each episode.

Parameters
  • discount (float) – the discount factor for calculating the discounted return

  • reward_transformer (Callable) – if provided, will calculate the discounted return using the transformed reward. It will be called as transformed_reward = reward_transformer(original_reward).

  • reward_clip (tuple) – in the format (min, max); used to optionally plot the return based on clipped rewards when the environment does not clip them.
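Since this class subclasses AverageDiscountedReturnMetric, note that setting \(\gamma = 0\) in that class's formula (with \(\gamma^0 = 1\)) leaves \(\frac{1}{L}\sum_{l=1}^{L} r_l\), which is exactly the average reward per time step; whether ALF relies on this is not stated here. The plain-Python sketch below uses hypothetical helper names and a deque standing in for the metric buffer: it checks that identity, then averages per-episode values across the last buffer_size episodes.

```python
from collections import deque


def average_reward_per_step(rewards):
    """Mean of r_1..r_L for one episode (L = episode length - 1)."""
    return sum(rewards) / len(rewards)


def discounted_average(rewards, gamma):
    """(1/L) * sum_l sum_{k<l} gamma^k * r_l, the AverageDiscountedReturn formula."""
    L = len(rewards)
    return sum(sum(gamma ** k for k in range(l)) * r
               for l, r in enumerate(rewards, start=1)) / L


# With gamma = 0 every weight is gamma^0 = 1, so the formula reduces to the mean reward.
assert discounted_average([1.0, 0.0, 2.0], 0.0) == average_reward_per_step([1.0, 0.0, 2.0])

buffer = deque(maxlen=10)  # like buffer_size=10: only the latest 10 episodes count
for episode_rewards in ([1.0, 1.0, 1.0], [0.0, 2.0], [4.0]):
    buffer.append(average_reward_per_step(episode_rewards))
print(sum(buffer) / len(buffer))  # 2.0
```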

training: bool

class EnvironmentSteps(name='EnvironmentSteps', prefix='Metrics', dtype=torch.int64)

Bases: alf.metrics.metric.StepMetric

Counts the number of steps taken in the environment after FrameSkip.

If frames are skipped by any of the environment wrappers, the separate metric AverageEnvInfoMetric[‘num_env_frames’] reports the actual frame count, including the skipped frames.

call(time_step)

Increases the environment_steps count according to time_step. The count is not increased for steps where time_step.is_first() is true, since such a step is not part of any episode.

Parameters

time_step (alf.data_structures.TimeStep) – a batched time step

Returns

The arguments, for easy chaining.
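A plain-Python sketch of the counting rule, in which every non-FIRST step of every environment in the batch adds one. The integer step-type encoding (FIRST=0, MID=1, LAST=2, as in TF-Agents) and the class name are assumptions; the real metric receives a batched TimeStep and uses time_step.is_first().

```python
FIRST, MID, LAST = 0, 1, 2  # assumed step-type encoding (TF-Agents convention)


class EnvironmentStepCounter:
    """Hypothetical sketch of the EnvironmentSteps counting rule."""

    def __init__(self):
        self.reset()

    def call(self, step_types):
        # step_types holds one step type per environment in the batch.
        # FIRST steps are skipped because they are not part of any episode.
        self.environment_steps += sum(1 for s in step_types if s != FIRST)
        return step_types  # returned for easy chaining

    def reset(self):
        self.environment_steps = 0

    def result(self):
        return self.environment_steps


counter = EnvironmentStepCounter()
# Two environments observed over three iterations.
for batch in ([FIRST, FIRST], [MID, MID], [LAST, MID]):
    counter.call(batch)
print(counter.result())  # 4
```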

reset()

Resets the values being tracked by the metric.

result()

Computes and returns a final value for the metric.

training: bool

class EpisodicStartAverageDiscountedReturnMetric(example_time_step, name='EpisodicStartAverageDiscountedReturn', prefix='Metrics', buffer_size=10, reward_transformer=None)

Bases: alf.metrics.metrics.AverageDiscountedReturnMetric

Metric for computing the discounted return from episode start states. It is calculated according to the following formula:

\[\begin{array}{ll} R &= r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots \\ &= \sum_{l=1}^L \gamma^{l-1} r_l, \end{array}\]

where \(\gamma\) is the reward discount, and \(r_1\) denotes the reward due to the first action, which is received at the second time step. \(L\) equals the episode length minus 1.

Note that if the episode does not end because of a time limit, the discounted return calculated from the formula above is unbiased. If it ends because of a time limit, the result is a biased estimate whose expectation is lower than the ground truth (when rewards are non-negative).
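This formula is an ordinary discounted sum, so it can be evaluated either term by term or with the backward recursion \(R \leftarrow r_l + \gamma R\), starting from the last reward. A plain-Python sketch of both for one episode (hypothetical function names; not ALF's tensor code):

```python
def episodic_start_discounted_return(rewards, gamma):
    """R = sum_{l=1}^{L} gamma^(l-1) * r_l for rewards [r_1, ..., r_L]."""
    return sum(gamma ** (l - 1) * r for l, r in enumerate(rewards, start=1))


def episodic_start_discounted_return_backward(rewards, gamma):
    """Same quantity via the recursion R <- r_l + gamma * R, run from r_L back to r_1."""
    ret = 0.0
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


rewards = [1.0, 0.0, 2.0]
print(episodic_start_discounted_return(rewards, 0.5))           # 1.5
print(episodic_start_discounted_return_backward(rewards, 0.5))  # 1.5
```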

Parameters
  • discount (float) – the discount factor for calculating the discounted return

  • reward_transformer (Callable) – if provided, will calculate the discounted return using the transformed reward. It will be called as transformed_reward = reward_transformer(original_reward).

  • reward_clip (tuple) – in the format (min, max); used to optionally plot the return based on clipped rewards when the environment does not clip them.

training: bool

class MetricBuffer(max_len, dtype)

Bases: torch.nn.modules.module.Module

A metric buffer for computing average metric values. The buffer is assumed to store only scalar values.

Parameters
  • max_len (int) – maximum length of the buffer

  • dtype (torch.dtype) – dtype of the content of the buffer

append(value)

Append multiple values to the buffer.

Parameters

value (Tensor) – a batch of scalars with the shape \([B]\).

clear()

latest()

Return the value added most recently.

mean()

training: bool
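A plain-Python analogue of MetricBuffer, assuming first-in-first-out eviction once max_len values are stored. ToyMetricBuffer is hypothetical, ignores dtype, and holds Python floats instead of a tensor; append() takes a whole batch, mirroring the documented \([B]\)-shaped input.

```python
from collections import deque


class ToyMetricBuffer:
    """Hypothetical plain-Python analogue of MetricBuffer (scalars only)."""

    def __init__(self, max_len):
        # Assumed FIFO behavior: once max_len values are stored, the oldest is dropped.
        self._values = deque(maxlen=max_len)

    def append(self, values):
        # Takes a whole batch of scalars, like the [B]-shaped tensor in the docs.
        self._values.extend(values)

    def clear(self):
        self._values.clear()

    def latest(self):
        # The value added most recently.
        return self._values[-1]

    def mean(self):
        # Raises ZeroDivisionError on an empty buffer in this sketch.
        return sum(self._values) / len(self._values)


buf = ToyMetricBuffer(max_len=3)
buf.append([1.0, 2.0])
buf.append([3.0, 4.0])  # 1.0 is evicted, leaving [2.0, 3.0, 4.0]
print(buf.latest(), buf.mean())  # 4.0 3.0
```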

class NumberOfEpisodes(name='NumberOfEpisodes', prefix='Metrics', dtype=torch.int64)

Bases: alf.metrics.metric.StepMetric

Counts the number of episodes in the environment.

call(time_step)

Increases the number_episodes count according to time_step: it is incremented once for every step where time_step.is_last() is true.

Parameters

time_step (alf.data_structures.TimeStep) – a batched time step

Returns

The arguments, for easy chaining.
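A plain-Python sketch of the counting rule: each environment in the batch whose current step is LAST adds one finished episode. The function name and the integer encoding LAST=2 (TF-Agents' convention) are assumptions; the real metric uses time_step.is_last() on a batched TimeStep.

```python
LAST = 2  # assumed encoding of the LAST step type (TF-Agents convention)


def count_finished_episodes(step_type_batches):
    """Hypothetical sketch: add one for every step whose type is LAST."""
    number_episodes = 0
    for step_types in step_type_batches:  # one list per iteration, one entry per env
        number_episodes += sum(1 for s in step_types if s == LAST)
    return number_episodes


# Two environments (FIRST=0, MID=1, LAST=2) over four iterations;
# each environment finishes one episode, at different iterations.
batches = [[0, 0], [1, 1], [2, 1], [0, 2]]
print(count_finished_episodes(batches))  # 2
```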

reset()

Resets the values being tracked by the metric.

result()

Computes and returns a final value for the metric.

training: bool