alf.metrics
alf.metrics.metric
A few metrics. Code adapted from https://github.com/tensorflow/agents/blob/master/tf_agents/metrics/tf_metric.py
- class StepMetric(name, dtype, prefix='Metrics')
Bases: torch.nn.modules.module.Module
Defines the interface for metrics.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- call(*args, **kwargs)
Accumulates statistics for the metric.
- Parameters
*args, **kwargs – A mini-batch of inputs to the metric.
- forward(*args, **kwargs)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- gen_summaries(train_step=None, step_metrics=(), other_steps={})
Generates summaries against train_step and all step_metrics.
- Parameters
train_step – (Optional) Step counter for training iterations. If None, no metric is generated against the global step.
step_metrics – (Optional) Iterable of step metrics to generate summaries against.
other_steps (Dict[str, int]) – A dictionary of steps to generate summaries against.
- training: bool
alf.metrics.metrics
A set of metrics. Converted to PyTorch from the TF version. https://github.com/tensorflow/agents/blob/master/tf_agents/metrics/tf_metrics.py
- class AverageDiscountedReturnMetric(example_time_step, name='AverageDiscountedReturn', prefix='Metrics', dtype=torch.float32, discount=0.99, reward_transformer=None, buffer_size=10)
Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric
Metric for computing the average discounted episodic return.
It is calculated according to the following formula:
\[
\begin{array}{ll}
R &= \frac{1}{L} \left(r_1 + (1+\gamma) r_2 + (1+\gamma+\gamma^2) r_3 + \cdots\right) \\
  &= \frac{1}{L} \sum_{l=1}^{L} \sum_{k=0}^{l-1} \gamma^k r_l,
\end{array}
\]
where \(\gamma\) is the reward discount, and \(r_1\) denotes the reward due to the first action, which is received at the second time step. \(L\) equals the episode length minus 1.
Note that if the episode does not end because of a time limit, the discounted return calculated from the formula above is unbiased. If the last step is due to a time limit, the result is a biased estimate whose expectation is lower than the ground truth (when rewards are non-negative).
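As a sanity check of the formula above, the sketch below evaluates the double sum in plain Python and compares it with the mean, over every start step, of the ordinary discounted return. The two agree because \(r_l\) appears in the returns that start at steps \(1, \dots, l\) with weights \(\gamma^{l-1}, \dots, \gamma^0\). This is an illustration only, not ALF's implementation (which accumulates batched tensors step by step); both function names are hypothetical.

```python
from typing import Sequence


def average_discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """R = (1/L) * sum_{l=1}^{L} sum_{k=0}^{l-1} gamma^k * r_l.

    ``rewards`` holds r_1, ..., r_L: the rewards received from the second
    time step of the episode onward, so L = episode length - 1.
    """
    L = len(rewards)
    if L == 0:
        raise ValueError("need at least one rewarded step")
    total = 0.0
    weight = 0.0  # 1 + gamma + ... + gamma^(l-1) for the current r_l
    power = 1.0   # gamma^(l-1)
    for r in rewards:
        weight += power
        total += weight * r
        power *= gamma
    return total / L


def mean_of_discounted_returns(rewards: Sequence[float], gamma: float) -> float:
    """Same quantity: mean over start steps t of sum_k gamma^k * r_{t+k}."""
    L = len(rewards)
    returns = [
        sum(gamma ** k * r for k, r in enumerate(rewards[t:])) for t in range(L)
    ]
    return sum(returns) / L


rewards = [1.0, 1.0, 1.0]
print(average_discounted_return(rewards, 0.5))  # (1 + 1.5 + 1.75) / 3
print(mean_of_discounted_returns(rewards, 0.5))
```

With \(\gamma = 0\) only the \(k = 0\) terms survive, so the same function returns the plain per-step average reward.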
- Parameters
discount (float) – the discount factor for calculating the discounted return
reward_transformer (Callable) – if provided, will calculate the discounted return using the transformed reward. It will be called as transformed_reward = reward_transformer(original_reward).
reward_clip (tuple) – in the format (min, max), to optionally plot return based on clipped reward when environment isn’t clipping.
- training: bool
- class AverageEnvInfoMetric(example_time_step, name='AverageEnvInfoMetric', prefix='Metrics', dtype=torch.float32, fields=None, buffer_size=10)
Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric
Metric for computing average quantities contained in the environment info. An example of env info (which can be a nest) has to be provided when constructing an instance in order to initialize the accumulator and buffer with the same nested structure.
- Parameters
fields (Optional[List[str]]) – a list of fields to include in the average env info metric. If None, all fields will be included.
- training: bool
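Why an example env info is needed up front can be pictured with the sketch below: it builds a zero accumulator with the same nested structure as an example info (after optionally keeping only some top-level fields, mirroring the fields parameter) and then adds later infos into it leaf by leaf. It is a plain-Python illustration under assumed simplifications (dict, list, tuple and namedtuple nests with float leaves; top-level field selection only); the helper names are hypothetical and are not ALF's nest utilities.

```python
from typing import Any, List, Optional


def select_fields(env_info: dict, fields: Optional[List[str]]) -> dict:
    """Keeps only the given top-level fields; None keeps all of them."""
    if fields is None:
        return dict(env_info)
    return {name: env_info[name] for name in fields}


def zeros_like(nest: Any) -> Any:
    """Returns a nest with the same structure as ``nest`` and 0.0 leaves."""
    if isinstance(nest, dict):
        return {k: zeros_like(v) for k, v in nest.items()}
    if hasattr(nest, "_fields"):  # namedtuple: rebuild with positional args
        return type(nest)(*(zeros_like(v) for v in nest))
    if isinstance(nest, (list, tuple)):
        return type(nest)(zeros_like(v) for v in nest)
    return 0.0


def add_nests(acc: Any, value: Any) -> Any:
    """Adds ``value`` into ``acc`` leaf by leaf; both must share a structure."""
    if isinstance(acc, dict):
        return {k: add_nests(acc[k], value[k]) for k in acc}
    if hasattr(acc, "_fields"):
        return type(acc)(*(add_nests(a, v) for a, v in zip(acc, value)))
    if isinstance(acc, (list, tuple)):
        return type(acc)(add_nests(a, v) for a, v in zip(acc, value))
    return acc + value


example_info = {"success": 0.0, "stats": {"speed": 0.0, "dist": 0.0}}
acc = zeros_like(select_fields(example_info, ["stats"]))
for info in [{"success": 1.0, "stats": {"speed": 2.0, "dist": 3.0}}] * 2:
    acc = add_nests(acc, select_fields(info, ["stats"]))
print(acc)  # {'stats': {'speed': 4.0, 'dist': 6.0}}
```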
- class AverageEpisodeLengthMetric(example_time_step, name='AverageEpisodeLength', prefix='Metrics', dtype=torch.float32, buffer_size=10)
Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric
Metric for computing the average episode length.
- Parameters
name (str) – name of the metric
prefix (str) – a prefix indicating the category of the metric
dtype (torch.dtype) – dtype of metric values. Should be floating types in order to be averaged.
buffer_size (int) – number of episodes the metric value will be averaged across
example_time_step (nest) – an example of the time step where the metric values are extracted from. If None, a zero scalar is used as the example metric value.
- training: bool
- class AverageEpisodicAggregationMetric(name='AverageEpisodicAggregationMetric', prefix='Metrics', dtype=torch.float32, buffer_size=10, example_time_step=None)
Bases: alf.metrics.metric.StepMetric
A base metric to aggregate quantities over an episode. It supports accumulating a nest of scalar values.
NOTE: normally this class and its subclasses report metrics by summing values over the whole episode. However, there are two special treatments:
1. If _extract_metric_values() returns a nested structure in which a dictionary or namedtuple has a field with the postfix “@step”, the corresponding value will be averaged instead of summed over the whole episode length, so that a per-step average value is reported.
2. If a field has the postfix “@max”, then the aggregated value will be the maximum (instead of the sum) of step values across the episode.
This class supports partial aggregation, where if at any step the extracted metric value is not finite (inf or nan), then that step’s value will be skipped for aggregation. If a field is skipped for an entire episode, its accumulated value won’t be pushed into the metric buffer.
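The aggregation rules above (sum by default, per-step average for “@step” fields, maximum for “@max” fields, skipping of non-finite values, and dropping fields that were skipped for the whole episode) can be sketched for a single episode as follows. The sketch uses a flat dict of named values instead of a general nest and string step types instead of ALF's TimeStep; these, and the choice to divide “@step” fields by the number of steps actually kept rather than by the full episode length, are assumptions of this illustration. Values at the first time step are ignored, matching the call() description below.

```python
import math
from typing import Dict, List, Tuple

# One step = (step_type, {field_name: value}); step_type is "first", "mid"
# or "last" in this sketch.
Step = Tuple[str, Dict[str, float]]


def aggregate_episode(steps: List[Step]) -> Dict[str, float]:
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for step_type, values in steps:
        if step_type == "first":  # first steps are not aggregated
            continue
        for name, v in values.items():
            if not math.isfinite(v):  # partial aggregation: skip inf/nan
                continue
            if name not in totals:
                totals[name], counts[name] = v, 1
            elif name.endswith("@max"):
                totals[name] = max(totals[name], v)
                counts[name] += 1
            else:  # plain and "@step" fields are summed first
                totals[name] += v
                counts[name] += 1
    # Fields never seen with a finite value are absent from ``totals``,
    # so nothing is reported for them in this episode.
    return {
        name: total / counts[name] if name.endswith("@step") else total
        for name, total in totals.items()
    }


episode = [
    ("first", {"reward": 9.0, "speed@step": 9.0, "height@max": 9.0}),
    ("mid", {"reward": 1.0, "speed@step": 2.0, "height@max": 3.0}),
    ("mid", {"reward": 2.0, "speed@step": float("nan"), "height@max": 5.0}),
    ("last", {"reward": 3.0, "speed@step": 4.0, "height@max": 4.0}),
]
print(aggregate_episode(episode))
# {'reward': 6.0, 'speed@step': 3.0, 'height@max': 5.0}
```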
- Parameters
name (str) – name of the metric
prefix (str) – a prefix indicating the category of the metric
dtype (torch.dtype) – dtype of metric values. Should be floating types in order to be averaged.
buffer_size (int) – number of episodes the metric value will be averaged across
example_time_step (nest) – an example of the time step where the metric values are extracted from. If None, a zero scalar is used as the example metric value.
- call(time_step)
Accumulates values from the time step. The values are defined by _extract_metric_values() in subclasses. Values at first time steps (where time_step.is_first()) are ignored.
- Parameters
time_step (alf.data_structures.TimeStep) – batched tensor
- Returns
The arguments, for easy chaining.
- training: bool
- class AverageReturnMetric(example_time_step, name='AverageReturn', prefix='Metrics', dtype=torch.float32, buffer_size=10)
Bases: alf.metrics.metrics.AverageEpisodicAggregationMetric
Metric for computing the average return.
- Parameters
name (str) – name of the metric
prefix (str) – a prefix indicating the category of the metric
dtype (torch.dtype) – dtype of metric values. Should be floating types in order to be averaged.
buffer_size (int) – number of episodes the metric value will be averaged across
example_time_step (nest) – an example of the time step where the metric values are extracted from. If None, a zero scalar is used as the example metric value.
- training: bool
- class AverageRewardMetric(example_time_step, name='AverageReward', prefix='Metrics', buffer_size=10)
Bases: alf.metrics.metrics.AverageDiscountedReturnMetric
Metric for computing the average reward per time step for each episode.
- Parameters
discount (float) – the discount factor for calculating the discounted return
reward_transformer (Callable) – if provided, will calculate the discounted return using the transformed reward. It will be called as transformed_reward = reward_transformer(original_reward).
reward_clip (tuple) – in the format (min, max), to optionally plot return based on clipped reward when environment isn’t clipping.
- training: bool
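One way to read AverageRewardMetric against its parent class is as the \(\gamma = 0\) case of the AverageDiscountedReturnMetric formula. This is an assumption: the text only states the base class and that the metric reports the average reward per time step. Under that reading, with the convention \(0^0 = 1\), each inner sum keeps only its \(k = 0\) term:

```latex
R\big|_{\gamma=0}
  = \frac{1}{L} \sum_{l=1}^{L} \sum_{k=0}^{l-1} 0^{k}\, r_l
  = \frac{1}{L} \sum_{l=1}^{L} r_l ,
```

which is the mean reward over the \(L\) rewarded steps of the episode.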
- class EnvironmentSteps(name='EnvironmentSteps', prefix='Metrics', dtype=torch.int64)
Bases: alf.metrics.metric.StepMetric
Counts the number of steps taken in the environment after FrameSkip.
If frames are skipped by any of the environment wrappers, a separate metric, AverageEnvInfoMetric[‘num_env_frames’], reports the actual frame count including the skipped frames.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- call(time_step)
Increases environment_steps according to time_step. The step count is not increased on time_step.is_first(), since that step is not part of any episode.
- Parameters
time_step (alf.data_structures.TimeStep) – batched tensor
- Returns
The arguments, for easy chaining.
- training: bool
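The counting rule for EnvironmentSteps can be illustrated with a small pure-Python sketch. Each call receives the step types of a batch of parallel environments, and steps of type FIRST are not counted because they are not part of any episode. The integer step-type codes and the class name below are hypothetical placeholders, not taken from alf.data_structures.

```python
from typing import List

FIRST, MID, LAST = 0, 1, 2  # hypothetical step-type codes for this sketch


class EnvironmentStepCounter:
    """Counts environment steps, ignoring steps whose type is FIRST."""

    def __init__(self) -> None:
        self.environment_steps = 0

    def __call__(self, step_types: List[int]) -> List[int]:
        # One entry per parallel environment in the batch.
        self.environment_steps += sum(1 for t in step_types if t != FIRST)
        return step_types  # return the input for easy chaining


counter = EnvironmentStepCounter()
for batch in [[FIRST, FIRST], [MID, MID], [LAST, MID], [FIRST, LAST]]:
    counter(batch)
print(counter.environment_steps)  # 5
```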
- class EpisodicStartAverageDiscountedReturnMetric(example_time_step, name='EpisodicStartAverageDiscountedReturn', prefix='Metrics', buffer_size=10, reward_transformer=None)
Bases: alf.metrics.metrics.AverageDiscountedReturnMetric
Metric for computing the discounted return from episode start states. It is calculated according to the following formula:
\[
\begin{array}{ll}
R &= r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots \\
  &= \sum_{l=1}^{L} \gamma^{l-1} r_l,
\end{array}
\]
where \(\gamma\) is the reward discount, and \(r_1\) denotes the reward due to the first action, which is received at the second time step. \(L\) equals the episode length minus 1.
Note that if the episode does not end because of a time limit, the discounted return calculated from the formula above is unbiased. If the last step is due to a time limit, the result is a biased estimate whose expectation is lower than the ground truth (when rewards are non-negative).
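Because the discount applied to \(r_l\) depends only on its position in the episode, the return from the start state can be accumulated online, one reward per step, by carrying the current factor \(\gamma^{l-1}\). The sketch below does that in plain Python and cross-checks the result against the backward recursion \(G \leftarrow r + \gamma G\). It is an illustration under these assumptions, not ALF's tensor implementation, and both names are hypothetical.

```python
from typing import Sequence


class StartReturnAccumulator:
    """Accumulates R = sum_{l=1}^{L} gamma^(l-1) * r_l one reward at a time."""

    def __init__(self, gamma: float) -> None:
        self.gamma = gamma
        self.reset()

    def reset(self) -> None:
        # Call at the start of each episode.
        self.value = 0.0
        self.factor = 1.0  # gamma^(l-1) for the next reward r_l

    def add(self, reward: float) -> None:
        self.value += self.factor * reward
        self.factor *= self.gamma


def backward_return(rewards: Sequence[float], gamma: float) -> float:
    """Same sum computed from the end of the episode: G <- r + gamma * G."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


acc = StartReturnAccumulator(gamma=0.5)
for r in [1.0, 1.0, 1.0]:
    acc.add(r)
print(acc.value, backward_return([1.0, 1.0, 1.0], 0.5))  # 1.75 1.75
```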
- Parameters
discount (float) – the discount factor for calculating the discounted return
reward_transformer (Callable) – if provided, will calculate the discounted return using the transformed reward. It will be called as transformed_reward = reward_transformer(original_reward).
reward_clip (tuple) – in the format (min, max), to optionally plot return based on clipped reward when environment isn’t clipping.
- training: bool
- class MetricBuffer(max_len, dtype)
Bases: torch.nn.modules.module.Module
A metric buffer for computing average metric values. The buffer is assumed to store only scalar values.
- Parameters
max_len (int) – maximum length of the buffer
dtype (torch.dtype) – dtype of the content of the buffer
- append(value)
Append multiple values to the buffer.
- Parameters
value (Tensor) – a batch of scalars with the shape \([B]\).
- training: bool
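The behavior described for MetricBuffer, keeping at most max_len scalars and averaging them, can be sketched in plain Python with a bounded deque. This is a hypothetical stand-in: ALF's MetricBuffer stores tensors of the given dtype, and the class name, the mean() method and the 0.0 reported for an empty buffer are assumptions of this sketch, not documented API.

```python
from collections import deque
from typing import Deque, Iterable


class RingMetricBuffer:
    """Keeps the most recent ``max_len`` scalars and reports their mean."""

    def __init__(self, max_len: int) -> None:
        self._values: Deque[float] = deque(maxlen=max_len)

    def append(self, values: Iterable[float]) -> None:
        # A batch of B scalars is appended in order; once the buffer is
        # full, the oldest entries are dropped automatically by the deque.
        self._values.extend(values)

    def mean(self) -> float:
        # Assumption of this sketch: an empty buffer reports 0.0.
        if not self._values:
            return 0.0
        return sum(self._values) / len(self._values)


buf = RingMetricBuffer(max_len=3)
buf.append([1.0, 2.0])
print(buf.mean())  # 1.5
buf.append([3.0, 4.0])  # 1.0 is evicted; buffer now holds 2.0, 3.0, 4.0
print(buf.mean())  # 3.0
```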
- class NumberOfEpisodes(name='NumberOfEpisodes', prefix='Metrics', dtype=torch.int64)
Bases: alf.metrics.metric.StepMetric
Counts the number of episodes in the environment.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- call(time_step)
Increases number_episodes according to time_step. The count is increased for every time step where time_step.is_last() holds.
- Parameters
time_step (alf.data_structures.TimeStep) – batched tensor
- Returns
The arguments, for easy chaining.
- training: bool
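The episode-counting rule of NumberOfEpisodes can be sketched the same way in plain Python: each batch contributes one episode per environment whose step is LAST. The integer step-type codes and the class name are hypothetical placeholders, not taken from alf.data_structures.

```python
from typing import List

FIRST, MID, LAST = 0, 1, 2  # hypothetical step-type codes for this sketch


class EpisodeCounter:
    """Counts finished episodes: one per LAST step in each batch."""

    def __init__(self) -> None:
        self.number_episodes = 0

    def __call__(self, step_types: List[int]) -> List[int]:
        # One entry per parallel environment in the batch.
        self.number_episodes += sum(1 for t in step_types if t == LAST)
        return step_types  # return the input for easy chaining


counter = EpisodeCounter()
for batch in [[FIRST, FIRST], [MID, LAST], [LAST, FIRST], [FIRST, MID]]:
    counter(batch)
print(counter.number_episodes)  # 2
```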