alf.trainers

alf.trainers.evaluator

class EvalJob(type, global_counter, step_metrics, state_dict)

Bases: tuple

Create new instance of EvalJob(type, global_counter, step_metrics, state_dict)

type: Alias for field number 0

global_counter: Alias for field number 1

step_metrics: Alias for field number 2

state_dict: Alias for field number 3
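EvalJob is documented as a `tuple` subclass whose fields alias positions 0 to 3, which is how a `collections.namedtuple` behaves. The sketch below recreates an equivalent namedtuple (`EvalJobSketch`, a hypothetical name) purely to illustrate field access. The values used for `type`, `global_counter`, `step_metrics` and `state_dict` are made up and do not come from ALF.

```python
from collections import namedtuple

# Illustrative stand-in with the same field order as EvalJob.
EvalJobSketch = namedtuple(
    "EvalJobSketch", ["type", "global_counter", "step_metrics", "state_dict"])

# Hypothetical values; the real ones are filled in by the Evaluator.
job = EvalJobSketch(
    type="eval",
    global_counter=1000,
    step_metrics={"EnvironmentSteps": 5000},
    state_dict={},
)

# Each field name is an alias for a tuple position.
print(job.global_counter, job[1])
print(job.step_metrics is job[2])
```

Because the job is a plain tuple, it can be pickled and passed through a queue to a worker process without any custom serialization code.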

class Evaluator(config, conf_file)

Bases: object

Evaluator for performing evaluation on the current algorithm.

If config.async_eval is True, the evaluation is performed asynchronously in a different process.

For each round of evaluation, it plays config.num_eval_episodes episodes using config.num_eval_environments parallel environments.

Parameters

config (TrainerConfig) – the training config
conf_file (str) – path of the config file
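One way to play a fixed number of evaluation episodes over several parallel environments is to step all environments in lockstep and stop once enough episodes have finished. The sketch below assumes this lockstep strategy; it is not necessarily how ALF allocates episodes. `ToyEnv`, `evaluate` and the fixed episode lengths are hypothetical, and the two variables at the bottom only stand in for config.num_eval_environments and config.num_eval_episodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyEnv:
    """Hypothetical env: each episode lasts `episode_length` steps, reward 1 per step."""
    episode_length: int
    t: int = 0

    def step(self):
        """Advance one step and return (reward, done)."""
        self.t += 1
        done = self.t >= self.episode_length
        if done:
            self.t = 0  # auto-reset for the next episode
        return 1.0, done


def evaluate(envs: List[ToyEnv], num_eval_episodes: int) -> List[float]:
    """Step all envs in lockstep until num_eval_episodes episodes have finished."""
    returns: List[float] = []
    running = [0.0] * len(envs)  # return accumulated so far in each env
    while len(returns) < num_eval_episodes:
        for i, env in enumerate(envs):
            reward, done = env.step()
            running[i] += reward
            if done:
                # Episodes finishing after the quota is met are discarded.
                if len(returns) < num_eval_episodes:
                    returns.append(running[i])
                running[i] = 0.0
    return returns


num_eval_environments = 3  # stands in for config.num_eval_environments
num_eval_episodes = 5      # stands in for config.num_eval_episodes
envs = [ToyEnv(episode_length=i + 2) for i in range(num_eval_environments)]
episode_returns = evaluate(envs, num_eval_episodes)
print(episode_returns, sum(episode_returns) / len(episode_returns))
```

With lockstep stepping, environments with shorter episodes contribute more of the quota (here the 2-step environment supplies three of the five episodes), so the average can be biased toward short episodes. An implementation may prefer a fixed per-environment quota to avoid this.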

close()

eval(algorithm, step_metric_values)

Do one round of evaluation.

If config.async_eval is True, this function returns once the evaluator worker has made a copy of the state_dict of algorithm. However, if the previous evaluation has not finished yet, it first waits for that evaluation to finish.

The evaluation results are written to the log file and TensorBoard by the evaluation worker.

Parameters

algorithm (RLAlgorithm) – the training algorithm
step_metric_values (Dict[str, int]) – a dictionary of step metric values against which the evaluation summaries are generated. It must contain at least the “EnvironmentSteps” entry.
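The asynchronous behaviour described above (wait for the previous round, snapshot the parameters, then return while evaluation continues in the background) can be sketched as follows. This is a minimal illustration, not ALF's implementation: a `threading.Thread` stands in for the separate evaluation process, a plain dict stands in for the algorithm's state_dict, and `AsyncEvalSketch` with its `results` list is hypothetical. Here the copy is taken in the caller before the handoff, which gives the same guarantee that later training updates cannot leak into the evaluated parameters.

```python
import copy
import threading
import time
from typing import Dict, List, Optional


class AsyncEvalSketch:
    """Hypothetical async evaluator; a thread stands in for the worker process."""

    def __init__(self):
        self._worker: Optional[threading.Thread] = None
        self.results: List[Dict] = []

    def _run(self, state_dict: Dict, step_metric_values: Dict[str, int]):
        time.sleep(0.01)  # pretend to play the evaluation episodes
        self.results.append({"params": state_dict, "steps": step_metric_values})

    def eval(self, state_dict: Dict, step_metric_values: Dict[str, int]):
        if "EnvironmentSteps" not in step_metric_values:
            raise ValueError("step_metric_values must contain 'EnvironmentSteps'")
        # Block until the previous round of evaluation has finished.
        if self._worker is not None:
            self._worker.join()
        # Snapshot the parameters so later training updates cannot leak in.
        snapshot = copy.deepcopy(state_dict)
        self._worker = threading.Thread(
            target=self._run, args=(snapshot, dict(step_metric_values)))
        self._worker.start()
        # Return immediately; training continues while the worker evaluates.

    def close(self):
        if self._worker is not None:
            self._worker.join()
            self._worker = None


evaluator = AsyncEvalSketch()
params = {"w": [1.0]}
evaluator.eval(params, {"EnvironmentSteps": 100})
params["w"][0] = 2.0  # training keeps modifying the live parameters
evaluator.eval(params, {"EnvironmentSteps": 200})
evaluator.close()
print([r["params"]["w"][0] for r in evaluator.results])  # [1.0, 2.0]
```

Joining the previous worker before starting a new one keeps at most one evaluation in flight, so evaluation cannot fall arbitrarily far behind training.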

class SyncEvaluator(env, config)

Bases: object

Evaluator for performing evaluation on the current algorithm.

For each round of evaluation, it plays config.num_eval_episodes episodes using config.num_eval_environments parallel environments.

eval(algorithm, step_metric_values)

Do one round of evaluation.

This function returns only after the evaluation has finished.

The evaluation results are written to the log file and TensorBoard by the evaluation worker.

Parameters

algorithm (RLAlgorithm) – the training algorithm
step_metric_values (Dict[str, int]) – a dictionary of step metric values against which the evaluation summaries are generated. It must contain at least the “EnvironmentSteps” entry.
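Generating summaries "against" several step metrics can be read as logging each evaluation metric once per step metric, using that step metric's value as the x-axis. The sketch below assumes that interpretation. The function `make_eval_summaries`, the `"<metric>_vs_<step metric>"` tag format, the "Iterations" step metric and the "AverageReturn" value are all hypothetical; only the "EnvironmentSteps" requirement comes from the documentation above.

```python
from typing import Dict, List, Tuple


def make_eval_summaries(metric_values: Dict[str, float],
                        step_metric_values: Dict[str, int]
                        ) -> List[Tuple[str, float, int]]:
    """Return (tag, value, step) records, one per metric per step metric."""
    if "EnvironmentSteps" not in step_metric_values:
        raise KeyError("step_metric_values must contain 'EnvironmentSteps'")
    records = []
    for step_name, step in sorted(step_metric_values.items()):
        for metric_name, value in sorted(metric_values.items()):
            # Hypothetical tag format: "<metric>_vs_<step metric>".
            records.append((f"{metric_name}_vs_{step_name}", value, step))
    return records


records = make_eval_summaries(
    {"AverageReturn": 12.5},
    {"EnvironmentSteps": 5000, "Iterations": 50},
)
for tag, value, step in records:
    print(tag, value, step)
```

Each record could then be passed to a summary writer such as TensorBoard's `add_scalar(tag, value, step)`, giving one curve per metric for each choice of x-axis.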

alf.trainers.policy_trainer

Trainer for training an Algorithm on the given environments.

class RLTrainer(config, ddp_rank=-1)

Bases: alf.trainers.policy_trainer.Trainer

Trainer for reinforcement learning.

Parameters

config (TrainerConfig) – configuration used to construct this trainer
ddp_rank (int) – the process (and also device) ID of this process, if it participates in a DDP process group for distributed data parallel training. A value of -1 indicates regular single-process training.
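The ddp_rank convention above, where -1 means single-process training and any other value serves as both the process ID and the device index, can be illustrated with a small helper. `describe_rank` is hypothetical and not part of ALF; the `cuda:N` device string follows the usual PyTorch naming and is an assumption about how the device index is used.

```python
def describe_rank(ddp_rank: int = -1) -> str:
    """Hypothetical helper mirroring the documented ddp_rank convention."""
    if ddp_rank == -1:
        return "single-process training on the default device"
    if ddp_rank < 0:
        raise ValueError(f"invalid ddp_rank: {ddp_rank}")
    # The rank doubles as the device index, e.g. process 1 uses cuda:1.
    return f"DDP process {ddp_rank} on device cuda:{ddp_rank}"


print(describe_rank())
print(describe_rank(1))
```

Tying the process ID to the device index means each DDP process owns exactly one GPU, which is the common one-process-per-device layout for distributed data parallel training.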

class SLTrainer(config)

Bases: alf.trainers.policy_trainer.Trainer

Trainer for supervised learning.

Create an SLTrainer.

Parameters: config (TrainerConfig) – configuration used to construct this trainer

class Trainer(config, ddp_rank=-1)

Bases: object

Base class for trainers.

The trainer is responsible for creating the algorithm and the dataset/environment, setting up summaries, checkpointing, running training iterations, and evaluating periodically.
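The loop structure implied by these responsibilities (run training iterations, evaluate periodically, checkpoint periodically) can be sketched as below. `TrainerSketch`, its interval parameters and the string log are hypothetical illustrations; they are not ALF's Trainer or TrainerConfig fields, and algorithm/environment creation is reduced to a comment.

```python
from typing import List


class TrainerSketch:
    """Hypothetical skeleton of the responsibilities listed for Trainer."""

    def __init__(self, num_iterations: int, eval_interval: int,
                 checkpoint_interval: int):
        self.num_iterations = num_iterations
        self.eval_interval = eval_interval
        self.checkpoint_interval = checkpoint_interval
        self.log: List[str] = []

    def _train_iter(self, it: int):
        self.log.append(f"train {it}")

    def _eval(self, it: int):
        self.log.append(f"eval {it}")

    def _checkpoint(self, it: int):
        self.log.append(f"checkpoint {it}")

    def train(self):
        # Creating the algorithm, environments and summary writers would go here.
        for it in range(1, self.num_iterations + 1):
            self._train_iter(it)
            if it % self.eval_interval == 0:
                self._eval(it)
            if it % self.checkpoint_interval == 0:
                self._checkpoint(it)


trainer = TrainerSketch(num_iterations=4, eval_interval=2, checkpoint_interval=4)
trainer.train()
print(trainer.log)
```

Keeping evaluation and checkpointing as separate hooks lets subclasses such as an RL or SL trainer override only the iteration step while sharing the scheduling logic.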

Parameters

config (TrainerConfig) – configuration used to construct this trainer
ddp_rank (int) – the process (and also device) ID of this process, if it participates in a DDP process group for distributed data parallel training. A value of -1 indicates regular single-process training.

static current_env_steps()

static current_iterations()

static progress()

A static method that returns the current training progress. It assumes that only one trainer is used for training.

Returns: a number in [0, 1] indicating the training progress.
Return type: float
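A static progress() can only work without a trainer argument if it reads some shared, class-level state, which is one plausible reason for the single-trainer restriction: a second trainer would overwrite that state. The sketch below assumes this design and defines progress as completed iterations over a hypothetical iteration budget, clamped to [0, 1]. `ProgressSketch` and its fields are illustrative only; ALF's actual progress formula may differ (for example, it may be based on environment steps).

```python
from typing import Optional


class ProgressSketch:
    """Hypothetical trainer that exposes progress through class-level state."""

    _current: Optional["ProgressSketch"] = None  # only one trainer is tracked

    def __init__(self, num_iterations: int):
        self.num_iterations = num_iterations
        self.iterations = 0
        ProgressSketch._current = self  # a second trainer would replace this

    def step(self):
        self.iterations += 1

    @staticmethod
    def progress() -> float:
        trainer = ProgressSketch._current
        if trainer is None:
            return 0.0
        # Clamp into [0, 1] in case training runs past the configured budget.
        return min(1.0, trainer.iterations / max(1, trainer.num_iterations))


t = ProgressSketch(num_iterations=4)
t.step()
print(ProgressSketch.progress())  # 0.25
```

Because any component can call the static method without holding a trainer reference, schedules such as learning-rate decay can depend on training progress directly.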

train(): Perform training.