alf.trainers
alf.trainers.evaluator
- class EvalJob(type, global_counter, step_metrics, state_dict)
Bases: tuple
Create new instance of EvalJob(type, global_counter, step_metrics, state_dict)
- global_counter
Alias for field number 1
- state_dict
Alias for field number 3
- step_metrics
Alias for field number 2
- type
Alias for field number 0
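The field aliases above follow the usual named-tuple layout. As a minimal sketch, the snippet below rebuilds a tuple with the same field order using `collections.namedtuple` rather than importing the real class; the field values (`"eval"`, `1000`, the small dictionaries) are hypothetical placeholders, not values ALF is known to use.

```python
from collections import namedtuple

# Assumption: a stand-in that mirrors the documented field order only.
# The real EvalJob is defined in alf.trainers.evaluator.
EvalJob = namedtuple(
    "EvalJob", ["type", "global_counter", "step_metrics", "state_dict"])

job = EvalJob(
    type="eval",                              # hypothetical job type
    global_counter=1000,                      # hypothetical counter value
    step_metrics={"EnvironmentSteps": 1000},  # hypothetical metrics
    state_dict={"w": 0.5},                    # hypothetical parameters
)

# Each attribute is an alias for a positional field.
assert job.type is job[0]
assert job.global_counter == job[1]
assert job.step_metrics is job[2]
assert job.state_dict is job[3]

# Like any tuple, it can be unpacked in field order.
job_type, global_counter, step_metrics, state_dict = job
```

Because the class is a tuple subclass, an instance is immutable, which makes it convenient to hand over to another process as a single message.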
- class Evaluator(config, conf_file)
Bases: object
Evaluator for performing evaluation on the current algorithm.
If config.async_eval is True, the evaluation is performed asynchronously in a different process.
For each round of evaluation, it will play config.num_eval_episodes episodes using config.num_eval_environments parallel environments.
- Parameters
config (TrainerConfig) – the training config
conf_file (str) – path of the config file
- eval(algorithm, step_metric_values)
Do one round of evaluation.
If config.async_eval is True, this function returns once the evaluator worker has made a copy of the state_dict of algorithm. However, if the previous evaluation has not finished, it waits until that evaluation is done.
The evaluation result will be written to the log file and TensorBoard by the evaluation worker.
- Parameters
algorithm (RLAlgorithm) – the training algorithm
step_metric_values (Dict[str, int]) – a dictionary of step metric values to generate the evaluation summaries against. It must contain at least “EnvironmentSteps”.
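The blocking behaviour described for asynchronous evaluation can be illustrated with a toy model. This is a sketch under stated assumptions, not ALF's implementation: `ToyAsyncEvaluator` is a hypothetical class, it uses a thread where ALF is documented to use a separate process, it takes a plain dictionary in place of an algorithm, and its "evaluation" merely records a value.

```python
import copy
import queue
import threading


class ToyAsyncEvaluator:
    """Hypothetical stand-in that mimics the documented eval() semantics."""

    def __init__(self):
        self._jobs = queue.Queue()
        self._copied = threading.Event()  # worker has snapshotted the state
        self._idle = threading.Event()    # previous evaluation has finished
        self._idle.set()
        self.results = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            state_dict, step_metric_values = self._jobs.get()
            if state_dict is None:  # shutdown sentinel
                break
            snapshot = copy.deepcopy(state_dict)
            self._copied.set()  # eval() may return from here on
            # Placeholder "evaluation": record the step and a parameter.
            self.results.append(
                (step_metric_values["EnvironmentSteps"], snapshot["w"]))
            self._idle.set()

    def eval(self, state_dict, step_metric_values):
        if "EnvironmentSteps" not in step_metric_values:
            raise KeyError("step_metric_values needs 'EnvironmentSteps'")
        self._idle.wait()  # wait for the previous evaluation to finish
        self._idle.clear()
        self._copied.clear()
        self._jobs.put((state_dict, step_metric_values))
        self._copied.wait()  # return once the worker holds its own copy

    def close(self):
        self._idle.wait()
        self._jobs.put((None, None))
        self._worker.join()


evaluator = ToyAsyncEvaluator()
params = {"w": 1.0}
evaluator.eval(params, {"EnvironmentSteps": 100})
params["w"] = 2.0  # safe: the worker evaluates its own snapshot
evaluator.eval(params, {"EnvironmentSteps": 200})
evaluator.close()
```

The point of returning only after the copy is made is that the caller can keep updating its parameters while the worker evaluates a consistent snapshot.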
- class SyncEvaluator(env, config)
Bases: object
Evaluator for performing evaluation on the current algorithm.
For each round of evaluation, it will play config.num_eval_episodes episodes using config.num_eval_environments parallel environments.
- eval(algorithm, step_metric_values)
Do one round of evaluation.
This function will return after finishing the evaluation.
The evaluation result will be written to the log file and TensorBoard by the evaluation worker.
- Parameters
algorithm (RLAlgorithm) – the training algorithm
step_metric_values (Dict[str, int]) – a dictionary of step metric values to generate the evaluation summaries against. It must contain at least “EnvironmentSteps”.
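To make the relation between the number of evaluation episodes and the number of parallel environments concrete, here is a rough sketch of one synchronous round. Everything in it is an assumption for illustration: `run_eval_round` is a hypothetical function, the environments are simulated with random rewards and a fixed episode length, and ALF's actual episode accounting may differ.

```python
import random


def run_eval_round(num_eval_episodes, num_eval_environments,
                   episode_length=5, seed=0):
    """Step simulated parallel environments until enough episodes finish.

    Hypothetical helper: rewards are random and every episode lasts
    `episode_length` steps.
    """
    rng = random.Random(seed)
    episode_returns = []
    running_return = [0.0] * num_eval_environments
    steps_in_episode = [0] * num_eval_environments
    while len(episode_returns) < num_eval_episodes:
        # One step in every parallel environment.
        for i in range(num_eval_environments):
            running_return[i] += rng.random()
            steps_in_episode[i] += 1
            if steps_in_episode[i] == episode_length:
                # Keep only as many episodes as were requested.
                if len(episode_returns) < num_eval_episodes:
                    episode_returns.append(running_return[i])
                running_return[i] = 0.0
                steps_in_episode[i] = 0
    return episode_returns


returns = run_eval_round(num_eval_episodes=5, num_eval_environments=2)
average_return = sum(returns) / len(returns)
```

With 2 environments and 5 requested episodes, the loop runs three batches of episodes and discards the surplus sixth one, so the function returns only when the whole round is complete.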
alf.trainers.policy_trainer
Trainer for training an Algorithm on given environments.
- class RLTrainer(config, ddp_rank=-1)
Bases: alf.trainers.policy_trainer.Trainer
Trainer for reinforcement learning.
- Parameters
config (TrainerConfig) – configuration used to construct this trainer
ddp_rank (int) – process (and also device) ID of the process, if the process participates in a DDP process group to run distributed data parallel training. A value of -1 indicates regular single-process training.
- class SLTrainer(config)
Bases: alf.trainers.policy_trainer.Trainer
Trainer for supervised learning.
Create an SLTrainer.
- Parameters
config (TrainerConfig) – configuration used to construct this trainer
- class Trainer(config, ddp_rank=-1)
Bases: object
Base class for trainers.
Trainer is responsible for creating the algorithm and dataset/environment, setting up summaries, checkpointing, running training iterations, and evaluating periodically.
- Parameters
config (TrainerConfig) – configuration used to construct this trainer
ddp_rank (int) – process (and also device) ID of the process, if the process participates in a DDP process group to run distributed data parallel training. A value of -1 indicates regular single-process training.