A minimal working example#
We start with a minimal working example of ALF. The example, as a pure ALF
configuration file, is alf.examples.tutorial.minimal_example_conf,
and consists of only 8 lines.
Let’s ignore its content for a moment (see the next chapter Understanding ALF via the minimal working example for an explanation of the configuration content), and just focus on how to launch the training, interpret the output training messages, and evaluate a trained model.
Train from scratch#
We can train from scratch by
cd <ALF_ROOT>/alf/examples/tutorial
python -m alf.bin.train --root_dir /tmp/alf_tutorial1 --conf minimal_example_conf.py
assuming /tmp/alf_tutorial1 doesn’t exist or is empty.
Note
In ALF, a root directory is where all the Tensorboard summary events, algorithm checkpoints, and general training information (e.g., the conf file, code diff, and output log) are stored.
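The precondition on the root directory (it must not exist, or must be empty, for training to start from scratch) can be checked before launching. The following is a minimal sketch and not part of ALF; the helper name `is_fresh_root_dir` is hypothetical.

```python
from pathlib import Path


def is_fresh_root_dir(root_dir: str) -> bool:
    """Return True if root_dir does not exist or is an empty directory.

    Hypothetical helper for illustration; ALF does not provide it.
    """
    path = Path(root_dir)
    if not path.exists():
        return True
    # An existing path only counts as fresh if it is a directory
    # that contains no entries at all.
    return path.is_dir() and not any(path.iterdir())


if __name__ == "__main__":
    root_dir = "/tmp/alf_tutorial1"
    if is_fresh_root_dir(root_dir):
        print(f"{root_dir} is fresh: training will start from scratch")
    else:
        print(f"{root_dir} is not empty: training may resume from a checkpoint")
```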
The training will finish in several seconds and print some informative messages
to the terminal. First of all, you should see a message from checkpoint_utils
like
There is no checkpoint in directory /tmp/alf_tutorial1/train/algorithm. Train
from scratch
which confirms that the training starts from scratch and that all algorithm
parameters and states are randomly initialized. policy_trainer will also output
message lines like
minimal_example_conf.py -> alf_tutorial1: 0 time=0.046 throughput=5169.30
which follows the template
[conf_file_name] -> [training_root_dir]: [iteration_number] time=[current_time_per_training_iter] throughput=[current_training_throughput]
Finally, you will see
Checkpoint 'ckpt-1' is saved successfully.
as the training finishes. Here we have the checkpoint numbered by the training iteration, which is ‘1’ because only one iteration is performed by this example.
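To post-process training logs, the policy_trainer line format shown above can be parsed with a regular expression. This is a sketch based only on the single example line in this chapter; the names `TrainLogLine` and `parse_train_log_line` are hypothetical, and real log lines may carry extra prefixes (such as timestamps) or fields that this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional


class TrainLogLine(NamedTuple):
    """Fields of one policy_trainer progress line (hypothetical container)."""
    conf_file: str
    root_dir: str
    iteration: int
    time_per_iter: float
    throughput: float


# Mirrors the template:
# [conf_file_name] -> [training_root_dir]: [iteration_number] time=... throughput=...
_LOG_PATTERN = re.compile(
    r"(?P<conf>\S+) -> (?P<root>\S+): (?P<iter>\d+) "
    r"time=(?P<time>[0-9.]+) throughput=(?P<tp>[0-9.]+)")


def parse_train_log_line(line: str) -> Optional[TrainLogLine]:
    """Return the parsed fields, or None if the line does not match."""
    match = _LOG_PATTERN.search(line)
    if match is None:
        return None
    return TrainLogLine(
        conf_file=match.group("conf"),
        root_dir=match.group("root"),
        iteration=int(match.group("iter")),
        time_per_iter=float(match.group("time")),
        throughput=float(match.group("tp")),
    )


if __name__ == "__main__":
    example = ("minimal_example_conf.py -> alf_tutorial1: 0 "
               "time=0.046 throughput=5169.30")
    print(parse_train_log_line(example))
```

Using `search` rather than `match` lets the pattern tolerate any leading text that a logger might prepend to the line.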
Train from a checkpoint#
If we launch the same command again, the checkpoint messages are different this time. First, it should say
Checkpoint 'ckpt-1' is loaded successfully.
which means the training is no longer from scratch, but instead reads the saved
checkpoint from the last run. By default ALF reads the most recent checkpoint in
a training root dir if multiple checkpoints exist. Also at the end of training,
checkpoint_utils outputs:
Checkpoint 'ckpt-2' is saved successfully.
It’s important to understand that when training from a checkpoint, the numbering will continue, i.e., the previous checkpoints won’t be overwritten.
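The "most recent checkpoint" rule can be illustrated on the checkpoint names that appear in the messages above. The sketch below is not ALF's actual selection logic (which lives in checkpoint_utils and may differ); it only assumes names of the form `ckpt-<number>`, and the function name `latest_checkpoint` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Names as printed in the log messages, e.g. 'ckpt-1', 'ckpt-2'.
_CKPT_PATTERN = re.compile(r"^ckpt-(\d+)$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the largest checkpoint number, or None."""
    best_step, best_name = -1, None
    for name in names:
        match = _CKPT_PATTERN.match(name)
        if match is None:
            continue  # Ignore anything that is not a checkpoint name.
        step = int(match.group(1))
        # Compare numerically, so that 'ckpt-10' ranks above 'ckpt-2'.
        if step > best_step:
            best_step, best_name = step, name
    return best_name


if __name__ == "__main__":
    print(latest_checkpoint(["ckpt-1", "ckpt-2"]))
```

Comparing the numeric suffix rather than the raw string matters once the numbering passes 9, since a lexicographic sort would place 'ckpt-10' before 'ckpt-2'.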
Use Tensorboard for monitoring the training progress#
While the training is ongoing, we can monitor its progress in real time with
tensorboard --logdir /tmp/alf_tutorial1
We leave the interpretation of various Tensorboard statistics to a later chapter Summary, metrics, and Tensorboard.
Play from a checkpoint#
ALF defines the term play as evaluating a model on a task and possibly also visualizing the evaluation process, for example, by rendering environment frames or various inference statistics.
Here we only introduce three basic use cases of the ALF play module. For advanced
play (e.g., rendering customized model inference results, playing from an ALF
snapshot, or headless rendering), we refer the reader to ALF snapshot and advanced play.
To play the trained model while rendering the environment on the screen, run
python -m alf.bin.play --root_dir /tmp/alf_tutorial1
By default, play will choose the most recent checkpoint for evaluation. If you don’t want to render, but just play to evaluate:
python -m alf.bin.play --root_dir /tmp/alf_tutorial1 --norender
Or you can save the rendered result to an MP4 video file:
python -m alf.bin.play --root_dir /tmp/alf_tutorial1 --record_file /tmp/alf_tutorial1.mp4
We recommend that the reader go through the various command-line flags of play
to learn how to specify options such as the checkpoint number and the number of
episodes to evaluate.
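When play is driven from a script, the three invocations above can be assembled programmatically. The helper below is a hypothetical sketch, not part of ALF; it uses only the flags shown in this chapter (`--root_dir`, `--norender`, `--record_file`) and makes no assumption about any other flag.

```python
import sys
from typing import List, Optional


def build_play_command(root_dir: str,
                       render: bool = True,
                       record_file: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the three basic play use cases.

    Hypothetical helper for illustration only.
    """
    cmd = [sys.executable, "-m", "alf.bin.play", "--root_dir", root_dir]
    if record_file is not None:
        # Save the rendered result to a video file.
        cmd += ["--record_file", record_file]
    elif not render:
        # Evaluate without rendering.
        cmd.append("--norender")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_play_command("/tmp/alf_tutorial1", render=False)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` in an environment where ALF is installed.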
Summary#
So far, we’ve talked about how to launch training from a conf file and play the
trained model, using only the most basic options of train and play. This covers
the typical command-line usage of ALF. We haven’t yet explained the content of
the example or the ALF pipeline. In the next chapter, we will try to get a
rough picture of ALF through the lens of this minimal working example.