ALF snapshot and advanced play
Recall that in the first chapter, "A minimal working example", we introduced
the concept of play as the process of evaluating a trained model on a task while
possibly doing some visualization. There we showed very basic usage of the
alf.bin.play module by loading a trained model and rendering the environment to
the screen or a video. Now we describe how to utilize ALF’s advanced play features
to evaluate or debug a model in depth.
ALF snapshot
ALF calls alf.trainers.policy_trainer.play() to play a trained model. But
which code version does ALF use? By default, one expects to use the current
up-to-date code to do this. However, there are many circumstances when we don’t
want to do so. For example, we’d like to play a model that was trained a long time
ago, in one of the following scenarios:
the ALF repo is always a fixed version, but we tried out different ideas by constantly changing our own project code (e.g., algorithms) after training the model;
we wanted to use new features of ALF and pulled the latest ALF after training the model;
as ALF developers, we updated the ALF repo constantly and it may no longer be compatible with the trained model.
These cases may not seem like a big issue if ALF is used only once, but they become disastrous when ALF is used repeatedly across many projects while the code keeps evolving. Of course, version control systems like Git can help to some extent, but the key question is: how do we reliably establish a one-to-one mapping between a code version and a trained model? Git alone is not a fully satisfying answer to this.
To solve this issue, ALF uses a simple approach. Whenever a training
(alf.bin.train) or grid search (alf.bin.grid_search) job is launched, ALF
takes a snapshot of the current repo and stores it in the job root dir. For example,
the following command
python -m alf.bin.train --root_dir /tmp/alf_tutorial1 --conf <ALF_ROOT>/alf/examples/tutorial/minimal_example_conf.py
will store an ALF snapshot at /tmp/alf_tutorial1/alf. The snapshot directory
has exactly the same structure as the ALF repo (i.e., it is a clone).
Note
ALF relies on rsync
to copy all *.py, *.gin, and *.json files of the ALF repo to
a snapshot. The size of a snapshot is roughly 15 MB, which is acceptable given
modern disk storage capacity.
In this way, each trained model is accompanied by an ALF snapshot, which is exactly the code version that trained the model. Even if we implement our own algorithms without touching ALF, our own code will also be included in the snapshot if its path starts with the ALF root.
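To make the idea concrete, here is a minimal sketch of what taking a snapshot amounts to. It is not ALF's implementation (ALF relies on rsync); the function name take_snapshot and its arguments are hypothetical, and only the three file patterns come from the note above.

```python
import shutil
from pathlib import Path

# File types that the note above says are copied into a snapshot.
SNAPSHOT_SUFFIXES = {".py", ".gin", ".json"}


def take_snapshot(repo_root, snapshot_dir):
    """Copy snapshot-worthy files from repo_root into snapshot_dir.

    Hypothetical stand-in for illustration only; ALF itself uses rsync.
    An existing snapshot at snapshot_dir is removed first.
    """
    repo_root = Path(repo_root)
    snapshot_dir = Path(snapshot_dir)
    if snapshot_dir.exists():
        shutil.rmtree(snapshot_dir)
    # Materialize the file list before copying anything.
    sources = sorted(p for p in repo_root.rglob("*")
                     if p.is_file() and p.suffix in SNAPSHOT_SUFFIXES)
    copied = []
    for src in sources:
        rel = src.relative_to(repo_root)
        dst = snapshot_dir / rel
        # Recreate the same directory layout inside the snapshot.
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Because relative paths are preserved, a file at <repo>/pkg/a.py ends up at <snapshot>/pkg/a.py, which is why the snapshot can be used in place of the repo.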
Note that when training is continued in an existing root_dir, a new snapshot
will be generated and will overwrite the existing one.
python -m alf.bin.train --root_dir /tmp/alf_tutorial1
This will generate a new snapshot at /tmp/alf_tutorial1/alf.
When playing a model that was trained with the snapshot feature enabled, we can add the option --use_alf_snapshot
to use the archived ALF version instead of the current one:
python -m alf.bin.play --root_dir /tmp/alf_tutorial1 --use_alf_snapshot
You’ll see a log message like
I0914 11:49:21.344848 140367115069248 play.py:168] === Using an ALF snapshot at '/tmp/alf_tutorial1/alf' ===
which indicates that you’re indeed using a snapshot version.
Note
It’s possible to turn off the snapshot feature. When training or grid searching,
appending the option --nostore_snapshot will do so.
The special case of config file
The config file of a training job will also be archived in a snapshot if its
path starts with the ALF root. However, ALF also archives another copy of the
config directly under the job directory, named alf_config.py. This special config
file also records command-line config parameters and grid search parameters, so
it’s generally more accurate than the snapshot version.
Suppose we rerun the above training example with an added command-line config parameter:
rm -rf /tmp/alf_tutorial1
python -m alf.bin.train --root_dir /tmp/alf_tutorial1 --conf <ALF_ROOT>/alf/examples/tutorial/minimal_example_conf.py --conf_param="TrainerConfig.summary_interval=100"
Again, ALF will store a snapshot at /tmp/alf_tutorial1/alf and we can get
the config file at
/tmp/alf_tutorial1/alf/examples/tutorial/minimal_example_conf.py
However, this is just a copy of the original config file: it doesn’t record
our command-line parameter TrainerConfig.summary_interval=100.
In contrast, if we look at /tmp/alf_tutorial1/alf_config.py, we’ll see something
like
########### pre-configs ###########
import alf
alf.pre_config({
    'TrainerConfig.summary_interval': 100,
})
########### end pre-configs ###########
at the very top of the file.
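The following sketch shows one way such a header could be produced from --conf_param style strings. It only illustrates the relationship between the command line and the header text; parse_conf_params and format_pre_config_header are hypothetical helpers, not functions of ALF.

```python
import ast


def parse_conf_params(conf_params):
    """Turn ["Name.field=value", ...] strings into a {name: value} dict."""
    parsed = {}
    for item in conf_params:
        key, _, raw = item.partition("=")
        raw = raw.strip()
        try:
            # Interpret numbers, booleans, quoted strings, etc.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Fall back to the raw text for anything that is not a literal.
            value = raw
        parsed[key.strip()] = value
    return parsed


def format_pre_config_header(params):
    """Render a dict of parameters as a pre-config header block."""
    lines = [
        "########### pre-configs ###########",
        "import alf",
        "alf.pre_config({",
    ]
    for key, value in params.items():
        lines.append(f"    {key!r}: {value!r},")
    lines.append("})")
    lines.append("########### end pre-configs ###########")
    return "\n".join(lines)


params = parse_conf_params(["TrainerConfig.summary_interval=100"])
header = format_pre_config_header(params)
```

With the parameter from the example above, header contains the line 'TrainerConfig.summary_interval': 100, exactly as in the listing.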
Regardless of whether the flag --use_alf_snapshot is given when playing a model,
ALF will always use alf_config.py. So if we’d like to change the
config for play, we need to modify alf_config.py in either case.
For any other changes for play, we need to modify the snapshot code if
--use_alf_snapshot is provided, and the current ALF repo otherwise.
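The rule above can be summarized in a few lines. This is only a sketch of the decision, not ALF code: the function resolve_play_sources and the parameter current_alf_root are hypothetical, while the alf_config.py and alf locations under root_dir come from the text.

```python
import os


def resolve_play_sources(root_dir, use_alf_snapshot, current_alf_root):
    """Return (config_file, code_root) that a play run would rely on.

    Hypothetical helper for illustration; it performs no file access.
    """
    # The config is always the archived one under the job directory.
    config_file = os.path.join(root_dir, "alf_config.py")
    # The code is either the archived snapshot or the current repo.
    if use_alf_snapshot:
        code_root = os.path.join(root_dir, "alf")
    else:
        code_root = current_alf_root
    return config_file, code_root
```

For example, with root_dir set to /tmp/alf_tutorial1 and the flag enabled, code edits for play belong under /tmp/alf_tutorial1/alf, while config edits belong in /tmp/alf_tutorial1/alf_config.py in both cases.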
Advanced play by rendering
Besides playing with a snapshot, another advanced play case is to utilize the
alf.summary.render module. This module contains several helper functions
that convert arrays and tensors to Image objects for visualization
on screen or in a video. Recall that ALF’s play calls an algorithm’s
predict_step() to do online action prediction given
a trained model (see algorithm interfaces). The AlgStep
output of a predict_step() call is used to interact with the environment and
retrieve the next environment frame. At every step of this prediction loop, any
Image object placed in the info of the AlgStep returned
by an algorithm’s predict_step() will be concatenated to the side of the corresponding
environment frame. Below we’ll walk through an example to show how to
use alf.summary.render. The complete code is located at alf.examples.tutorial.ac_render_conf.
We will again train a model on the “CartPole-v0” task. So first of all, we activate
all the configurations of alf.examples.ac_cart_pole_conf by importing it:
from alf.examples import ac_cart_pole_conf
Then we import the render module:
import alf.summary.render as render
Then, to tell the play module what to render, we override
predict_step() of the original AC algorithm:
class ACRenderAlgorithm(ActorCriticAlgorithm):
    def predict_step(self, inputs, state):
        alg_step = super().predict_step(inputs, state)
        action = alg_step.output
        action_dist = alg_step.info.action_distribution
        with alf.summary.scope("ACRender"):
            # Render an action image of type ``render.Image``.
            action_img = render.render_action(
                name="predicted_action",
                action=action,
                action_spec=self._action_spec)
            # Render an action distribution image of type ``render.Image``.
            action_dist_img = render.render_action_distribution(
                name="predicted_action_distribution",
                act_dist=action_dist,
                action_spec=self._action_spec)
        # Put the two ``Image`` objects into ``info``. Any nest structure is
        # acceptable for the new ``info``. ALF's play will look for ``Image``
        # objects.
        return alg_step._replace(
            info=dict(action_img=action_img,
                      action_dist_img=action_dist_img,
                      ac=alg_step.info))
In short, we take the predicted action and action distribution
from the AlgStep of ActorCriticAlgorithm, and call render.render_action()
and render.render_action_distribution() to obtain two Image
objects. The final step is to put the objects in the info field of
the returned AlgStep. It doesn’t matter how we organize the two objects in info:
as long as they are in it, play will find and display them.
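To see why the layout of info does not matter, consider a minimal sketch of a recursive search over a nested structure. The Image class below is a hypothetical stand-in for render.Image, and collect_images is not ALF's actual lookup code; it only illustrates the idea of walking a nest.

```python
class Image:
    """Hypothetical stand-in for ``render.Image``; it only stores a name."""

    def __init__(self, name):
        self.name = name


def collect_images(nest):
    """Depth-first search of dicts, lists and tuples for Image objects."""
    if isinstance(nest, Image):
        return [nest]
    if isinstance(nest, dict):
        children = nest.values()
    elif isinstance(nest, (list, tuple)):  # also covers namedtuples
        children = nest
    else:
        # Any other leaf (tensor, number, string, ...) is ignored.
        return []
    found = []
    for child in children:
        found.extend(collect_images(child))
    return found


info = dict(
    action_img=Image("predicted_action"),
    action_dist_img=Image("predicted_action_distribution"),
    ac=("unrelated", {"deep": [Image("nested")]}))
names = [img.name for img in collect_images(info)]
```

Here names lists all three images, including the one buried inside the ac entry, so any nesting of dicts, lists, and tuples works equally well.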
Note that we created a namescope of “ACRender” when calling the rendering functions. This namescope works the same way as the namescope for summary functions: it prefixes all rendered image names with “ACRender/”. These image names will be displayed as labels in the final video.
Finally, we tell ALF to use our newly defined algorithm:
alf.config(
    "TrainerConfig",
    algorithm_ctor=partial(
        ACRenderAlgorithm, optimizer=alf.optimizers.Adam(lr=1e-3)))
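The partial used here is functools.partial from the Python standard library: it pre-binds some constructor arguments and returns a callable that accepts the remaining ones later. The toy example below shows the mechanism only; ToyAlgorithm and the arguments passed by the caller are hypothetical and do not reflect how ALF's trainer actually invokes algorithm_ctor.

```python
from functools import partial


class ToyAlgorithm:
    """Hypothetical algorithm class used only to demonstrate ``partial``."""

    def __init__(self, observation_spec, optimizer=None):
        self.observation_spec = observation_spec
        self.optimizer = optimizer


# Bind the optimizer now; the remaining arguments are supplied later.
algorithm_ctor = partial(ToyAlgorithm, optimizer="adam(lr=1e-3)")

# Later, some other code (e.g. a trainer) finishes the construction.
algorithm = algorithm_ctor(observation_spec="toy_spec")
```

The resulting algorithm object carries both the pre-bound optimizer and the argument supplied at call time, which is why a config can describe how to build an algorithm without building it right away.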
Now let’s train and play this conf file:
python -m alf.bin.train --root_dir /tmp/ac_render --conf <ALF_ROOT>/alf/examples/tutorial/ac_render_conf.py
# After several minutes the above training command will finish.
# Once it has finished, run the following command to play the trained model.
python -m alf.bin.play --root_dir /tmp/ac_render --num_episodes 1 --record_file /tmp/tmp.mp4 --alg_render
Note that when playing, we need to add the flag --alg_render to turn on the
render module; otherwise the rendering functions will not
be called. If we open “tmp.mp4”, the video frame will look like:
In other words, along with every environment frame, the action taken at that frame will also be displayed.
render contains other rendering functions (e.g., heatmap,
curve, etc.), and we suggest that the reader take a look at its API doc. Some example
rendered frames are:
Note
Currently, rendering with --alg_render is slow (less than
10 FPS, depending on how many plots each frame has). This
inefficiency is largely due to Matplotlib.
Summary
In this chapter we explained what an ALF snapshot is, why we need it, and how to use it for playing a model. We also showed how to customize rendering during play to visualize various prediction statistics. These two advanced play use cases enable us to better evaluate and analyze trained models.