alf.bin#

alf.bin.compare#

Compare two algorithms on a set of fixed task initializations.

Run:

python3 -m alf.bin.compare \
--root_dir1=~/tmp/ac_cart_pole \
--root_dir2=~/tmp/ddpg_cart_pole \
--alsologtostderr

When running remotely with virtual_gl, prefix the command with ``DISPLAY= vglrun -d :7 ``. Clearing the DISPLAY environment variable prevents gzclients from being created; they are not torn down after play and can occupy too many X server connections. Set DISPLAY to the proper value when recording video.

main(_)[source]#: main function.

alf.bin.grid_search#

Grid search.

To run grid search on DDPG for training gym Pendulum:

cd ${PROJECT}/alf/examples;
python -m alf.bin.grid_search \
--root_dir=~/tmp/ddpg_pendulum \
--search_config=ddpg_grid_search.json \
--gin_file=ddpg_pendulum.gin \
--gin_param='create_environment.num_parallel_environments=8' \
--alsologtostderr

To use an ALF conf file, replace “--gin_file” with “--conf” and “--gin_param” with “--conf_param”.

class GridSearch(conf_file)[source]#

Bases: object

Grid Search.

Parameters: conf_file (str) – Path to the config file.

run()[source]#: Run trainings with all possible parameter combinations in the configured space.

class GridSearchConfig(conf_file)[source]#

Bases: object

A grid search config file should be in the json format. For example:

{
    "desc": "desc text",
    "use_gpu": true,
    "gpus": [0, 1],
    "max_worker_num": 8,
    "repeats": 3,
    "parameters": {
        "ac/Adam.learning_rate": [1e-3, 8e-4],
        "OneStepTDLoss.gamma":"(0.995, 0.99)",
        "param_name3": param_value3,
        ...
    }
    ...
}

Supported keys in a json file are:

Parameters

desc (str) – a description sentence for this json file.
use_gpu (bool) – If True, jobs are scheduled only on the devices listed in gpus.
gpus (list[int]) – a list of GPU device ids. If use_gpu is False, this list will be ignored.
max_worker_num (int) – the maximum number of parallel worker processes at any moment. The max_worker_num jobs will be evenly divided among the devices specified by the gpus list. It’s the user’s responsibility to make sure that each device has enough resources.
repeats (int) – each parameter combination will be repeated this many times, with different random seeds.
parameters (dict) – a dict(param_name=param_value,) of the configured search space. Each key param_name is a gin/alf configurable argument string and the paired param_value must be an iterable python object or a str that can be evaluated to an iterable object. When parameters is empty, the original conf file won’t be changed.

See alf/examples/ddpg_grid_search.json for an example.
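
As a rough illustration of how the parameters and repeats keys define the set of runs, the sketch below expands a search space into every combination. It is not ALF's implementation: expand_search_space is a hypothetical helper, and it uses ast.literal_eval as a conservative stand-in for evaluating string values such as "(0.995, 0.99)".

```python
import ast
import itertools


def expand_search_space(parameters, repeats=1):
    """Yield (repeat_index, {param_name: value}) for every combination."""
    keys = list(parameters)
    value_lists = []
    for key in keys:
        values = parameters[key]
        if isinstance(values, str):
            # Assumption: string values are Python literals such as
            # "(0.995, 0.99)"; literal_eval is a safer stand-in for eval.
            values = ast.literal_eval(values)
        value_lists.append(list(values))
    # The Cartesian product enumerates every point of the grid.
    for combo in itertools.product(*value_lists):
        for repeat in range(repeats):
            yield repeat, dict(zip(keys, combo))


config = {
    "repeats": 3,
    "parameters": {
        "ac/Adam.learning_rate": [1e-3, 8e-4],
        "OneStepTDLoss.gamma": "(0.995, 0.99)",
    },
}
runs = list(expand_search_space(config["parameters"], config["repeats"]))
print(len(runs))  # 2 * 2 combinations * 3 repeats = 12
```

With an empty parameters dict the product contains a single empty combination, so the sketch still yields one run per repeat with no parameter overrides.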

Parameters: conf_file (str) – Path to the config file.

property desc#

property gpus#

property max_worker_num#

property param_keys#

property param_values#

property repeats#

property use_gpu#
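
The use_gpu, gpus, and max_worker_num settings together determine where jobs run. One simple way to divide worker slots evenly among devices is round-robin assignment, sketched below. The function assign_devices is hypothetical and the actual scheduling inside GridSearch may differ.

```python
def assign_devices(max_worker_num, gpus, use_gpu=True):
    """Map each worker slot to a device id, or None when GPUs are unused."""
    if not use_gpu or not gpus:
        # The gpus list is ignored when use_gpu is False.
        return [None] * max_worker_num
    # Round-robin: slot i goes to device gpus[i % len(gpus)].
    return [gpus[slot % len(gpus)] for slot in range(max_worker_num)]


print(assign_devices(8, [0, 1]))  # [0, 1, 0, 1, 0, 1, 0, 1]
```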

launch_snapshot_gridsearch()[source]#: This gridsearch function uses a cached ALF snapshot to generate grid-search runs. Because some search jobs might stay in the queue until resources are available, the cache is used to make sure that when a search job is launched, it’s actually using the right ALF version.

main(_)[source]#

search()[source]#

alf.bin.play#

Play a trained model.

You can visualize playing of the trained model by running:

cd ${PROJECT}/alf/examples;
python -m alf.bin.play \
--root_dir=~/tmp/cart_pole \
--alsologtostderr

launch_snapshot_play()[source]#

This play function uses the historical ALF snapshot to play a trained model, so that play runs the same code that trained the model.

In the newer version of train.py, an ALF snapshot is saved to root_dir right before training begins. This function therefore prepends root_dir to PYTHONPATH so that the snapshot ALF repo in that directory is used.

Note that for any older training root_dir created before snapshots were enabled, this function has no effect and play uses the most up-to-date ALF.
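
The PYTHONPATH mechanism can be sketched as follows. This is only an illustration of prepending a directory to the module search path for a child process, not the code in alf.bin.play; env_with_snapshot is a hypothetical helper and /opt/alf is a made-up path.

```python
import os


def env_with_snapshot(root_dir, base_env=None):
    """Return a copy of the environment with root_dir first on PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    root_dir = os.path.abspath(os.path.expanduser(root_dir))
    existing = env.get("PYTHONPATH", "")
    # Entries earlier in PYTHONPATH win, so a package found under
    # root_dir shadows any other installed copy.
    env["PYTHONPATH"] = (
        root_dir + os.pathsep + existing if existing else root_dir)
    return env


env = env_with_snapshot("~/tmp/cart_pole", base_env={"PYTHONPATH": "/opt/alf"})
print(env["PYTHONPATH"].split(os.pathsep)[0])
```

The resulting dict can be passed as the env argument of subprocess.run when launching the play command in a child process.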

main(_)[source]#

play()[source]#

alf.bin.train#

Train model.

To run actor-critic on gym CartPole:

cd ${PROJECT}/alf/examples;
python -m alf.bin.train \
--root_dir=~/tmp/cart_pole \
--gin_file=ac_cart_pole.gin \
--gin_param='create_environment.num_parallel_environments=8' \
--alsologtostderr

You can view various training curves using TensorBoard by running the following command in a different terminal:

tensorboard --logdir=~/tmp/cart_pole

You can visualize playing of the trained model by running:

cd ${PROJECT}/alf/examples;
python -m alf.bin.play \
--root_dir=~/tmp/cart_pole \
--gin_file=ac_cart_pole.gin \
--alsologtostderr

If the machine has multiple GPUs and you would like to train with all of them, specify --distributed multi-gpu. This will use PyTorch’s DistributedDataParallel for training.

If you want to use an ALF Python conf file instead of a Gin configuration file, replace the “--gin_file” option with “--conf”, and “--gin_param” with “--conf_param”.

main(_)[source]#

training_worker(rank, world_size, conf_file, root_dir, paras_queue=None)[source]#

An executable instance that trains and evaluates the algorithm.

Parameters

rank (int) – The ID of the process among all of the DDP processes.
world_size (int) – The number of processes in total. If set to 1, it is interpreted as “non distributed mode”.
conf_file (str) – Path to the training configuration.
root_dir (str) – Path to the directory for writing logs/summaries/checkpoints.
paras_queue (Optional[Queue]) – a shared Queue for checking the consistency of model parameters in different worker processes, if multi-gpu training is used.

alf.bin.verify_checkpoint#

Utility to check whether a checkpointed algorithm can be restored correctly.

It works as follows:

1. Save the config.
2. Train the algorithm for a few iterations.
3. Test the algorithm for a few steps and store the output of the algorithm and the environment time steps.
4. Save checkpoint.
5. Create the algorithm using the saved config.
6. Load checkpoint.
7. Run the algorithm using the stored time steps.
8. Compare the output from step 7 with the output from step 3. They should be exactly the same.
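
The procedure above can be sketched end to end with a toy model. Everything in this block is an assumption made for illustration: ToyAlgorithm is a hypothetical stand-in for an ALF algorithm, and JSON and pickle files stand in for the real config and checkpoint formats.

```python
import json
import os
import pickle
import random
import tempfile


class ToyAlgorithm:
    """Hypothetical stand-in for an algorithm: a scalar linear model."""

    def __init__(self, lr):
        self.lr = lr
        self.weight = 0.0

    def train_step(self, x, target):
        # One gradient step on the squared error (weight * x - target)^2.
        self.weight -= self.lr * 2 * (self.weight * x - target) * x

    def predict(self, x):
        return self.weight * x


def verify_checkpoint(num_train_iterations=5, num_test_steps=3):
    rng = random.Random(0)
    with tempfile.TemporaryDirectory() as tmp:
        conf_path = os.path.join(tmp, "config.json")
        ckpt_path = os.path.join(tmp, "ckpt.pkl")
        # Step 1: save the config.
        with open(conf_path, "w") as f:
            json.dump({"lr": 0.1}, f)
        # Step 2: train for a few iterations.
        algorithm = ToyAlgorithm(lr=0.1)
        for _ in range(num_train_iterations):
            x = rng.uniform(-1, 1)
            algorithm.train_step(x, 3.0 * x)
        # Step 3: test, storing both the inputs and the outputs.
        time_steps = [rng.uniform(-1, 1) for _ in range(num_test_steps)]
        expected = [algorithm.predict(x) for x in time_steps]
        # Step 4: save the checkpoint.
        with open(ckpt_path, "wb") as f:
            pickle.dump({"weight": algorithm.weight}, f)
        # Step 5: create a fresh algorithm from the saved config.
        with open(conf_path) as f:
            restored = ToyAlgorithm(**json.load(f))
        # Step 6: load the checkpoint.
        with open(ckpt_path, "rb") as f:
            restored.weight = pickle.load(f)["weight"]
        # Step 7: rerun on the stored time steps.
        actual = [restored.predict(x) for x in time_steps]
    # Step 8: the outputs must match exactly, not just approximately.
    return actual == expected


print(verify_checkpoint())  # True
```

The comparison is exact because a correctly restored model performs the same arithmetic on the same inputs; any mismatch points to state that was not captured by the checkpoint.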

The simplest way to use it is to invoke it in the following way:

python -m alf.bin.verify_checkpoint --conf [CONF_FILE_NAME]

You may want to set a different value of --num_train_iterations if your training does not start from the beginning because of TrainerConfig.initial_collect_steps. You may also want to set a larger value of --num_test_steps to test more steps.

main(_)[source]#