alf.bin#
alf.bin.compare#
Compare two algorithms on a set of fixed task initializations.
Run:
python3 -m alf.bin.compare \
--root_dir1=~/tmp/ac_cart_pole \
--root_dir2=~/tmp/ddpg_cart_pole \
--alsologtostderr
If running remotely with VirtualGL, prefix the command with ``DISPLAY= vglrun -d :7``. Clearing the DISPLAY environment variable prevents gzclients from being created; gzclients are not torn down after play and can occupy too many X server connections. Set DISPLAY to a proper value when recording video.
alf.bin.grid_search#
Grid search.
To run grid search on DDPG for training gym Pendulum:
cd ${PROJECT}/alf/examples;
python -m alf.bin.grid_search \
--root_dir=~/tmp/ddpg_pendulum \
--search_config=ddpg_grid_search.json \
--gin_file=ddpg_pendulum.gin \
--gin_param='create_environment.num_parallel_environments=8' \
--alsologtostderr
To use an ALF conf file, replace “--gin_file” with “--conf” and “--gin_param” with “--conf_param”.
- class GridSearch(conf_file)[source]#
Bases: object
Grid Search.
- Parameters
conf_file (str) – Path to the config file.
- class GridSearchConfig(conf_file)[source]#
Bases: object
A grid search config file should be in the JSON format. For example:
{
    "desc": "desc text",
    "use_gpu": true,
    "gpus": [0, 1],
    "max_worker_num": 8,
    "repeats": 3,
    "parameters": {
        "ac/Adam.learning_rate": [1e-3, 8e-4],
        "OneStepTDLoss.gamma": "(0.995, 0.99)",
        "param_name3": param_value3,
        ...
    }
    ...
}
Supported keys in a JSON file are:
- Parameters
desc (str) – a description sentence for this JSON file.
use_gpu (bool) – If True, then the scheduling will only put jobs on the devices listed in gpus.
gpus (list[int]) – a list of GPU device ids. If use_gpu is False, this list will be ignored.
max_worker_num (int) – the max number of parallel worker processes at any moment. max_worker_num jobs will be evenly divided among the devices specified by the gpus list. It’s the user’s responsibility to make sure that each device has enough resources.
repeats (int) – each parameter combination will be repeated this many times, with different random seeds.
parameters (dict) – a dict(param_name=param_value,) of the configured search space. Each key param_name is a gin/alf configurable argument string and the paired param_value must be an iterable python object or a str that can be evaluated to an iterable object. When parameters is empty, the original conf file won’t be changed.
See alf/examples/ddpg_grid_search.json for an example.
- Parameters
conf_file (str) – Path to the config file.
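To make the search space concrete, the following is a minimal sketch of how a parameters dict could be expanded into individual jobs, assuming the search is a plain Cartesian product over all values and that each combination is repeated repeats times. The helper names to_iterable and expand_grid are hypothetical and not part of ALF, and string values are parsed with ast.literal_eval here as a conservative assumption; the actual implementation may evaluate them differently.

```python
import ast
import itertools


def to_iterable(value):
    """Return the list of candidate values for one search parameter.

    Strings are parsed with ast.literal_eval. This is an assumption made
    for this sketch; the real implementation may evaluate strings differently.
    """
    if isinstance(value, str):
        value = ast.literal_eval(value)
    return list(value)


def expand_grid(parameters, repeats=1):
    """Yield (repeat_index, {param_name: value}) for every combination."""
    keys = list(parameters)
    candidates = [to_iterable(parameters[k]) for k in keys]
    for combo in itertools.product(*candidates):
        for repeat_index in range(repeats):
            yield repeat_index, dict(zip(keys, combo))


parameters = {
    "ac/Adam.learning_rate": [1e-3, 8e-4],
    "OneStepTDLoss.gamma": "(0.995, 0.99)",
}
jobs = list(expand_grid(parameters, repeats=3))
print(len(jobs))  # 2 learning rates * 2 gammas * 3 repeats = 12
```

With an empty parameters dict, the product contains a single empty combination, so each repeat runs the unmodified configuration, which matches the behavior described for empty parameters.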
- property desc#
- property gpus#
- property max_worker_num#
- property param_keys#
- property param_values#
- property repeats#
- property use_gpu#
- launch_snapshot_gridsearch()[source]#
This gridsearch function uses a cached ALF snapshot to generate grid-search runs. Because some search jobs might stay in the queue until resources are available, the cache is used to make sure that when a search job is launched, it’s actually using the right ALF version.
alf.bin.play#
Play a trained model.
You can visualize playing of the trained model by running:
cd ${PROJECT}/alf/examples;
python -m alf.bin.play \
--root_dir=~/tmp/cart_pole \
--alsologtostderr
- launch_snapshot_play()[source]#
This play function uses the historical ALF snapshot to play a trained model, so that the code is consistent with the snapshot that trained the model.
In the newer version of train.py, an ALF snapshot is saved to root_dir right before training begins. This function therefore prepends root_dir to PYTHONPATH so that the snapshot ALF repo in that place is used.
Note that for any old training root_dir created before snapshots were enabled, this function has no effect and the most up-to-date ALF will be used by play.
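The PYTHONPATH mechanism described above can be sketched as follows. This is only an illustration of the general technique, not ALF's actual code: the helper env_with_snapshot is hypothetical, and the paths are placeholders.

```python
import os


def env_with_snapshot(root_dir, base_env=None):
    """Return a copy of the environment with root_dir first on PYTHONPATH.

    Hypothetical helper for illustration; it does not modify os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    root_dir = os.path.expanduser(root_dir)
    existing = env.get("PYTHONPATH", "")
    # Entries earlier on PYTHONPATH take precedence at import time, so a
    # package found under root_dir shadows any other installed copy.
    env["PYTHONPATH"] = root_dir + (os.pathsep + existing if existing else "")
    return env


env = env_with_snapshot("/tmp/cart_pole", base_env={"PYTHONPATH": "/opt/alf"})
print(env["PYTHONPATH"])
```

An environment built this way could be passed to a child process, for example through the env argument of subprocess.run, so that the child imports the snapshot first. If root_dir contains no snapshot, imports fall through to the next entry on the path, which is consistent with the fallback described for older training directories.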
alf.bin.train#
Train model.
To run actor-critic on gym CartPole:
cd ${PROJECT}/alf/examples;
python -m alf.bin.train \
--root_dir=~/tmp/cart_pole \
--gin_file=ac_cart_pole.gin \
--gin_param='create_environment.num_parallel_environments=8' \
--alsologtostderr
You can view various training curves using TensorBoard by running the following command in a different terminal:
tensorboard --logdir=~/tmp/cart_pole
You can visualize playing of the trained model by running:
cd ${PROJECT}/alf/examples;
python -m alf.bin.play \
--root_dir=~/tmp/cart_pole \
--gin_file=ac_cart_pole.gin \
--alsologtostderr
If you have multiple GPUs on the machine and would like to train with all of them, specify --distributed multi-gpu. This will use PyTorch’s DistributedDataParallel for training.
If you want to use an ALF python conf file instead of a Gin configuration file, replace the “--gin_file” option with “--conf”, and “--gin_param” with “--conf_param”.
- training_worker(rank, world_size, conf_file, root_dir, paras_queue=None)[source]#
An executable instance that trains and evaluates the algorithm.
- Parameters
rank (int) – The ID of the process among all of the DDP processes.
world_size (int) – The number of processes in total. If set to 1, it is interpreted as “non distributed mode”.
conf_file (str) – Path to the training configuration.
root_dir (str) – Path to the directory for writing logs/summaries/checkpoints.
paras_queue (Optional[Queue]) – a shared Queue for checking the consistency of model parameters in different worker processes, if multi-gpu training is used.
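One way a shared queue can be used to check parameter consistency across workers is sketched below. This is an assumption about the general idea, not ALF's actual protocol: the functions params_checksum, report, and all_consistent are hypothetical, and the workers are simulated sequentially with queue.Queue so the example runs without PyTorch or multiple processes. Real worker processes would need a multiprocessing queue instead.

```python
import hashlib
import queue


def params_checksum(params):
    """Hash parameter names and values in a deterministic order."""
    digest = hashlib.sha256()
    for name in sorted(params):
        digest.update(name.encode())
        digest.update(repr(params[name]).encode())
    return digest.hexdigest()


def report(rank, params, paras_queue):
    """Called by each worker: publish this worker's parameter checksum."""
    paras_queue.put((rank, params_checksum(params)))


def all_consistent(paras_queue, world_size):
    """Collect one checksum per worker and check that they all agree."""
    checksums = dict(paras_queue.get() for _ in range(world_size))
    return len(set(checksums.values())) == 1


paras_queue = queue.Queue()
world_size = 2
for rank in range(world_size):
    # Both simulated workers hold identical parameters.
    report(rank, {"layer.weight": [0.1, 0.2]}, paras_queue)
print(all_consistent(paras_queue, world_size))  # True
```

Comparing checksums rather than full parameter tensors keeps the amount of data passed through the queue small, at the cost of only detecting that a mismatch exists, not where it is.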
alf.bin.verify_checkpoint#
Utility to check whether checkpointed algorithm can be restored correctly.
It works as follows:
1. Save the config.
2. Train the algorithm for a few iterations.
3. Test the algorithm for a few steps and store the output of the algorithm and the environment time steps.
4. Save checkpoint.
5. Create the algorithm using the saved config.
6. Load checkpoint.
7. Run the algorithm using the stored time steps.
8. Compare the output from step 7 with the output from step 3. They should be exactly the same.
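The procedure above can be illustrated with a toy stand-in. The sketch below is not the utility's implementation: ToyAlgorithm is a hypothetical one-weight model, the "checkpoint" is an in-memory copy of its state, and the "time steps" are plain floats. It only shows the shape of the save, restore, and compare round trip.

```python
import copy
import random


class ToyAlgorithm:
    """Hypothetical stand-in for an algorithm: a single linear weight."""

    def __init__(self, config):
        self.config = config
        self.weight = config["init_weight"]

    def train_step(self, x, target):
        # One gradient step on the squared error of y = weight * x.
        grad = 2 * (self.weight * x - target) * x
        self.weight -= self.config["lr"] * grad

    def predict(self, x):
        return self.weight * x

    def state_dict(self):
        return {"weight": self.weight}

    def load_state_dict(self, state):
        self.weight = state["weight"]


rng = random.Random(0)
config = {"init_weight": 0.0, "lr": 0.01}  # step 1: save the config
algorithm = ToyAlgorithm(config)
for _ in range(10):  # step 2: train for a few iterations
    x = rng.uniform(-1, 1)
    algorithm.train_step(x, target=3.0 * x)

# Step 3: test for a few steps, storing both the inputs and the outputs.
time_steps = [rng.uniform(-1, 1) for _ in range(5)]
expected = [algorithm.predict(x) for x in time_steps]

checkpoint = copy.deepcopy(algorithm.state_dict())  # step 4: save checkpoint
restored = ToyAlgorithm(config)  # step 5: recreate from the saved config
restored.load_state_dict(checkpoint)  # step 6: load checkpoint
actual = [restored.predict(x) for x in time_steps]  # step 7: rerun stored steps
assert actual == expected  # step 8: outputs must match exactly
```

Replaying the stored time steps, rather than stepping a fresh environment, is what makes an exact comparison meaningful: any difference in the outputs can then only come from state that the checkpoint failed to capture.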
The simplest way to use it is to invoke it in the following way:
python -m alf.bin.verify_checkpoint --conf [CONF_FILE_NAME]
You may want to set a different value of --num_train_iterations if your training
does not start from the beginning because of TrainerConfig.initial_collect_steps.
You may also want to set a different value of --num_test_steps to test more steps.