alf.environments#

alf.environments.alf_environment#

ALF RL Environment API.

Adapted from the TF-Agents Environment API, as seen in:

https://github.com/tensorflow/agents/blob/master/tf_agents/environments/py_environment.py
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/tf_environment.py

class AlfEnvironment[source]#

Bases: object

Abstract base class for ALF RL environments.

Observations and valid actions are described with TensorSpec, defined in the specs module.

The current_time_step() method returns the current time_step, resetting the environment if necessary.

The step(action) method applies the action and returns the new time_step. This method will also reset the environment if needed and ignore the action in that case.

The reset() method returns the time_step that results from an environment reset and is guaranteed to have step_type=ts.FIRST.

The reset() method is only needed for explicit resets. In general, the environment will reset automatically when needed, for example, when no episode was started or when it reaches a step after the end of the episode (i.e. step_type=ts.LAST).

If the environment can run multiple steps at the same time, taking a batched set of actions and returning a batched set of observations, it should override the batched property to return True.

Example for collecting an episode:

env = AlfEnvironment()

# reset() creates the initial time_step and resets the environment.
time_step = env.reset()
while not time_step.is_last():
    action_step = policy.action(time_step)
    time_step = env.step(action_step.action)
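Because AlfEnvironment is abstract, the loop above needs a concrete subclass and a policy to run. The following is a minimal, self-contained sketch of the same control flow, including the automatic reset after a LAST step. ToyTimeStep, CountdownEnv, and the constant action are hypothetical stand-ins for illustration and are not part of ALF.

```python
from dataclasses import dataclass

FIRST, MID, LAST = 0, 1, 2  # stand-ins for the step types


@dataclass
class ToyTimeStep:
    step_type: int
    reward: float
    observation: int

    def is_last(self):
        return self.step_type == LAST


class CountdownEnv:
    """Hypothetical environment whose episode ends after `length` steps."""

    def __init__(self, length=3):
        self._length = length
        self._remaining = 0
        self._current = None  # no episode has been started yet

    def reset(self):
        self._remaining = self._length
        self._current = ToyTimeStep(FIRST, 0.0, self._remaining)
        return self._current

    def step(self, action):
        # Auto-reset: ignore the action if no episode is running or the
        # previous step was the last one of its episode.
        if self._current is None or self._current.is_last():
            return self.reset()
        self._remaining -= 1
        step_type = LAST if self._remaining == 0 else MID
        self._current = ToyTimeStep(step_type, 1.0, self._remaining)
        return self._current


env = CountdownEnv(length=3)
time_step = env.reset()
episode = [time_step]
while not time_step.is_last():
    time_step = env.step(action=0)  # a real policy would choose the action
    episode.append(time_step)
```

Calling step() once more after the loop returns a FIRST time step, which mirrors the automatic reset described above.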

abstract action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, override this property to return True.

A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.
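The relationship between spec shapes and batched arrays can be illustrated with plain numpy. This is only a sketch: the shapes and the batch size below are hypothetical values, not taken from any ALF environment.

```python
import numpy as np

# Hypothetical per-environment shapes, as a spec would describe them.
observation_shape = (4,)
action_shape = (2,)
batch_size = 8

# A batched environment exchanges arrays with an extra leading batch dimension.
batched_actions = np.zeros((batch_size,) + action_shape, dtype=np.float32)
batched_observations = np.zeros(
    (batch_size,) + observation_shape, dtype=np.float32)

# Dropping the left-most dimension recovers the shape the spec describes.
per_env_action_shape = batched_actions.shape[1:]
```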

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.
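One way the two usages can be tied together is for the context-manager exit to call close(). The sketch below assumes exactly that; ManagedEnv is a hypothetical class and does not reproduce how AlfEnvironment implements the protocol.

```python
class ManagedEnv:
    """Hypothetical environment that owns a resource needing cleanup."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release external resources (processes, sockets, windows) here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not suppress exceptions raised inside the block


with ManagedEnv() as env:
    inside = env.closed  # still open inside the block
```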

current_time_step()[source]#: Returns the current timestep.

abstract env_info_spec()[source]#: Defines the env_info provided by the environment.

get_info()[source]#

Returns the environment info returned on the last step.

Returns: Info returned by last call to step(). None by default.
Raises: NotImplementedError – If the environment does not use info.

property num_tasks#: Number of tasks supported by this environment.

abstract observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reset()[source]#

Starts a new sequence and returns the first TimeStep of this sequence.

Note: Subclasses cannot override this directly. Subclasses implement _reset() which will be called by this method. The output of _reset() will be cached and made available through current_time_step().

Return type: TimeStep

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.

Returns: alf.TensorSpec

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

step(action)[source]#

Updates the environment according to the action and returns a TimeStep.

If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.

This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.

Note: Subclasses cannot override this directly. Subclasses implement _step() which will be called by this method. The output of _step() will be cached and made available through current_time_step().

Parameters: action (nested Tensor) – input actions.
Return type: TimeStep
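The split between the public step() and the subclass hook _step() is a template-method pattern. The sketch below illustrates the idea under the assumption that the public method only delegates and caches; CachingEnv and EchoEnv are hypothetical and omit the reset handling that the real class performs.

```python
from abc import ABC, abstractmethod


class CachingEnv(ABC):
    """Hypothetical base class mirroring the step()/_step() split."""

    def __init__(self):
        self._current_time_step = None

    def step(self, action):
        # Public entry point: delegate to the subclass hook, then cache.
        self._current_time_step = self._step(action)
        return self._current_time_step

    def current_time_step(self):
        return self._current_time_step

    @abstractmethod
    def _step(self, action):
        """Subclasses compute and return the next time step here."""


class EchoEnv(CachingEnv):
    def _step(self, action):
        return {"observation": action}


env = EchoEnv()
returned = env.step(5)
```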

property task_names#: The name of each task.

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

alf.environments.alf_gym3_wrapper#

Wrapper providing an AlfEnvironment adapter for Gym3 environments.

Gym3 provides a unified interface for reinforcement learning environments that improves upon the gym interface and includes vectorization (i.e. natively supported batched environments).

Gym3 has a different set of considerations which lead to different design choices compared to gym. See the following links to learn about those design choices.

https://github.com/openai/gym3/blob/master/docs/design.md

class AlfGym3Wrapper(gym3_env, image_channel_first=True, ignored_info_keys=[], support_force_reset=False, render_activator=None, frame_extractor=None)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

An adapter to make Gym3 environments follow ALF’s convention.

Although Gym3 provides an official gym wrapper, we decided not to base the ALF adapter on that gym wrapper because:

In terms of performance and resources, relying on the natively supported batched (vectorized) environments from Gym3 is much more memory-efficient than creating many Gym3 instances in subprocesses in batch mode.
Gym3 indicates the first and last steps of an episode through a different interface than gym.
Gym3 has different interfaces for rendering and recording than gym.
Gym3 normally does not provide support for resetting the environment.

This adapter takes all of the above into account and patches them to achieve compatibility with AlfEnvironment.

Normally you are not expected to call AlfGym3Wrapper directly. Instead the load() functions for various Gym3-based environments are preferred.

For example, suite_procgen.load() is used to construct procgen environments which themselves are Gym3-based environments.

NOTE: TimeLimit is currently not applicable to Gym3 environments because they do not offer a reset() interface.

Construct an adapted instance for the input Gym3 environment

Parameters

gym3_env (Env) – the input environment which should be an instance of a class that derives from gym3.Env
image_channel_first (bool) – when set to True, the image-based (of 3 channels) observation will be permuted so that the channel dimension comes first.
ignored_info_keys (List[str]) – a list of keys in the env info that should not be included in the env info of the TimeStep. This is useful when the env info of the underlying Gym3 environment contains large but unneeded entries, and ignoring them is crucial for performance.
support_force_reset (bool) – Gym3 environments do not support forced resets in general. However, some environments, such as procgen, allow sending action -1 to reset the environment. Set this to True to enable such behavior.
render_activator (Optional[Callable[[], Env]]) – when set to None, it indicates that this environment does not support rendering. Otherwise it will be a function that re-creates a Gym3 environment with render enabled. See render() for details.
frame_extractor (Optional[Callable[[Env], Any]]) – when set to None, it indicates that this environment does not support recording. Otherwise it will be a function that extracts the rendered frame for recording from the environment.
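As an illustration of what image_channel_first implies, the numpy sketch below moves the channel axis of a batch of frames in front of the spatial axes. The input layout [batch, height, width, channel] and the frame size are assumptions made for this example; the exact layout handled by the wrapper is not stated here.

```python
import numpy as np

# Hypothetical batch of 2 RGB frames in channel-last layout [B, H, W, C].
frames = np.zeros((2, 64, 48, 3), dtype=np.uint8)

# Move the channel axis in front of height and width: [B, C, H, W].
channel_first = np.transpose(frames, (0, 3, 1, 2))
```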

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

env_info_spec()[source]#: Defines the env_info provided by the environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode)[source]#

Enables rendering by re-activating the environment

Parameters: mode (str) – A string indicating the rendering mode. This is to make it compatible with Gym environments’ rendering interface. For AlfGym3Wrapper, it returns the RGB array image if mode is specified as rgb_array, and None for other modes.

alf.environments.alf_gym_wrapper#

Wrapper providing an AlfEnvironment adapter for GYM environments.

Adapted from the TF-Agents Environment API, as seen in: https://github.com/tensorflow/agents/blob/master/tf_agents/environments/suite_gym.py

class AlfGymWrapper(gym_env, env_id=None, discount=1.0, auto_reset=True, simplify_box_bounds=True)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Base wrapper implementing AlfEnvironmentBaseWrapper interface for Gym envs.

Action and observation specs are automatically generated from the action and observation spaces. See base class for AlfEnvironment details.

Parameters

gym_env (gym.Env) – An instance of OpenAI gym environment.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
auto_reset (bool) – whether or not to reset the environment when done.
simplify_box_bounds (bool) – whether or not to simplify redundant arrays to values for spec bounds.

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.

property done#

env_info_spec()[source]#: Defines the env_info provided by the environment.

get_info()[source]#: Returns the gym environment info returned on the last step.

property gym#: Return the gym environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.

Returns: alf.TensorSpec

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

tensor_spec_from_gym_space(space, simplify_box_bounds=True, float_dtype=<class 'numpy.float32'>)[source]#

Construct tensor spec from gym space.

Parameters

space (gym.Space) – An instance of OpenAI gym Space.
simplify_box_bounds (bool) – if True, will try to simplify redundant arrays to make logging and debugging less verbose when printed out.
float_dtype (np.float32 | np.float64 | None) – the dtype to be used for the floating numbers. If None, it will use dtypes of gym spaces.
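The sketch below shows one plausible reading of "simplifying redundant arrays": when every entry of a bound is identical, a single value carries the same information and prints more compactly. simplify_bounds is a hypothetical helper written for this illustration, not the function ALF uses.

```python
def simplify_bounds(values):
    """Collapse a flat list of identical bounds into a single value.

    Hypothetical helper; returns the input unchanged when the entries differ.
    """
    if values and all(v == values[0] for v in values):
        return values[0]
    return values


uniform_low = simplify_bounds([-1.0, -1.0, -1.0])
mixed_high = simplify_bounds([1.0, 2.0, 1.0])
```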

alf.environments.alf_wrappers#

Wrappers for ALF environments.

Adapted from the TF-Agents Environment API, as seen in: https://github.com/tensorflow/agents/blob/master/tf_agents/environments/wrappers.py

class ActionObservationWrapper(env)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Add prev_action to observation.

The new observation is:

{
    'observation': original_observation,
    'prev_action': prev_action
}

Parameters: env (AlfEnvironment) – An AlfEnvironment instance to wrap.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

class AlfEnvironmentBaseWrapper(env)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

AlfEnvironment wrapper forwards calls to the given environment.

Create an ALF environment base wrapper.

Parameters: env (AlfEnvironment) – An AlfEnvironment instance to wrap.
Returns: A wrapped AlfEnvironment

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.

env_info_spec()[source]#: Defines the env_info provided by the environment.

get_info()[source]#

Returns the environment info returned on the last step.

Returns: Info returned by last call to step(). None by default.
Raises: NotImplementedError – If the environment does not use info.

property num_tasks#: Number of tasks supported by this environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.

Returns: alf.TensorSpec

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

property task_names#: The name of each task.

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

wrapped_env()[source]#

class AtariTerminalOnLifeLossWrapper(env)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Wrapper to change discount to 0 upon life loss for Atari.

This can potentially make it easier for the learning agent to recognize the significance of losing a life.

Some papers report results with this enabled (e.g. arXiv:2111.00210).

Parameters: env – ALF env to be wrapped

class BatchEnvironmentWrapper(envs)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Wrapper to make a list of non-batched environments into a batched environment.

Note that the individual environments in envs are executed sequentially during one step() or reset().

Parameters: envs (List[AlfEnvironment]) – a list of unbatched AlfEnvironment.
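The sequential execution can be pictured as a loop over the wrapped environments followed by regrouping the per-environment results into batched fields. The following is a rough sketch under that assumption; ConstantEnv and batched_step are hypothetical, and plain lists stand in for the tensors a real wrapper would produce.

```python
class ConstantEnv:
    """Hypothetical unbatched environment returning a simple observation."""

    def __init__(self, value):
        self._value = value

    def step(self, action):
        return {'observation': self._value + action, 'reward': 1.0}


def batched_step(envs, actions):
    # Step each environment in turn, then regroup the results field by field.
    results = [env.step(action) for env, action in zip(envs, actions)]
    return {key: [r[key] for r in results] for key in results[0]}


envs = [ConstantEnv(10), ConstantEnv(20), ConstantEnv(30)]
batch = batched_step(envs, actions=[1, 2, 3])
```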

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.

env_info_spec()[source]#: Defines the env_info provided by the environment.

property metadata#

property num_tasks#: Number of tasks supported by this environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode)[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.

Returns: alf.TensorSpec

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

property task_names#: The name of each task.

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

class BatchedTensorWrapper(env)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Wrapper that converts non-batched numpy-based I/O to batched tensors.

Create an ALF environment base wrapper.

Parameters: env (AlfEnvironment) – An AlfEnvironment instance to wrap.
Returns: A wrapped AlfEnvironment

class CurriculumWrapper(env, progress_favor=10.0, current_score_update_rate=0.001, past_score_update_rate=0.0005, warmup_period=100)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

A wrapper to provide automatic curriculum task selection.

The probability of a task being chosen is based on its recent progress in terms of episode reward. A task will be chosen more often if its episode reward increases faster than other tasks.

The progress of a task is defined as the difference between its current score and its past score divided by the average episode length for that task.

Parameters

env (AlfEnvironment) – environment to be wrapped. It needs to be batched.
progress_favor (float) – how much more likely to choose the environment with the fastest progress than the ones with no progress. If progress_favor is 1, all tasks are sampled uniformly.
current_score_update_rate (float) – the rate for updating the current score
past_score_update_rate (float) – the rate for updating the past score
warmup_period (int) – gradually increase progress_favor from 1 to progress_favor during the first num_tasks * warmup_period episodes
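The definition of progress given above can be written out directly. In the sketch below, task_progress follows that definition, while update_score assumes the two scores are exponential moving averages of episode reward with different rates; that update rule is a guess suggested by the parameter names and is not confirmed by this page. How progress is turned into sampling probabilities is not covered.

```python
def task_progress(current_score, past_score, avg_episode_length):
    """Progress as defined above: score difference per unit of episode length."""
    return (current_score - past_score) / avg_episode_length


def update_score(score, episode_reward, rate):
    # Assumed exponential moving average; the actual update rule may differ.
    return score + rate * (episode_reward - score)


# Hypothetical rates, chosen large so the effect is visible after one episode.
current = update_score(0.0, episode_reward=10.0, rate=0.5)
past = update_score(0.0, episode_reward=10.0, rate=0.25)
progress = task_progress(current, past, avg_episode_length=5.0)
```

Because the current score reacts faster than the past score, a rising episode reward yields positive progress under these assumptions.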

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

env_info_spec()[source]#: Defines the env_info provided by the environment.

class DiscreteActionWrapper(env, actions_num)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Discretize each continuous action dim into several evenly distributed values. Currently only supports an unnested action spec with a rank-1 shape.

This wrapper can be used in both batch env mode (tensors) and individual env mode (numpy array).

Parameters

env (AlfEnvironment) – ALF env to be wrapped
actions_num (int) – number of values to discretize each action dim into
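A minimal sketch of the discretization idea follows. It assumes the evenly distributed values include both bounds of each dimension and that actions_num is at least 2; discretize and the bounds used here are hypothetical and only illustrate the mapping from discrete indices to continuous values.

```python
def discretize(low, high, actions_num):
    """Return `actions_num` evenly spaced values covering [low, high].

    Assumes actions_num >= 2 and that both endpoints are included.
    """
    step = (high - low) / (actions_num - 1)
    return [low + i * step for i in range(actions_num)]


# Hypothetical 2-dimensional action with bounds [-1, 1] and [0, 4].
bounds = [(-1.0, 1.0), (0.0, 4.0)]
tables = [discretize(low, high, actions_num=5) for low, high in bounds]

# A discrete action picks one index per dimension.
discrete_action = [0, 3]
continuous_action = [tables[d][i] for d, i in enumerate(discrete_action)]
```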

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

class GoalReplayEnvWrapper(env)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Adds a goal to the observation, used for HER (Hindsight Experience Replay).

Sources: [1] Hindsight Experience Replay. https://arxiv.org/abs/1707.01495.

To use this wrapper, create an environment-specific version by inheriting this class.

Create a wrapper to add a goal to the observation.

Parameters: env (AlfEnvironment) – An AlfEnvironment instance to wrap.
Raises: ValueError – If environment observation is not a dict

abstract get_goal_from_trajectory(trajectory)[source]#

Extracts the goal from a given trajectory.

Parameters: trajectory – An instance of Trajectory.
Returns: Environment specific goal
Raises: NotImplementedError – function should be implemented in child class.

abstract get_trajectory_with_goal(trajectory, goal)[source]#

Generates a new trajectory assuming the given goal was the actual target.

One example is updating a “distance-to-goal” field in the observation. Note that relevant state information must be recovered or re-calculated from the given trajectory.

Parameters

trajectory – An instance of Trajectory.
goal – Environment specific goal

Returns

Updated instance of Trajectory

Raises

NotImplementedError – function should be implemented in child class.

class MultitaskWrapper(envs, task_names, env_id=None)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Multitask environment based on a list of environments.

All the environments need to have the same observation_spec, action_spec, reward_spec and info_spec. The action_spec of the new environment becomes:

{
    'task_id': TensorSpec((), maximum=num_envs - 1, dtype='int64'),
    'action': original_action_spec
}

‘task_id’ is used to specify which task to run for the current step. Note that the current implementation does not prevent switching tasks in the middle of an episode.
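The nested action can be read as a routing instruction: 'task_id' selects the environment and 'action' is forwarded to it. The sketch below illustrates that dispatch with hypothetical stand-ins (ScaleEnv, multitask_step); it is not the wrapper's actual implementation.

```python
class ScaleEnv:
    """Hypothetical task environment that scales the action it receives."""

    def __init__(self, factor):
        self._factor = factor

    def step(self, action):
        return self._factor * action


def multitask_step(envs, action):
    # Route the inner action to the environment selected by 'task_id'.
    return envs[action['task_id']].step(action['action'])


envs = [ScaleEnv(1), ScaleEnv(10)]
first = multitask_step(envs, {'task_id': 0, 'action': 3})
second = multitask_step(envs, {'task_id': 1, 'action': 3})
```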

Parameters

envs (list[AlfEnvironment]) – a list of environments. Each one represents a different task.
task_names (list[str]) – the names of each task.
env_id (int) – (optional) ID of the environment.

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

env_info_spec()[source]#: Defines the env_info provided by the environment.

get_num_tasks()[source]#

static load(load_fn, environment_name, env_id=None, **kwargs)[source]#

Parameters

load_fn (Callable) – function used to construct the environment for each task. It will be called as load_fn(env_name, **kwargs)
environment_name (list[str]) – list of environment names
env_id (int) – (optional) ID of the environment.
kwargs – arguments passed to load_fn

property num_tasks#: Number of tasks supported by this environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.

Returns: alf.TensorSpec

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

property task_names#: The name of each task.

class NonEpisodicAgent(env, discount=1.0)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Make the agent non-episodic by replacing all termination time steps with a non-zero discount (essentially the same type as returned by the TimeLimit wrapper).

This wrapper could be useful for purely intrinsically motivated agents, as suggested in the following paper:

EXPLORATION BY RANDOM NETWORK DISTILLATION, Burda et al. 2019,

“… We argue that this is a natural way to do exploration in simulated environments, since the agent’s intrinsic return should be related to all the novel states that it could find in the future, regardless of whether they all occur in one episode or are spread over several.

… If Alice is modelled as an episodic reinforcement learning agent, then her future return will be exactly zero if she gets a game over, which might make her overly risk averse. The real cost of a game over to Alice is the opportunity cost incurred by having to play through the game from the beginning.”

NOTE: For PURE intrinsic-motivated agents only. If you use both extrinsic and intrinsic rewards, then DO NOT use this wrapper! Without the episodic setting, the agent could exploit extrinsic rewards by intentionally dying to collect easy early rewards in the game.

Example usage:

suite_mario.load.env_wrappers=(@NonEpisodicAgent, )
suite_gym.load.env_wrappers=(@NonEpisodicAgent, )

Create a NonEpisodicAgent wrapper.

Parameters

env (AlfEnvironment) – An AlfEnvironment instance to wrap.
discount (float) – discount of the environment.

class NormalizedActionWrapper(env)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Normalize actions into [-1,1].

The reason why we’d like to normalize the actions, even though our action distribution networks can do this, is because we want to set target entropy independent of action ranges for algorithms like SAC.

This wrapper can be used only for individual envs (numpy array) or a batched env (tensor).

Parameters: env (AlfEnvironment) – ALF env to be wrapped
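When actions are normalized into [-1, 1], something has to map them back to the original bounds before they reach the wrapped environment. The sketch below shows the usual linear mapping; denormalize and the bounds are hypothetical, and the wrapper's actual conversion may differ in detail.

```python
def denormalize(action, low, high):
    """Map a value in [-1, 1] linearly onto [low, high]."""
    return low + (action + 1.0) * 0.5 * (high - low)


# Hypothetical original bounds for one action dimension.
low, high = 0.0, 10.0
at_min = denormalize(-1.0, low, high)
at_mid = denormalize(0.0, low, high)
at_max = denormalize(1.0, low, high)
```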

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

class PerformanceProfiler(env, process_profile_fn, process_steps)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Use cProfile to profile env execution.

Create a PerformanceProfiler that uses cProfile to profile env execution.

Parameters

env (AlfEnvironment) – An AlfEnvironment instance to wrap.
process_profile_fn (Callable) – A callback that accepts a Profile object. After process_profile_fn is called, profile information is reset.
process_steps (int) – The frequency with which process_profile_fn is called. The counter is incremented each time step is called (not reset); every process_steps steps, process_profile_fn is called and the profiler is reset.
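The interaction between process_profile_fn and process_steps can be sketched with the standard cProfile module. StepProfiler below is a hypothetical stand-in that profiles a plain callable instead of an environment; it only illustrates the counting and reset behavior described above.

```python
import cProfile


class StepProfiler:
    """Hypothetical wrapper that profiles calls to a `step_fn` callable."""

    def __init__(self, step_fn, process_profile_fn, process_steps):
        self._step_fn = step_fn
        self._process_profile_fn = process_profile_fn
        self._process_steps = process_steps
        self._profile = cProfile.Profile()
        self._count = 0

    def step(self, action):
        self._profile.enable()
        try:
            result = self._step_fn(action)
        finally:
            self._profile.disable()
        self._count += 1
        if self._count % self._process_steps == 0:
            # Hand the collected data to the callback, then start afresh.
            self._process_profile_fn(self._profile)
            self._profile = cProfile.Profile()
        return result


reports = []
profiler = StepProfiler(lambda a: a * 2, reports.append, process_steps=3)
outputs = [profiler.step(i) for i in range(7)]
```

A callback would typically pass the Profile object to pstats.Stats to print or save the timings.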

property duration#

class RandomFirstEpisodeLength(env, random_length_range, num_episodes=1)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Randomize the length of the first episode.

The motivation is to make the observations less correlated for the environments that have fixed episode length.

Example usage:

RandomFirstEpisodeLength.random_length_range=200
suite_gym.load.alf_env_wrappers=(@RandomFirstEpisodeLength, )

Create a RandomFirstEpisodeLength wrapper.

Parameters

env (AlfEnvironment) – An AlfEnvironment instance to wrap.
random_length_range (int) – the randomized episode length is drawn from the range [1, random_length_range].
num_episodes (int) – randomize the episode length for the first num_episodes episodes.

class ScalarRewardWrapper(env, reward_weights=None)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

A wrapper that converts a vector reward to a scalar reward by averaging reward dims with a weight vector.

Parameters

env (AlfEnvironment) – An AlfEnvironment instance to be wrapped.
reward_weights (list[float] | tuple[float]) – a list/tuple of weights for the rewards; if None, then the weight for the first reward dimension will be 1 and the weights for the other dimensions will be 0.
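
A minimal sketch of the conversion, assuming the scalar reward is the dot product of the reward vector with the weight vector. The function name `scalarize_reward` is hypothetical and this is not the wrapper's actual code.

```python
import numpy as np


def scalarize_reward(reward, reward_weights=None):
    """Combine a vector reward (last dim) into a scalar with a weight vector."""
    reward = np.asarray(reward, dtype=np.float64)
    if reward_weights is None:
        # Default described above: weight 1 on the first dim, 0 elsewhere.
        weights = np.zeros(reward.shape[-1])
        weights[0] = 1.0
    else:
        weights = np.asarray(reward_weights, dtype=np.float64)
    return reward @ weights


default_scalar = scalarize_reward([2.0, 5.0, 7.0])
weighted_scalar = scalarize_reward([2.0, 5.0, 7.0], [0.5, 0.5, 0.0])
```

The same function also handles a batch of rewards with shape [B, K], returning a vector of B scalars.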

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation which returns a scalar spec.

Returns: alf.TensorSpec

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

class TemporallyCorrelatedNoiseWrapper(env, sigma=0.5, past_noise_weight=1.0)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

Adding temporally correlated noise to actions. Reference:

Swamy et al. Causal Imitation Learning under Temporally Correlated Noise, arXiv:2202.01312

Create a Temporally Correlated Noise wrapper, which adds temporally correlated noise to the action before interacting with the environment:

noisy_action = action + past_noise_weight * past_noise + current_noise

Parameters

sigma (float) – standard deviation of the noise.
past_noise_weight (float) – the weight for the noise from the past when adding into the action for the current time step.
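
The formula above can be sketched with NumPy as follows. This is an illustration only: the class name `CorrelatedNoise` is hypothetical, and it assumes that the Gaussian noise drawn at one step becomes past_noise at the next step and that past_noise starts at zero.

```python
import numpy as np


class CorrelatedNoise:
    """Illustrative noise source; not the wrapper's actual implementation."""

    def __init__(self, sigma=0.5, past_noise_weight=1.0, seed=0):
        self._sigma = sigma
        self._past_noise_weight = past_noise_weight
        self._rng = np.random.default_rng(seed)
        self._past_noise = None

    def apply(self, action):
        action = np.asarray(action, dtype=np.float64)
        if self._past_noise is None:
            self._past_noise = np.zeros_like(action)
        current_noise = self._rng.normal(0.0, self._sigma, size=action.shape)
        noisy_action = (action + self._past_noise_weight * self._past_noise
                        + current_noise)
        # Assumption: the noise drawn now is the "past noise" of the next step.
        self._past_noise = current_noise
        return noisy_action


noise = CorrelatedNoise(sigma=0.5, past_noise_weight=1.0, seed=1)
first = noise.apply(np.zeros(2))
first_noise = noise._past_noise.copy()
second = noise.apply(np.zeros(2))
```

Because the previous noise is re-added at the next step, consecutive noisy actions share a common noise component, which is what makes the perturbation temporally correlated.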

class TimeLimit(env, duration)[source]#

Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper

End episodes after specified number of steps.

Create a TimeLimit ALF environment.

Parameters

env (AlfEnvironment) – An AlfEnvironment instance to wrap.
duration (int) – time limit, usually set to be the max_episode_steps of the environment.

property duration#

alf.environments.carla_controller#

class PIDController(K_P, K_I, K_D, dt, integration_time_window=0.5)[source]#

Bases: object

PID controller.

See https://en.wikipedia.org/wiki/PID_controller for reference

Parameters

K_P (float) – coefficient for the proportional term
K_I (float) – coefficient for the integral term
K_D (float) – coefficient for the derivative term
dt (float) – time interval in seconds for each step
integration_time_window (float) – the window for the integral in terms of seconds. The integration is implemented as an exponentially weighted sum over the past errors where the weight is decayed by 1 - dt/integration_time_window every step.
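
A minimal sketch of a PID controller with the exponentially decayed integral described above. The class name `SimplePID` is hypothetical, and several details are assumptions rather than the documented behavior: the error is taken as target - current, each error is scaled by dt before being accumulated, and the derivative is a finite difference.

```python
class SimplePID:
    """Illustrative PID controller with an exponentially decayed integral."""

    def __init__(self, K_P, K_I, K_D, dt, integration_time_window=0.5):
        self._K_P, self._K_I, self._K_D = K_P, K_I, K_D
        self._dt = dt
        # Weight applied to the accumulated integral at every step.
        self._decay = 1.0 - dt / integration_time_window
        self.reset()

    def reset(self):
        self._integral = 0.0
        self._prev_error = 0.0

    def step(self, current, target):
        error = target - current
        # Older errors are decayed, so the integral forgets the distant past.
        self._integral = self._decay * self._integral + error * self._dt
        derivative = (error - self._prev_error) / self._dt
        self._prev_error = error
        return (self._K_P * error + self._K_I * self._integral
                + self._K_D * derivative)


pid = SimplePID(K_P=0.0, K_I=1.0, K_D=0.0, dt=0.1, integration_time_window=0.5)
u1 = pid.step(current=0.0, target=1.0)
u2 = pid.step(current=0.0, target=1.0)
```

With dt=0.1 and a window of 0.5 s, the decay factor is 0.8, so a constant error contributes a geometrically bounded integral instead of growing without limit.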

reset()[source]#: Reset the controller.

step(current, target)[source]#

Calculate control for one step.

Parameters

current (float) – the current value
target (float) – the desired value

Returns

control

Return type

float

class VehicleController(vehicle, step_time, max_speed=5.56, max_throttle=0.75, max_steering=0.8, max_brake=0.3, s_P=3.6, s_I=0.18, s_D=0, d_P=1.95, d_I=0.07, d_D=0.2)[source]#

Bases: object

A simple vehicle controller using PID controller.

The defaults are from https://github.com/carla-simulator/carla/blob/master/PythonAPI/carla/agents/navigation/local_planner.py. Note that the max_speed and gain parameters for speed are originally specified for speed in the unit of km/h. Since here we use m/s, we have converted them as follows as our default values:

max_speed = (20 km/h) / 3.6 = 5.56 m/s
s_P = (1.0 h/km) * 3.6 = 3.6 s/m
s_I = (0.05 h/km) * 3.6 = 0.18 s/m
s_D = (0 h/km) * 3.6 = 0 s/m

Parameters

vehicle (carla.Actor) – the actor for vehicle
step_time (float) – time interval in seconds for each step
max_speed (float) – maximal speed in m/s. Defaults to 5.56 m/s, which is about 20 km/h.
max_throttle (float) – maximal throttle
max_steering (float) – maximal steering
max_brake (float) – maximal brake
s_P (float) – coefficient of the proportional term for the speed controller, with the unit as s/m
s_I (float) – coefficient of the integral term for the speed controller, with the unit as s/m
s_D (float) – coefficient of the derivative term for the speed controller, with the unit as s/m
d_P (float) – coefficient of the proportional term for the direction controller
d_I (float) – coefficient of the integral term for the direction controller
d_D (float) – coefficient of the derivative term for the direction controller

act(action)[source]#

Generate carla.VehicleControl based on action

Parameters: action (np.ndarray) – 3-D vector representing action
Returns: carla.VehicleControl

action_desc()[source]#

Get the description about the action.

Returns: the description about the action
Return type: str

action_spec()[source]#

Get the action spec.

The action is a 3-D vector of [speed, direction, reverse]. speed is in [-1.0, 1.0], with a negative value meaning zero speed and 1.0 corresponding to the maximally allowed speed given by the max_speed argument of __init__(). direction is the relative direction that the vehicle is facing, with 0 being front, -0.5 being left and 0.5 being right. reverse is interpreted as a boolean value, with values greater than 0.5 corresponding to True to indicate going backward.

Returns: alf.BoundedTensorSpec
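
To make the action layout concrete, here is a small sketch that decodes such a vector. The helper name `decode_action` is hypothetical, and the linear scaling of positive speed values by max_speed is an assumption; the controller's actual mapping may differ.

```python
import numpy as np


def decode_action(action, max_speed=5.56):
    """Split [speed, direction, reverse] into interpretable values."""
    speed, direction, reverse = np.asarray(action, dtype=np.float64)
    # Negative speed values mean zero speed; 1.0 means max_speed (assumed linear).
    target_speed = max(float(speed), 0.0) * max_speed
    return target_speed, float(direction), bool(reverse > 0.5)


stopped = decode_action([-0.3, 0.5, 0.9])
full_speed = decode_action([1.0, 0.0, 0.0])
```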

reset()[source]#: Reset the controller.

alf.environments.carla_sensors#

class BEVSensor(parent_actor, alf_world, navigation_sensor, image_height_in_pixels=200, image_width_in_pixels=200, pixels_per_meter=5, observation_mode='rgb', pixels_ev_to_bottom=50, history_idx=[-16, -11, -6, -1], max_history_len=20, vehicle_bbox_factor=1.0, walker_bbox_factor=2.0)[source]#

Bases: alf.environments.carla_sensors.SensorBase

BEVSensor. Adapted from https://github.com/zhejz/carla-roach/blob/main/carla_gym/core/obs_manager/birdview/chauffeurnet.py

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) – the world object keeping all relevant data and some utility functions (e.g., _get_traffic_light_waypoints)
navigation_sensor (str) – the navigation sensor associated with the parent_actor
image_height_in_pixels (int) – number of pixels for the height of rendered BEV image.
image_width_in_pixels (int) – number of pixels for the width of rendered BEV image.
pixels_per_meter (int) – how many pixels in the BEV image correspond to one meter in the world coordinate
observation_mode (str) – a string indicating the observation mode for the BEV image.
- If “rgb”: the sensor will return an encoded RGB image as the sensor readings.
- If “mask”: it will use a multi-channel mask image as the sensor readings.
- If “bitmap”: it will use a multi-channel mask representation, and encode the mask tensor with bit representation. In this case, a proper decoder might be needed for the bitmap before being used for training.
pixels_ev_to_bottom (int) – the number of pixels of the ego vehicle (ev) to the bottom of the BEV image.
history_idx (list[int]) – a list of numbers representing the indices of the history information to be rendered for non-ego vehicles. For example, we can set history_idx=[-1] to keep only the most recent observation, or history_idx=[-11, -1] for both the latest observation and the one 10 steps earlier.
max_history_len (int) – max number of history length preserved
vehicle_bbox_factor (float) – a factor to scale the vehicle bounding boxes
walker_bbox_factor (float) – a factor to scale the walker bounding boxes
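
The interaction between history_idx and max_history_len can be illustrated with a bounded buffer indexed from the end. This is a sketch under the assumption that the history is stored in such a buffer; integers stand in for the per-step data.

```python
from collections import deque

max_history_len = 20
history_idx = [-16, -11, -6, -1]

# A bounded buffer: only the most recent max_history_len entries are kept.
history = deque(maxlen=max_history_len)
for step in range(30):
    history.append(step)  # stand-in for the data recorded at each step

# Negative indices count back from the most recent entry.
selected = [history[i] for i in history_idx]
```

After 30 steps the buffer holds steps 10 to 29, so the selected entries are 5 steps apart and end at the most recent step. Note that an index such as -16 requires at least 16 stored entries, so max_history_len must be at least as large as the oldest requested offset.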

generate_observation_masks()[source]#

Generate all the masks required for rendering the BEV observation.

Returns: a dictionary containing masks for different elements in the scene

get_current_observation(current_frame)[source]#

Get the current observation.

The observation is a [C, H, W] array with C=3 if self._use_rgb_image is True. Otherwise, it is a multi-channel mask image including road_mask, route_mask and lane_mask for the first 3 channels, and 3 * len(self._history_idx) channels for vehicle_mask, walker_mask and traffic light mask.

Parameters: current_frame (int) – not used.
Returns: BEV image

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

render()[source]#

Return the rendered RGB image of the BEV view

Parameters: display (pygame.Surface) – the display surface to draw the image

class CameraSensor(parent_actor, sensor_type='sensor.camera.rgb', xyz=(1.6, 0.0, 1.7), pyr=(0.0, 0.0, 0.0), attachment_type='rigid', fov=90.0, fstop=1.4, gamma=2.2, image_size_x=640, image_size_y=480, iso=1200.0)[source]#

Bases: alf.environments.carla_sensors.SensorBase

CameraSensor.

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor
sensor_type (str) – ‘sensor.camera.rgb’, ‘sensor.camera.depth’, ‘sensor.camera.semantic_segmentation’
attachment_type (str) – There are two types of attachment. ‘rigid’: the object follows its parent position strictly. ‘spring_arm’: the object expands or retracts depending on camera situation.
xyz (tuple[float]) – the attachment position (x, y, z) relative to the parent_actor.
pyr (tuple[float]) – the attachment rotation (pitch, yaw, roll) in degrees.
fov (float) – horizontal field of view in degrees.
image_size_x (int) – image width in pixels.
image_size_y (int) – image height in pixels.
gamma (float) – target gamma value of the camera.
iso (float) – the camera sensor sensitivity.

get_current_observation(current_frame)[source]#

Parameters

current_frame (int) – not used.

Returns

The shape is [num_channels, image_size_y, image_size_x], where num_channels is 3 for the rgb sensor and 1 for other sensors.

Return type

np.ndarray

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

render(display)[source]#

Render the camera image to a pygame display.

Parameters: display (pygame.Surface) – the display surface to draw the image

class CollisionSensor(parent_actor, max_num_collisions=4, include_collision_location=False)[source]#

Bases: alf.environments.carla_sensors.SensorBase

CollisionSensor for getting collision signal.

It gets the impulses and optionally the locations for the collisions during the last tick.

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor
max_num_collisions (int) – maximal number of collisions to be included
include_collision_location (bool) – whether to include the collision location in the observation. If True, the position (x, y, z) of the other actor relative to the ego actor (parent_actor) will be included.

get_current_observation(current_frame)[source]#

Get the current observation.

Parameters

current_frame (int) – current frame no. CollisionSensor may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted.

Returns

Impulses from collisions during the last tick. Each impulse is a 3-D vector. At most max_num_collisions collisions are used. The result is padded with zeros if there are fewer than max_num_collisions collisions. If include_collision_location is True, the observation will have the shape of [max_num_collisions, 2, 3], by stacking the impulses and corresponding collision locations (in ego-coordinate) along dim-1.

Return type

np.ndarray
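
The padding and stacking described above can be sketched as follows. The helper name `pad_collisions` is hypothetical; it only illustrates the resulting shapes and is not the sensor's actual code.

```python
import numpy as np


def pad_collisions(impulses, locations=None, max_num_collisions=4):
    """Zero-pad impulses (and optionally locations) to a fixed-size array."""
    n = min(len(impulses), max_num_collisions)
    imp = np.zeros((max_num_collisions, 3), dtype=np.float32)
    if n > 0:
        imp[:n] = np.asarray(impulses[:n], dtype=np.float32)
    if locations is None:
        return imp  # shape [max_num_collisions, 3]
    loc = np.zeros((max_num_collisions, 3), dtype=np.float32)
    if n > 0:
        loc[:n] = np.asarray(locations[:n], dtype=np.float32)
    # Stack impulses and locations along dim-1.
    return np.stack([imp, loc], axis=1)  # shape [max_num_collisions, 2, 3]


impulse_only = pad_collisions([[1.0, 2.0, 3.0]])
with_location = pad_collisions([[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]])
```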

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class DynamicObjectSensor(parent_actor, alf_world, history_idx=[-16, -11, -6, -1], object_filter='vehicle.*', max_object_number=3, with_ego_history=True, view_radius=100)[source]#

Bases: alf.environments.carla_sensors.SensorBase

DynamicObjectSensor. A sensor that perceives the dynamic objects around the ego agent.

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) – the world object keeping all relevant data and some utility functions.
navigation_sensor (str) – the navigation sensor associated with the parent_actor
history_idx (list[int]) – a list of numbers representing the indices of the history information to be rendered for all dynamic objects. For example, we can set history_idx=[-1] to keep only the most recent observation, or history_idx=[-11, -1] for both the latest observation and the one 10 steps earlier.
object_filter (str) – a string representing the type of dynamic objects to be perceived, following the blueprint filter format. By default, surrounding dynamic vehicles are perceived.
max_object_number (int) – the maximum number of dynamic objects that can be perceived within one time step, including the ego vehicle if with_ego_history is True; otherwise, the maximum number of non-ego dynamic objects that can be perceived in one time step. When the number of dynamic objects is larger than max_object_number, those that are far from the ego agent will be excluded from the observation until the condition on max_object_number is satisfied.
with_ego_history (bool) – whether to include ego history.
view_radius (float) – the radius of the view/perceivable field of the sensor (meter).
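
The selection rule for max_object_number and view_radius can be sketched as a nearest-first filter. The helper name `select_nearby_objects` is hypothetical, and the use of 2-D positions with Euclidean distance is an assumption; the handling of the ego vehicle's own slot is not modeled.

```python
import numpy as np


def select_nearby_objects(ego_xy, object_xy, max_object_number=3,
                          view_radius=100.0):
    """Return indices of the closest objects inside the view radius."""
    object_xy = np.asarray(object_xy, dtype=np.float64).reshape(-1, 2)
    ego_xy = np.asarray(ego_xy, dtype=np.float64)
    dist = np.linalg.norm(object_xy - ego_xy, axis=1)
    order = np.argsort(dist)                   # nearest first
    order = order[dist[order] <= view_radius]  # drop objects out of view
    return order[:max_object_number]           # far objects are excluded


objects = [[10, 0], [200, 0], [1, 0], [5, 0], [50, 0]]
nearest = select_nearby_objects([0, 0], objects)
```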

clean()[source]#

destroy()[source]#

Return the commands for destroying this sensor.

Use carla.Client.apply_batch_sync() to actually destroy the sensor.

Returns: the commands used to destroy the sensor.
Return type: list[carla.command]

get_current_observation(current_frame)[source]#

Get the current observation.

Parameters: current_frame (int) – not used.
Returns: the current observation tensor.

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

render(x_range=[-50, 50], y_range=[-50, 50], img_height=256, img_width=256, dpi=300, figsize=(2, 2), linewidth=4, marker_size=5)[source]#

Return the rendered RGB image of the BEV view of the dynamic objects

Parameters

x_range (list[float]) – x range for rendering (meter)
y_range (list[float]) – y range for rendering (meter)
img_height (int) – height of the rendered image (pixel)
img_width (int) – width of the rendered image (pixel)
dpi (int) – dpi of the rendered image
figsize (tuple[int]) – figure size used in matplotlib (inches)
linewidth (int) – width of the line representing the trajectories
marker_size (int) – the size of the marker representing the latest position in the trajectory.

reset()[source]#

world_polyline_to_ego_array(polyline, ev_transform)[source]#

class GnssSensor(parent_actor)[source]#

Bases: alf.environments.carla_sensors.SensorBase

GnssSensor for sensing GPS location.

Parameters: parent_actor (carla.Actor) – the parent actor of this sensor

get_current_observation(current_frame)[source]#

Parameters

current_frame (int) – not used

Returns

A vector of [latitude (degrees), longitude (degrees), altitude (meters to be confirmed)]

Return type

np.ndarray

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class IMUSensor(parent_actor)[source]#

Bases: alf.environments.carla_sensors.SensorBase

IMUSensor for sensing acceleration and rotation.

Parameters: parent_actor (carla.Actor) – the parent actor of this sensor

get_current_observation(current_frame)[source]#

Get the current observation.

Parameters: current_frame (int) – current frame no. Some sensors may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted. Note that if the sensor receives an event in the most recent frame, event.frame should be equal to current_frame - 1.
Returns: sensor data received in the last tick.
Return type: nested np.ndarray

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class LaneInvasionSensor(parent_actor)[source]#

Bases: alf.environments.carla_sensors.SensorBase

LaneInvasionSensor for detecting lane invasion.

Lane invasion cannot be directly observed by the raw sensors used by real cars, so the main purpose of this sensor is to provide a training signal (e.g. reward).

TODO: not completed.

Parameters: parent_actor (carla.Actor) – the parent actor of this sensor

get_current_observation(current_frame)[source]#

Get the current observation.

Parameters: current_frame (int) – current frame no. Some sensors may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted. Note that if the sensor receives an event in the most recent frame, event.frame should be equal to current_frame - 1.
Returns: sensor data received in the last tick.
Return type: nested np.ndarray

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class NavigationSensor(parent_actor, alf_world)[source]#

Bases: alf.environments.carla_sensors.SensorBase

Generating future waypoints on the route.

Note that the route is fixed (it does not change based on the current vehicle location).

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) –

WINDOW = 5#

get_current_observation(current_frame)[source]#

Get the current observation.

The observation is an 8x3 array consisting of the positions of 8 future locations on the route.

Parameters: current_frame (int) – not used.
Returns: 8 3-D positions of future waypoints on the route. Note that the positions are absolute coordinates. However, the Player will transform them to egocentric coordinates as the observation for Player.
Return type: np.ndarray

get_current_route(future_number)[source]#

Get the current navigation route based on the location.

Parameters

future_number (int) – the number of future route waypoints. If -1, all the future waypoints on the route will be returned.

Returns

contains the 3-D positions of future waypoints on the route. Note that the positions are absolute coordinates.

Return type

np.ndarray

get_next_waypoint_index()[source]#

Get the index of the next waypoint.

The next waypoint is the waypoint after the nearest waypoint to the car.

Returns: index of the next waypoint
Return type: int
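
The rule "the waypoint after the nearest waypoint" can be sketched with NumPy. The function name `next_waypoint_index` is hypothetical, and clamping the result to the last waypoint at the end of the route is an assumption.

```python
import numpy as np


def next_waypoint_index(waypoints, car_location):
    """Index of the waypoint following the one nearest to the car."""
    waypoints = np.asarray(waypoints, dtype=np.float64)
    car_location = np.asarray(car_location, dtype=np.float64)
    dist = np.linalg.norm(waypoints - car_location, axis=1)
    nearest = int(np.argmin(dist))
    # Assumption: stay on the last waypoint once the end of the route is reached.
    return min(nearest + 1, len(waypoints) - 1)


route = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]]
idx = next_waypoint_index(route, [1.2, 0, 0])
```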

get_waypoint(i)[source]#

Get the coordinate of waypoint i.

Parameters: i (int) – waypoint index
Returns: 3-D vector of location
Return type: numpy.ndarray

property num_waypoints#: The number of waypoints in the route.

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

set_destination(destination)[source]#

Set the navigation destination.

Parameters: destination (carla.Location) –
Returns: The total length of the route in meters, starting from the current vehicle location to the destination.

class NumpyLaneMarking(color, lane_change, type, width)#

Bases: tuple

Create new instance of NumpyLaneMarking(color, lane_change, type, width)

color#: Alias for field number 0

lane_change#: Alias for field number 1

type#: Alias for field number 2

width#: Alias for field number 3

class NumpyWaypoint(id, location, rotation, road_id, section_id, lane_id, is_junction, lane_width, lane_change, lane_type, right_lane_marking, left_lane_marking)#

Bases: tuple

Create new instance of NumpyWaypoint(id, location, rotation, road_id, section_id, lane_id, is_junction, lane_width, lane_change, lane_type, right_lane_marking, left_lane_marking)

id#: Alias for field number 0

is_junction#: Alias for field number 6

lane_change#: Alias for field number 8

lane_id#: Alias for field number 5

lane_type#: Alias for field number 9

lane_width#: Alias for field number 7

left_lane_marking#: Alias for field number 11

location#: Alias for field number 1

right_lane_marking#: Alias for field number 10

road_id#: Alias for field number 3

rotation#: Alias for field number 2

section_id#: Alias for field number 4

class ObstacleDetectionSensor(parent_actor, xyz=(2.0, 0.0, 1.7), pyr=(0.0, 0.0, 0.0), distance=250, hit_radius=1, only_dynamics=False, debug_message=False)[source]#

Bases: alf.environments.carla_sensors.SensorBase

ObstacleDetectionSensor. A sensor that detects the frontal obstacle and uses the distance as the observation. It registers an event every time the parent actor has an obstacle ahead. In order to anticipate obstacles, the sensor creates a capsular shape ahead of the parent vehicle and uses it to check for collisions (https://carla.readthedocs.io/en/latest/ref_sensors/#obstacle-detector). This detection technique is also known as sphere tracing.

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor.
xyz (tuple[float]) – the attachment position (x, y, z) relative to the parent_actor. This value should be set properly to put the sensor on the windshield of the actor to avoid detection of collision with the actor itself. A default value of (2.0, 0., 1.7) is provided for typical sedan vehicles. For another type of vehicle that is much larger, a larger x value should be used.
pyr (tuple[float]) – the attachment rotation (pitch, yaw, roll) in degrees.
distance (float) – the distance within which obstacles are considered for detection.
hit_radius (float) – radius of the trace in sphere tracing.
only_dynamics (bool) – If True, the trace will only take dynamic objects into consideration; otherwise, it will also consider static objects.
debug_message (bool) – If True, will log the debug message.

get_current_observation(current_frame)[source]#

Get the current observation.

Parameters: current_frame (int) – current frame number.
Returns: 1D vector containing the distance to the frontal obstacle.
Return type: np.ndarray

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class RadarSensor(parent_actor, xyz=(2.8, 0.0, 1.0), pyr=(5.0, 0.0, 0.0), max_num_detections=200)[source]#

Bases: alf.environments.carla_sensors.SensorBase

RadarSensor for detecting obstacles.

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor.
xyz (tuple[float]) – the attachment position (x, y, z) relative to the parent_actor.
pyr (tuple[float]) – the attachment rotation (pitch, yaw, roll) in degrees.
max_num_detections (int) – maximal number of detection points.

get_current_observation(current_frame)[source]#

Parameters

current_frame (int) – current frame no. RadarSensor may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted.

Returns

A set of detected points. Each detected point is a 4-D vector of [vel, altitude, azimuth, depth], where vel is the velocity of the detected object towards the sensor in m/s, altitude is the altitude angle of the detection in radians, azimuth is the azimuth angle of the detection in radians, and depth is the distance from the sensor to the detection in meters.

Return type

np.ndarray
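
For downstream use it is often convenient to convert such detections to Cartesian points in the sensor frame. The sketch below uses the standard spherical-to-Cartesian conversion; the helper name `radar_to_cartesian` is hypothetical, and the axis convention (x forward, z up, azimuth measured in the horizontal plane) is an assumption that should be checked against the simulator's documentation.

```python
import numpy as np


def radar_to_cartesian(detections):
    """Convert [vel, altitude, azimuth, depth] rows to [x, y, z] points."""
    detections = np.asarray(detections, dtype=np.float64).reshape(-1, 4)
    altitude = detections[:, 1]
    azimuth = detections[:, 2]
    depth = detections[:, 3]
    x = depth * np.cos(altitude) * np.cos(azimuth)  # forward (assumed)
    y = depth * np.cos(altitude) * np.sin(azimuth)  # sideways (assumed)
    z = depth * np.sin(altitude)                    # up (assumed)
    return np.stack([x, y, z], axis=1)


points = radar_to_cartesian([[0.0, 0.0, 0.0, 10.0],
                             [1.5, 0.0, np.pi / 2, 2.0]])
```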

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class RedlightSensor(parent_actor, player)[source]#

Bases: alf.environments.carla_sensors.SensorBase

Provide a scalar value representing the distance to the redlight that affects the current Player.

Parameters

parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) –

get_current_observation(red_light_dist)[source]#

Get the current observation.

The observation is a scalar value representing the distance to the redlight.

Parameters: current_frame (int) – not used.
Returns: 1-D array representing the distance to the redlight that affects the current Player.
Return type: np.ndarray

observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class SensorBase(parent_actor)[source]#

Bases: abc.ABC

Base class for sensors.

Parameters: parent_actor (carla.Actor) – the parent actor of this sensor

destroy()[source]#

Return the commands for destroying this sensor.

Use carla.Client.apply_batch_sync() to actually destroy the sensor.

Returns: the commands used to destroy the sensor.
Return type: list[carla.command]

abstract get_current_observation(current_frame)[source]#

Get the current observation.

Parameters: current_frame (int) – current frame no. Some sensors may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted. Note that if the sensor receives an event in the most recent frame, event.frame should be equal to current_frame - 1.
Returns: sensor data received in the last tick.
Return type: nested np.ndarray

abstract observation_desc()[source]#

Get the description about the observation of this sensor.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

abstract observation_spec()[source]#

Get the observation spec of this sensor.

Returns
Return type: nested TensorSpec

class World(world, route_resolution=1.0)[source]#

Bases: object

Keeping data for the world.

Parameters

world (carla.World) – the carla world instance
route_resolution (float) – the resolution in meters for planned route

DEFAULT_ENCOUNTERED_RED_LIGHT_DISTANCE = 10000000000.0#

RED_LIGHT_ENFORCE_DISTANCE = 15#

add_actor(actor)[source]#

get_active_speed_limit(actor, dis_threshold=1.0)[source]#

Get active speed limit for the actor.

Parameters

actor (carla.Actor) – the vehicle actor
dis_threshold (float) – the distance within which to consider the speed limit sign as active. The one closest to the actor in the active set will be used as the current speed limit. If a negative value is provided, all speed limit signs are taken into consideration for determining the closest one.

Returns

the value of the speed limit in m/s if there is a speed limit sign within the distance of dis_threshold
None if there is no active speed limit sign

get_actor_location(aid)[source]#

Get the latest location of the actor.

The reason for using this instead of calling carla.Actor.get_location() directly is that the location of actors may not have been updated before world.tick().

Parameters: aid (int) – actor id
Returns
Return type: carla.Location

get_actors()[source]#

get_waypoints()[source]#

Get the coordinates of waypoints

Returns
Return type: list[carla.Waypoint]

is_running_red_light(actor)[source]#

Whether actor is running red light.

Adapted from RunningRedLightTest.update() in https://github.com/carla-simulator/scenario_runner/blob/master/srunner/scenariomanager/scenarioatomics/atomic_criteria.py

Parameters

actor (carla.Actor) – the vehicle actor

Returns

violated red light id if running red light, None otherwise
encountered red light id if encountering one, None otherwise
distance to the encountered red light if encountering one, DEFAULT_ENCOUNTERED_RED_LIGHT_DISTANCE otherwise

on_tick()[source]#: Should be called after every world tick() to update data.

property route_resolution#: The sampling resolution of route.

trace_route(origin, destination)[source]#

Find the route from origin to destination.

Parameters

origin (carla.Location) –
destination (carla.Location) –

Returns

list[tuple(carla.Waypoint, RoadOption)]

transform_to_geolocation(location)[source]#

Transform a map coordinate to geo coordinate.

Returns: [latitude, longitude, altitude]
Return type: np.ndarray

update_actor_location(aid, loc)[source]#

Update the next location of the actor.

Parameters

aid (int) – actor id
loc (carla.Location) – location of the actor

get_scaled_image_size(height, width)[source]#

Compute properly scaled image size.

The scaled image height and width are calculated based on the minimum and maximum allowed sizes for rendering, while keeping the aspect ratio of the image unchanged. If both the height and width are within the bound, no scaling is applied.

Returns

scaled_height (int): scaled image height
scaled_width (int): scaled image width

Return type

tuple
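
One way such aspect-preserving scaling could work is sketched below. The function name `scaled_image_size` and the bounds `min_size` and `max_size` are hypothetical placeholders; the actual limits and the exact rule used by get_scaled_image_size() are not stated in this documentation.

```python
def scaled_image_size(height, width, min_size=240, max_size=640):
    """Scale (height, width) into [min_size, max_size], keeping aspect ratio."""
    scale = 1.0
    if max(height, width) > max_size:
        # Too large: shrink so the longer side equals max_size.
        scale = max_size / max(height, width)
    elif min(height, width) < min_size:
        # Too small: enlarge so the shorter side equals min_size.
        scale = min_size / min(height, width)
    return int(round(height * scale)), int(round(width * scale))


unchanged = scaled_image_size(300, 400)
shrunk = scaled_image_size(1000, 2000)
enlarged = scaled_image_size(60, 120)
```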

interp_color(color, factor)[source]#

Parameters

color (tuple) – tuple of rgb values representing a color.
factor (float) – a value for linearly interpolating between the input color (when factor=0) and the white color (when factor=1).
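
The interpolation described by factor can be written as a per-channel linear blend toward white. This sketch assumes 8-bit RGB channels in [0, 255]; the function name `interp_toward_white` is hypothetical and the rounding behavior is an assumption.

```python
def interp_toward_white(color, factor):
    """Blend color toward white: factor=0 keeps color, factor=1 gives white."""
    return tuple(int(round(c + (255 - c) * factor)) for c in color)


same = interp_toward_white((255, 0, 0), 0.0)
lighter = interp_toward_white((255, 0, 0), 0.2)
white = interp_toward_white((255, 0, 0), 1.0)
```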

alf.environments.carla_spectator#

A utility to watch the vehicles in a simulation.

A typical scenario is that you have an on-going Carla training session and you want to see what the training vehicles are doing. You can use this utility to do this:

python carla_spectator --port 2000 --host localhost

If you only have one training session going on, the port is 2000 by default. You can use ps aux | grep Carla to find out --carla-rpc-port and use it to replace 2000.

After carla_spectator starts, you can use TAB key to switch to different vehicles and ESC key to quit the program.

main(_)[source]#

alf.environments.dmc_gym_wrapper#

Wrap dm_control environment with a Gym interface. Adapted and simplified from https://github.com/denisyarats/dmc2gym

class DMCGYMWrapper(domain_name, task_name, visualize_reward=True, from_pixels=False, height=84, width=84, camera_id=0, control_timestep=None)[source]#

Bases: gym.core.Env

A Gym env that wraps a dm_control environment.

Parameters

domain_name (str) – the domain name corresponds to the physical robot
task_name (str) – a specific task under a domain, which corresponds to a particular MDP structure
visualize_reward (bool) – if True, then the rendered frame will have a highlighted color when the agent achieves a reward.
from_pixels (bool) – if True, the observation will be raw pixels; otherwise use the internal state vector as the observation.
height (int) – image observation height
width (int) – image observation width
camera_id (int) – which camera to render; a MuJoCo xml file can define multiple cameras with different views
control_timestep (Optional[float]) – the time duration between two agent actions. If this is greater than the agent’s primitive physics timestep, then multiple physics simulation steps might be performed between two actions. If None, the default control timestep defined by DM control suite will be used.
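As a rough illustration of the control_timestep parameter, the number of physics steps per agent action is the ratio of the two timesteps. The helper below is hypothetical and only shows the arithmetic; it is not part of the wrapper or of dm_control.

```python
def physics_steps_per_action(control_timestep, physics_timestep):
    """Number of physics simulation steps executed for one agent action.

    Hypothetical helper: assumes control_timestep is (close to) an integer
    multiple of physics_timestep.
    """
    return max(1, round(control_timestep / physics_timestep))
```

For example, a control timestep of 0.01 s on top of a 0.002 s physics timestep corresponds to 5 physics steps per action.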

property action_space#

property observation_space#

render(mode='rgb_array', height=None, width=None, camera_id=0)[source]#: Render an RGB image. Copied from https://github.com/denisyarats/dmc2gym

reset()[source]#

Resets the state of the environment and returns an initial observation.

Returns: observation (object) – the initial observation.

seed(seed)[source]#

Sets the seed for this env’s random number generator(s).

Note

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns

The list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.

Return type

list<bigint>

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type: tuple
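The reset() and step() contract can be exercised with a loop like the one below. CountdownEnv is a hypothetical stand-in that only mimics the Gym interface, so the sketch runs without dm_control or gym installed.

```python
class CountdownEnv:
    """Toy stand-in for a Gym env (hypothetical, for illustration only)."""

    def __init__(self, horizon=3):
        self._horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return self._t  # the initial observation

    def step(self, action):
        self._t += 1
        done = self._t >= self._horizon
        return self._t, 1.0, done, {}  # (observation, reward, done, info)


def run_episode(env):
    """Run one episode with a constant action and return the total reward."""
    obs = env.reset()  # the caller is responsible for resetting
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(0)
        total += reward
    return total
```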

alf.environments.fast_parallel_environment#

class FastParallelEnvironment(env_constructors, start_serially=True, blocking=False, flatten=True, num_spare_envs_for_reload=0)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Batch together environments and simulate them in external processes.

The environments are created in external processes by calling the provided callables. This can be an environment class, or a function creating the environment and potentially wrapping it. The environments can be different but must use the same action and observation specs.

Different from parallel_environment.ParallelAlfEnvironment, FastParallelEnvironment uses shared memory to transfer TimeStep from each process environment to the main process.

Terminology:

main process: the process where ParallelEnvironment is created
client process: the process running the actual individual environment created using env_constructors

Design:

FastParallelEnvironment uses _penv.ParallelEnvironment (implemented in C++) to coordinate step() and reset(). Each ProcessEnvironment maintains one _penv.ProcessEnvironmentCaller in the main process and one _penv.ProcessEnvironment in the client process.

In the client process, _penv.ProcessEnvironment.worker() runs in a loop to wait for jobs from either _penv.ParallelEnvironment or _penv.ProcessEnvironmentCaller.

There are 4 types of job:

step: step the environment. Sent from _penv.ParallelEnvironment. The result is communicated back using shared memory.
reset: reset the environment. Sent from _penv.ParallelEnvironment. The result is communicated back using shared memory.
close: close the environment. Sent from _penv.ProcessEnvironmentCaller. This will cause the worker to finish and quit the process.
call: access other methods of the environment. Sent from _penv.ProcessEnvironmentCaller. This takes advantage of the pipe mechanism used by the ParallelAlfEnvironment. This is achieved by calling call_handler to do communication using a Python pipe. The reason for using the original pipe mechanism for other types of communication is that it is not easy to handle communication of unknown size using shared memory.
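The four job types can be pictured as a simple dispatch loop. The sketch below is plain Python with a queue.Queue standing in for the real transport; it is not the C++ _penv implementation and uses neither shared memory nor pipes. The names worker and EchoEnv are hypothetical.

```python
import queue


class EchoEnv:
    """Hypothetical environment used only to exercise the loop."""

    def reset(self):
        return "first"

    def step(self, action):
        return ("mid", action)

    def seed(self, s):
        return [s]


def worker(env, jobs):
    """Process (job_type, payload) tuples until a 'close' job arrives."""
    results = []
    while True:
        job_type, payload = jobs.get()
        if job_type == "step":
            results.append(env.step(payload))
        elif job_type == "reset":
            results.append(env.reset())
        elif job_type == "call":
            name, args = payload  # any other method of the environment
            results.append(getattr(env, name)(*args))
        elif job_type == "close":
            break  # the worker finishes and the process would quit
        else:
            raise ValueError(f"unknown job type: {job_type}")
    return results
```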

Parameters

env_constructors (list[Callable]) – a list of callable environment creators.
start_serially (bool) – whether to start environments serially or in parallel.
blocking (bool) – not used. Kept for the same interface as ParallelAlfEnvironment.
flatten (bool) – not used. Kept for the same interface as ParallelAlfEnvironment.
num_spare_envs_for_reload (int) – if positive, these environments will be maintained in a separate queue and be used to handle slow env resets. The batch_size is len(env_constructors) - num_spare_envs_for_reload

Raises

ValueError – If the action or observation specs don’t match.

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#: Close all external processes.

env_info_spec()[source]#: Defines the env_info provided by the environment.

property envs#: The list of individual environments.

property metadata#

property num_spare_envs_for_reload#

property num_tasks#: Number of tasks supported by this environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation which returns a scalar spec.

Returns: alf.TensorSpec

seed(seeds)[source]#: Seeds the parallel environments.

start()[source]#

property task_names#: The name of each task.

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

alf.environments.gym_wrappers#

Wrappers for gym (numpy) environments.

class BaseObservationWrapper(env, fields=None)[source]#

Bases: gym.core.ObservationWrapper

Base observation Wrapper

BaseObservationWrapper provides basic functions and a generic interface for transformation.

The key interface functions are:

1. transform_space(): transform space.
2. transform_observation(): transform observation.

Parameters

env (gym.Env) – the gym environment
fields (list[str]) – fields to which the transformation is applied. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is transformed.
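To illustrate how a multi-level path such as “A.B.C” selects part of a nested observation, here is a minimal sketch on plain dicts. The helper transform_field is hypothetical and is not the wrapper's actual implementation.

```python
def transform_field(nested, field, func):
    """Return a copy of nested with func applied at the dotted path field.

    If field is None, func is applied to the (non-nested) value itself.
    """
    if field is None:
        return func(nested)
    head, _, rest = field.partition(".")
    out = dict(nested)  # shallow copy so the input is left untouched
    out[head] = transform_field(nested[head], rest or None, func)
    return out
```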

observation(observation)[source]#

transform_observation(observation)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class ContinuousActionClip(env, min_v=-1000000000.0, max_v=1000000000.0)[source]#

Bases: gym.core.ActionWrapper

Clip continuous actions according to the action space.

Note that any action outside of the bounds specified by action_space will be clipped to the bounds before passing to the underlying environment.

Create a ContinuousActionClip gym wrapper.

Parameters: env (gym.Env) – A Gym env instance to wrap
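A minimal sketch of the clipping described above, using plain Python lists instead of numpy arrays. It assumes that min_v and max_v additionally cap the bounds of the action space; their role is not documented here, so treat that part as a guess. The name clip_action is hypothetical.

```python
def clip_action(action, low, high, min_v=-1e9, max_v=1e9):
    """Clip each action component to [low, high], capped by [min_v, max_v]."""
    lows = [max(l, min_v) for l in low]      # assumed role of min_v
    highs = [min(h, max_v) for h in high]    # assumed role of max_v
    return [min(max(a, l), h) for a, l, h in zip(action, lows, highs)]
```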

action(action)[source]#

class ContinuousActionMapping(env, low, high)[source]#

Bases: gym.core.ActionWrapper

Map continuous actions to a desired action space, while keeping discrete actions unchanged.

Parameters

env (gym.Env) – Gym env to be wrapped
low (float) – the action lower bound to map to.
high (float) – the action higher bound to map to.
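One way to read this mapping: the agent acts in [low, high], and each action is rescaled affinely into the bounds of the wrapped environment. The sketch below assumes that direction; env_low, env_high and the name map_action are hypothetical and only illustrate the arithmetic.

```python
def map_action(action, low, high, env_low, env_high):
    """Affinely map action from [low, high] to [env_low, env_high]."""
    fraction = (action - low) / (high - low)  # position within [low, high]
    return env_low + fraction * (env_high - env_low)
```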

action(action)[source]#

class DMAtariPreprocessing(env, frame_skip=4, noop_max=30, screen_size=84, gray_scale=True)[source]#

Bases: gym.core.Wrapper

Derived from tf_agents AtariPreprocessing. Three differences:

1. Random number of NOOPs after reset.
2. FIRE after a reset or a lost life. This is for the purpose of evaluation with greedy prediction without getting stuck in the early training stage.
3. A lost life doesn’t result in a terminal state.

NOTE: Some implementations force the time step that loses a life to have a zero value (i.e., mark a ‘terminal’ state) to help bootstrap value functions, but only reset the env when all lives are used (done==True). In this case, the episodic score is still summed over all lives.

For our implementation, we only mark a terminal state when all lives are used (done==True). It’s more difficult to learn in our case (time horizon is longer).

To see a complete list of atari wrappers used by DeepMind, see https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/atari_wrappers.py Also see OpenAI Gym’s implementation (not completely the same): https://github.com/openai/gym/blob/master/gym/wrappers/atari_preprocessing.py

(This wrapper does not handle framestacking. It can be paired with FrameStack. See atari.gin for an example.)

Constructor for an Atari 2600 preprocessor.

Parameters

env (gym.Env) – the environment whose observations are preprocessed.
frame_skip (int) – the frequency at which the agent experiences the game.
noop_max (int) – the maximum number of no-op actions after resetting the env
screen_size (int) – size of a resized Atari 2600 frame.
gray_scale (bool) –

fire()[source]#

reset()[source]#

Resets the environment.

Returns: observation (np.array) – the initial observation emitted by the environment.

step(action)[source]#

Applies the given action in the environment.

Remarks:

If a terminal state (episode end) is reached, this may execute fewer than self.frame_skip steps in the environment. Furthermore, in this case the returned observation may not contain valid image data and should be ignored.

Parameters

action (int) – The action to be executed.

Returns

observation (np.array): the observation following the action.
reward (float): the reward following the action.
game_over (bool): whether the environment has reached a terminal state. This is true when an episode is over.
info: Gym API’s info data structure.

Return type

tuple

class EpisodicRandomFrameCrop(env, cropping_fraction=0.8, channel_order='channels_last', share_cropping=True, fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Create a frame cropping wrapper that augments the data distribution by randomly cropping the image frame according to the specified fraction. Each episode has a randomized cropping location which is consistent over the episode.

Parameters

env (Env) – the gym environment
cropping_fraction – the portion of the original image to crop (keep)
channel_order (str) – The ordering of the dimensions in the input images from the env, should be either “channels_last” or “channels_first”.
share_cropping (bool) – if there are multiple image fields, whether they share the same cropping position at each time step. This might be useful if there are multiple images with the same camera intrinsics, e.g., RGB + depth.
fields (Optional[List[str]]) – fields to be cropped. A field str is a multi-level path denoted by “A.B.C”. If None, then non-nested observation is cropped.
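The per-episode cropping can be sketched as below: a crop window is sampled once (as reset() would do) and then applied to every frame of the episode. Frames are plain nested lists here, and the function names are hypothetical; the real wrapper works on numpy arrays and also handles channel_order and share_cropping.

```python
import random


def sample_crop_window(height, width, cropping_fraction=0.8, rng=random):
    """Sample (sy, sx, crop_h, crop_w) for one episode."""
    crop_h = int(round(height * cropping_fraction))
    crop_w = int(round(width * cropping_fraction))
    sy = rng.randint(0, height - crop_h)  # start row, inclusive range
    sx = rng.randint(0, width - crop_w)   # start column, inclusive range
    return sy, sx, crop_h, crop_w


def crop(frame, sy, sx, crop_h, crop_w):
    """Crop a frame stored as a list of rows (height first, then width)."""
    return [row[sx:sx + crop_w] for row in frame[sy:sy + crop_h]]
```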

observation(observation)[source]#

reset(**kwargs)[source]#: Randomly select cropping start positions.

transform_observation(observation, sy, sx, space)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class FrameCrop(env, sx=0, sy=0, width=84, height=84, channel_order='channels_last', fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Create a FrameCrop instance

Parameters

env (gym.Env) – the gym environment
sx (int) – start position along the horizontal direction (x-axis)
sy (int) – start position along the vertical direction (y-axis)
width (int) – crop width
height (int) – crop height

transform_observation(observation)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class FrameFlip(env, ud_flip_prob=0.5, lr_flip_prob=0.5, channel_order='channels_last', fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Create a frame flipping wrapper that randomly flips the image fields either vertically or horizontally. For each episode, all fields will have the SAME flipping operation.

The prob for each flipping result:

identical: (1 - udp) * (1 - lrp)
ud_flip: udp * (1 - lrp)
lr_flip: (1 - udp) * lrp
rotate180: udp * lrp

This wrapper is usually used for data augmentation.
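The four probabilities listed above follow from two independent coin flips, one for the up-down flip (udp) and one for the left-right flip (lrp). A small sketch, with hypothetical function names:

```python
import random


def flip_probabilities(udp, lrp):
    """Probability of each flipping result for given flip probabilities."""
    return {
        "identical": (1 - udp) * (1 - lrp),
        "ud_flip": udp * (1 - lrp),
        "lr_flip": (1 - udp) * lrp,
        "rotate180": udp * lrp,
    }


def sample_flip(udp, lrp, rng=random):
    """Draw the two independent decisions (flip up-down, flip left-right)."""
    ud = rng.random() < udp
    lr = rng.random() < lrp
    return ud, lr
```

The four probabilities always sum to 1, since they enumerate all outcomes of the two flips.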

Parameters

env (Env) – the gym environment
ud_flip_prob (float) – the prob of flipping up-down on the original image.
lr_flip_prob (float) – the prob of flipping left-right, after the testing of up-down flipping.
channel_order (str) – The ordering of the dimensions in the input images from the env, should be either “channels_last” or “channels_first”.
fields (Optional[List[str]]) – fields to be flipped. A field str is a multi-level path denoted by “A.B.C”. If None, then non-nested observation is flipped.

reset(**kargs)[source]#

Resets the state of the environment and returns an initial observation.

Returns: observation (object) – the initial observation.

transform_observation(observation)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class FrameGrayScale(env, fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Gray scale image observation

Create a FrameGrayScale instance

Parameters

env (gym.Env) – the gym environment
fields (list[str]) – fields to be gray scaled. A field str is a multi-level path denoted by “A.B.C”. If None, then non-nested observation is gray scaled.

transform_observation(obs)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class FrameResize(env, width=84, height=84, fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Create a FrameResize instance

Parameters

env (gym.Env) – the gym environment
width (int) – resize width
height (int) – resize height
fields (list[str]) – fields to be resized. A field str is a multi-level path denoted by “A.B.C”. If None, then non-nested observation is resized.

transform_observation(observation)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class FrameSkip(env, skip)[source]#

Bases: gym.core.Wrapper

Repeat the same action n times and return the last observation and the accumulated reward.

Create a FrameSkip object

Parameters

env (gym.Env) – the gym environment
skip (int) – number of times each action is repeated (skip=1 means no skip)
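The behavior described above can be sketched as a loop that repeats the action and sums the rewards. Stopping early when done is returned is an assumption of this sketch, and TickEnv and skip_step are hypothetical names used only to make the example runnable.

```python
class TickEnv:
    """Toy env: observation is the step count, reward is always 1.0."""

    def __init__(self, horizon):
        self.t, self.horizon = 0, horizon

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


def skip_step(env, action, skip):
    """Apply action up to skip times (skip >= 1); return the last
    observation and the accumulated reward."""
    total_reward = 0.0
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:  # assumption: stop repeating once the episode ends
            break
    return obs, total_reward, done, info
```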

reset(**kwargs)[source]#

Resets the state of the environment and returns an initial observation.

Returns: observation (object) – the initial observation.

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type: tuple

class FrameStack(env, stack_size=4, channel_order='channels_last', fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Stack previous stack_size frames, applied to Gym env.

This is deprecated. Please use alf.algorithms.data_transformer.FrameStacker, which is more memory-efficient.

Create a FrameStack object.

Parameters

env (gym.Env) – gym environment.
stack_size (int) – number of frames to stack.
channel_order (str) – The ordering of the dimensions in the input images from the env, should be one of channels_last or channels_first.
fields (list[str]) – fields to be stacked. A field str is a multi-level path denoted by “A.B.C”. If None, then non-nested observation is stacked.
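Frame stacking amounts to keeping the most recent stack_size frames in a fixed-length buffer. The sketch below uses collections.deque and treats frames as opaque objects; filling the buffer with copies of the first frame on reset is an assumption, and FrameBuffer is a hypothetical name.

```python
from collections import deque


class FrameBuffer:
    """Keep the most recent stack_size frames, oldest first."""

    def __init__(self, stack_size=4):
        self._frames = deque(maxlen=stack_size)

    def reset(self, first_frame):
        # Assumption: pad the stack by repeating the first frame.
        for _ in range(self._frames.maxlen):
            self._frames.append(first_frame)
        return list(self._frames)

    def push(self, frame):
        self._frames.append(frame)  # the oldest frame is dropped automatically
        return list(self._frames)
```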

observation(observation)[source]#

reset()[source]#

Resets the state of the environment and returns an initial observation.

Returns: observation (object) – the initial observation.

transform_observation(observation, field)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class ImageChannelFirst(env, fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Make images in observations channel_first.

Parameters

env (gym.Env) – the gym environment
fields (list[str]) – fields to which the transformation is applied. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is transformed.

transform_observation(observation)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class NonEpisodicEnv(env)[source]#

Bases: gym.core.Wrapper

Make a gym environment non-episodic by always setting done=False.

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type: tuple

class NormalizedAction(env)[source]#

Bases: alf.environments.gym_wrappers.ContinuousActionMapping

Normalize actions to [-1, 1]. This normalized action space is friendly to algorithms that compute action entropy, e.g., SAC.

Parameters

env (gym.Env) – Gym env to be wrapped
low (float) – the action lower bound to map to.
high (float) – the action higher bound to map to.

transform_space(observation_space, field, func)[source]#

Transform the child space in observation_space indicated by field using func

Parameters

observation_space (gym.Space) – space to be transformed
field (str) – field of the space to be transformed, a multi-level path denoted by “A.B.C”. If None, then non-nested observation_space is transformed
func (Callable) – transform function. The function will be called as func(observation_space, level) and should return new observation_space.

Returns

transformed space

alf.environments.make_penv#

gen_penv()[source]#

alf.environments.mario_wrappers#

class FrameFormat(env, data_format='channels_last')[source]#

Bases: gym.core.Wrapper

Format frame to specified data_format

Parameters: data_format – data format for the frame: channels_first for CHW and channels_last for HWC

reset()[source]#

Resets the state of the environment and returns an initial observation.

Returns: observation (object) – the initial observation.

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type: tuple

class LimitedDiscreteActions(env, all_buttons)[source]#

Bases: gym.core.ActionWrapper

Wrap mario environment and make it use discrete actions. Map available button combinations to discrete actions, e.g.:

0 -> None
1 -> UP
2 -> DOWN
…
k -> A
…
m -> A + LEFT
…
n -> B + UP
…
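A table of this kind can be built by enumerating button combinations. The sketch below is illustrative only: the arrow names, the ordering of entries and the function name build_action_table are assumptions, and the real wrapper derives its table from all_buttons.

```python
ARROWS = ["UP", "DOWN", "LEFT", "RIGHT"]  # hypothetical arrow names
BUTTONS = ["A", "B"]


def build_action_table(arrows=ARROWS, buttons=BUTTONS):
    """Return a list where index i holds the button combination of action i."""
    table = [()]                                         # 0 -> None
    table += [(a,) for a in arrows]                      # single arrows
    table += [(b,) for b in buttons]                     # single buttons
    table += [(b, a) for b in buttons for a in arrows]   # button + arrow
    return table
```

A discrete action is then just an index into this table.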

BUTTONS = {'A', 'B'}#

SHOULDERS = {'L', 'R'}#

action(a)[source]#

class MarioXReward(env)[source]#

Bases: gym.core.Wrapper

Wrap mario environment and use X-axis coordinate increment as reward.

if initial or upgrade_to_new_level:
    reward, max_x = 0, 0
else:
    current_x = xscrollHi * 256 + xscrollLo
    reward = current_x - max_x if current_x > max_x else 0
    max_x = current_x if current_x > max_x else max_x
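The pseudocode above can be turned into a small runnable helper. The class name XProgressReward and the argument names are hypothetical; in the wrapper the scroll values would presumably come from the emulator's info dict.

```python
class XProgressReward:
    """Reward equal to the increase of the furthest x position reached."""

    def __init__(self):
        self.max_x = 0

    def reset(self):
        self.max_x = 0

    def __call__(self, xscroll_hi, xscroll_lo, new_level=False):
        if new_level:
            self.max_x = 0  # start counting again on a new level
            return 0
        current_x = xscroll_hi * 256 + xscroll_lo
        reward = max(current_x - self.max_x, 0)  # no reward for moving back
        self.max_x = max(self.max_x, current_x)
        return reward
```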

reset()[source]#

Resets the state of the environment and returns an initial observation.

Returns: observation (object) – the initial observation.

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns:

observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Return type: tuple

class ProcessFrame84(env, crop=True)[source]#

Bases: gym.core.ObservationWrapper

Resize frame from original resolution to 84x84 or resize to 84x110 and then crop to 84x84

observation(obs)[source]#

static process(frame, crop=True)[source]#

alf.environments.parallel_environment#

Runs multiple environments in parallel processes and steps them in batch.

Adapted from TF-Agents Environment API as seen in: https://github.com/tensorflow/agents/blob/master/tf_agents/environments/parallel_py_environment.py

class ParallelAlfEnvironment(env_constructors, start_serially=True, blocking=False, flatten=True, num_spare_envs_for_reload=0)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Batch together environments and simulate them in external processes.

The returned environment should not access global variables.

Parameters

env_constructors (list[Callable]) – a list of callable environment creators.
start_serially (bool) – whether to start environments serially or in parallel.
blocking (bool) – whether to step environments one after another.
flatten (bool) – whether to use flattened actions and time_steps during communication to reduce overhead.
num_spare_envs_for_reload (int) – if positive, these environments will be maintained in a separate queue and be used to handle slow env resets. The batch_size is len(env_constructors) - num_spare_envs_for_reload

Raises

ValueError – If the action or observation specs don’t match.

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#: Close all external processes.

env_info_spec()[source]#: Defines the env_info provided by the environment.

property envs#: The list of individual environments.

property metadata#

property num_spare_envs_for_reload#

property num_tasks#: Number of tasks supported by this environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation which returns a scalar spec.

Returns: alf.TensorSpec

seed(seeds)[source]#: Seeds the parallel environments.

start()[source]#

property task_names#: The name of each task.

time_step_spec()[source]#

Describes the TimeStep fields returned by step().

Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.

Returns: A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.

alf.environments.process_environment#

Step a single env in a separate process for lock-free parallelism.

Adapted from TF-Agents Environment API as seen in: https://github.com/tensorflow/agents/blob/master/tf_agents/environments/parallel_py_environment.py

class ProcessEnvironment(env_constructor, env_id=None, flatten=False, fast=False, num_envs=0, name='')[source]#

Bases: object

Step environment in a separate process for lock-free parallelism.

The environment is created in an external process by calling the provided callable. This can be an environment class, or a function creating the environment and potentially wrapping it. The returned environment should not access global variables.

Parameters

env_constructor (Callable) – callable environment creator.
env_id (torch.int32) – ID of the env
flatten (bool) – whether to assume flattened actions and time_steps during communication to avoid overhead.
fast (bool) – whether created by FastParallelEnvironment or not.
num_envs (int) – number of environments in the FastParallelEnvironment. Only used if fast is True.
name (str) – name of the FastParallelEnvironment. Only used if fast is True.

observation_spec[source]#: The cached observation spec of the environment.

action_spec[source]#: The cached action spec of the environment.

time_step_spec[source]#: The cached time step spec of the environment.

action_spec()[source]#

call(name, *args, **kwargs)[source]#

Asynchronously call a method of the external environment.

Parameters

name (str) – Name of the method to call.
*args – Positional arguments to forward to the method.
**kwargs – Keyword arguments to forward to the method.

Returns

Promise object that blocks and provides the return value when called.
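The promise pattern can be illustrated with a thread and two queues standing in for the external process and its pipe. Everything below (ToyEnv, serve_one, call) is a hypothetical sketch of the idea, not the code of ProcessEnvironment.

```python
import queue
import threading


class ToyEnv:
    """Hypothetical environment with one extra method."""

    def scale(self, x, factor=2):
        return x * factor


def serve_one(requests, responses, env):
    """Worker side: handle a single (name, args, kwargs) request."""
    name, args, kwargs = requests.get()
    responses.put(getattr(env, name)(*args, **kwargs))


def call(requests, responses, name, *args, **kwargs):
    """Caller side: send the request and return a promise immediately."""
    requests.put((name, args, kwargs))
    return responses.get  # calling the promise blocks until the result arrives
```

The caller can issue the request, do other work, and only block when it invokes the returned promise.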

close()[source]#: Send a close message to the external process and join it.

env_info_spec()[source]#

observation_spec()[source]#

render(mode='human')[source]#

Render the environment.

Parameters: mode (str) – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reset(blocking=True)[source]#

Reset the environment.

Parameters: blocking (bool) – Whether to wait for the result.
Returns: New observation when blocking, otherwise callable that returns the new observation.

reward_spec()[source]#

start(wait_to_start=True)[source]#

Start the process.

Parameters: wait_to_start (bool) – Whether the call should wait for an env initialization.

step(action, blocking=True)[source]#

Step the environment.

Parameters

action (nested tensors) – The action to apply to the environment.
blocking (bool) – Whether to wait for the result.

Returns

time step when blocking, otherwise callable that returns the time step.

time_step_spec()[source]#

wait_start()[source]#: Wait for the started process to finish initialization.

process_call(conn, env, flatten, action_spec)[source]#

Returns: True to continue working, False to end the worker.
Return type: bool

alf.environments.random_alf_environment#

An environment that generates random observations.

Adapted from TF-Agents Environment API as seen in:: https://github.com/tensorflow/agents/blob/master/tf_agents/environments/random_py_environment.py

class RandomAlfEnvironment(observation_spec, action_spec, env_id=None, episode_end_probability=0.1, discount=1.0, reward_fn=None, batch_size=None, seed=42, render_size=(2, 2, 3), min_duration=0, max_duration=None, use_tensor_time_step=False)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Randomly generates observations following the given observation_spec.

If an action_spec is provided it validates that the actions used to step the environment fall within the defined spec.

Initializes the environment.

Parameters

observation_spec (nested TensorSpec) – tensor spec for observations
action_spec (nested TensorSpec) – tensor spec for actions.
env_id (int) – (optional) ID of the environment.
episode_end_probability (float) – Probability an episode will end when the environment is stepped.
discount (float) – Discount to set in time_steps.
reward_fn (Callable) – Callable that takes in step_type, action, an observation(s), and returns a tensor of rewards.
batch_size (int) – (Optional) Number of observations generated per call. If this value is not None, then all actions are expected to have an additional major axis of size batch_size, and all outputs will have an additional major axis of size batch_size.
seed (int) – Seed to use for rng used in observation generation.
render_size (tuple of ints) – Size of the random render image to return when calling render.
min_duration (int) – Number of steps at the beginning of the episode during which the episode can not terminate.
max_duration (int) – Optional number of steps after which the episode terminates regardless of the termination probability.
use_tensor_time_step (bool) – convert all quantities in time_step to torch.tensor if True. Otherwise use numpy data types.

Raises

ValueError – If batch_size argument is not None and does not match the shapes of discount or reward.
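How episode_end_probability, min_duration and max_duration interact can be sketched as below. The function episode_ends is hypothetical, and the exact boundary handling (whether the comparisons are strict) is an assumption rather than something taken from the ALF source.

```python
import random


def episode_ends(num_steps, rng, episode_end_probability=0.1,
                 min_duration=0, max_duration=None):
    """Decide whether the episode ends after `num_steps` steps (sketch)."""
    if num_steps < min_duration:
        return False  # too early: termination is not allowed yet
    if max_duration is not None and num_steps >= max_duration:
        return True   # forced termination regardless of the probability
    return rng.random() < episode_end_probability


rng = random.Random(42)
lengths = []
for _ in range(100):
    steps = 0
    while True:
        steps += 1
        if episode_ends(steps, rng, min_duration=3, max_duration=20):
            break
    lengths.append(steps)
```

Under these assumptions every sampled episode length falls between min_duration and max_duration inclusive.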

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then overwrite this property to True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

env_info_spec()[source]#: Defines the env_info provided by the environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

alf.environments.suite_babyai#

class BabyAIWrapper(env, max_instruction_length=80, mode='sent')[source]#

Bases: gym.core.Wrapper

A wrapper for BabyAI environment.

The BabyAI environment is introduced in Chevalier-Boisvert et al. BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop.

It can be downloaded from https://github.com/mila-iqia/babyai

Parameters

gym_env (gym.Env) – An instance of OpenAI gym environment.
max_instruction_length (int) – the maximum number of words of an instruction.
mode (str) – one of (‘sent’, ‘word’, ‘char’). If ‘sent’, the whole instruction (word ID array) is given in the observation at every step. If ‘word’, the word IDs are given in the observation sequentially, one word ID per step. A zero is given for every step after all the word IDs are given. If ‘char’, similar to ‘word’, but only one character is given at each step. For ‘char’ mode, we assume that the unicode of each character is within [0, 127].
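The three instruction modes can be illustrated with a small sketch. The tiny vocabulary, the helper names, and the convention that word IDs start at 1 with 0 reserved for padding are all assumptions made for illustration; the wrapper's real encoding may differ.

```python
VOCAB = ["go", "to", "the", "red", "ball"]  # tiny stand-in vocabulary


def encode_sentence(instruction, max_instruction_length=8):
    """'sent' mode: padded array of word IDs (0 is assumed to be padding)."""
    ids = [VOCAB.index(w) + 1 for w in instruction.split()]
    return ids + [0] * (max_instruction_length - len(ids))


def token_at_step(instruction, step, mode):
    """'word' / 'char' mode: one token per step, then zeros."""
    if mode == "word":
        tokens = [VOCAB.index(w) + 1 for w in instruction.split()]
    elif mode == "char":
        tokens = [ord(c) for c in instruction]  # assumed to lie in [0, 127]
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return tokens[step] if step < len(tokens) else 0


instruction = "go to the red ball"
sent = encode_sentence(instruction)
words = [token_at_step(instruction, t, "word") for t in range(7)]
chars = [token_at_step(instruction, t, "char") for t in range(3)]
```

In 'sent' mode the same padded array would be observed at every step, while 'word' and 'char' emit one token per step followed by zeros.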

VOCAB = ['then', 'after', 'you', 'and', 'go', 'to', 'pick', 'up', 'open', 'put', 'next', 'door', 'ball', 'box', 'key', 'on', 'your', 'left', 'right', 'in', 'front', 'of', 'you', 'behind', 'red', 'green', 'blue', 'purple', 'yellow', 'grey', 'the', 'a']#

VOCAB_SIZE = 33#

reset(**kwargs)[source]#

Resets the state of the environment and returns an initial observation.

Returns: the initial observation.
Return type: observation (object)

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns: a tuple (observation, reward, done, info):
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

is_available()[source]#

load(environment_name, env_id=None, max_instruction_length=80, mode='sent', discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#

Loads the selected environment and wraps it with the specified wrappers.

Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.

Parameters

environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
max_instruction_length (int) – the maximum number of words of an instruction.
mode (str) – one of (‘sent’, ‘word’, ‘char’). If ‘sent’, the whole instruction (word ID array) is given in the observation at every step. If ‘word’, the word IDs are given in the observation sequentially, one word ID per step. A zero is given for every step after all the word IDs are given. If ‘char’, similar to ‘word’, but only one character is given at each step.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to the default step limit defined in the environment’s spec. No limit is applied if set to 0 or if there is no max_episode_steps set in the environment’s spec.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.

Returns

An AlfEnvironment instance.
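The max_episode_steps rule described above can be written as a small helper. This is a sketch of the documented behaviour only; resolve_max_episode_steps and its arguments are hypothetical names, not part of suite_babyai.

```python
def resolve_max_episode_steps(max_episode_steps, spec_max_episode_steps):
    """Return the effective step limit, or None when no limit applies."""
    if max_episode_steps is None:
        # Fall back to the limit registered with the environment, if any.
        max_episode_steps = spec_max_episode_steps
    if not max_episode_steps:  # 0 or None: do not add a time limit
        return None
    return max_episode_steps
```

For example, resolve_max_episode_steps(None, 64) yields the registered limit of 64, while passing 0 disables the limit even when the spec defines one.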

alf.environments.suite_bsuite#

class BSuiteWrapper(env)[source]#

Bases: bsuite.utils.gym_wrapper.GymFromDMEnv

A wrapper for Bsuite environment.

The BSuite environment is introduced in Osband et al. Behaviour Suite for Reinforcement Learning.

It can be accessed on https://github.com/deepmind/bsuite

Parameters: gym_env (gym.Env) – An instance of OpenAI gym environment.

property observation_space: gym.spaces.box.Box#

Return type: Box

reset()[source]#

Resets the state of the environment and returns an initial observation.

Returns: the initial observation.
Return type: observation (object)

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns: a tuple (observation, reward, done, info):
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

is_available()[source]#

load(environment_name='cartpole_swingup/0', env_id=None, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#

Loads the selected environment and wraps it with the specified wrappers.

Note that by default a TimeLimit wrapper is used in wrap_env to limit episode lengths to the default benchmarks defined by the registered environments.

Parameters

environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to zero as not all bsuite environments specify max episode lengths. No limit is applied if set to 0.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.

Returns

An AlfEnvironment instance.

alf.environments.suite_carla#

CarlaEnvironment suite.

To use this, there are two ways:

Run the code within docker image horizonrobotics/alf:0.0.3-carla. Both Docker and Nvidia-Docker2 need to be installed.
Install carla:

wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/CARLA_0.9.9.tar.gz
mkdir carla
tar zxf CARLA_0.9.9.tar.gz -C carla
cd carla/Import
wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/AdditionalMaps_0.9.9.tar.gz
cd ..
./ImportAssets.sh
easy_install PythonAPI/carla/dist/carla-0.9.9-py3.7-linux-x86_64.egg
pip install networkx==2.2

Make sure you are using python3.7

class CarlaEnvironment(batch_size, map_name, vehicle_filter='vehicle.*', walker_filter='walker.pedestrian.*', num_other_vehicles=0, num_walkers=0, percentage_walkers_running=0.1, percentage_walkers_crossing=0.1, global_distance_to_leading_vehicle=2.0, use_hybrid_physics_mode=True, safe=True, day_length=0.0, max_weather_length=0, weather_transition_ratio=0.1, step_time=0.05)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Carla simulation environment.

In order to use it, you need to either download a valid docker image or a Carla package.

Parameters

batch_size (int) – the number of learning vehicles.
map_name (str) – the name of the map (e.g. “Town01”)
vehicle_filter (str) – the filter for getting the blueprints for training vehicles. The filter for other vehicles will always be obtained using ‘vehicle.*’.
walker_filter (str) – the filter for getting walker blueprints.
num_other_vehicles (int) – the number of autopilot vehicles
num_walkers (int) – the number of walkers
global_distance_to_leading_vehicle (float) – the autopiloted vehicles will try to keep this distance from other vehicles.
percentage_walkers_running (float) – percent of running walkers
percentage_walkers_crossing (float) – percent of walkers walking across the road.
use_hybrid_physics_mode (bool) – If true, the autopiloted vehicle will not use physics for simulation if it is far from other vehicles.
safe (bool) – avoid spawning vehicles prone to accidents.
day_length (float) – number of seconds of a day. If 0, the time of the day will not change.
max_weather_length (float) – the number of seconds each weather will last at the most. The actual lasting time (actual_weather_length) of each randomized weather setting is randomly sampled from [0.25 * max_weather_length, max_weather_length]. If max_weather_length is set to 0, the weather won’t change. Otherwise, weather randomization is turned on and we will sample a new set of parameters after reaching actual_weather_length for each sampled weather. Note that we exclude sun_azimuth_angle and sun_altitude_angle from weather randomization and they are controlled separately by day_length in a more realistic way.
weather_transition_ratio (float) – the ratio between the length of the weather transition part and the actual lasting time of the new weather including the transition phase. It has no effect if max_weather_length is 0.
step_time (float) – how many seconds each step of simulation represents.
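The weather timing parameters can be illustrated with a short sketch. It assumes uniform sampling over the documented interval; the function sample_weather_schedule is hypothetical and only mirrors the description of max_weather_length and weather_transition_ratio.

```python
import random


def sample_weather_schedule(max_weather_length, weather_transition_ratio, rng):
    """Return (actual_length, transition_length) in seconds, or None."""
    if max_weather_length == 0:
        return None  # weather randomization is off
    # Assumption: the actual length is drawn uniformly from the interval.
    actual = rng.uniform(0.25 * max_weather_length, max_weather_length)
    # The transition is part of the actual lasting time of the new weather.
    transition = weather_transition_ratio * actual
    return actual, transition


rng = random.Random(0)
schedule = sample_weather_schedule(100.0, 0.1, rng)
disabled = sample_weather_schedule(0, 0.1, rng)
```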

action_desc()[source]#

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then overwrite this property to True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.

env_info_spec()[source]#: Defines the env_info provided by the environment.

observation_desc()[source]#

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property players#

Get all the players in the environment.

Returns
Return type: list[Player]

render(mode)[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so we provide a default implementation which returns a scalar spec.

Returns: alf.TensorSpec

vehicles_with_functioning_lights = ['vehicle.audi.tt', 'vehicle.chevrolet.impala', 'vehicle.dodge_charger.police', 'vehicle.audi.etron', 'vehicle.lincoln.mkz2017', 'vehicle.mustang.mustang', 'vehicle.tesla.model3', 'vehicle.volkswagen.t2']#

class CarlaServer(rpc_port=2000, streaming_port=2001, docker_image='horizonrobotics/alf:0.0.6-carla0.9.9', quality_level='Low', carla_root='/home/carla', use_opengl=True)[source]#

Bases: object

CarlaServer for doing the simulation.

Parameters

rpc_port (int) – port for RPC
streaming_port (int) – port for data streaming
docker_image (str) – If provided, will use the docker image to start the Carla server. Some valid images are “carlasim/carla:0.9.9” and “horizonrobotics/alf:0.0.3-carla”
quality_level (str) – one of [‘Low’, ‘Epic’]. See the explanation at https://carla.readthedocs.io/en/latest/adv_rendering_options/#graphics-quality
carla_root (str) – directory where CarlaUE4.sh is located. The default value is correct when using the docker image. If not using the docker image, make sure you provide the correct path. This is the directory where you unzipped the file you downloaded from https://github.com/carla-simulator/carla/releases/tag/0.9.9.
use_opengl (bool) – the default graphics engine of Carla is Vulkan, which is supposed to be better than OpenGL. However, Vulkan is not always available. It may not be installed or the nvidia driver does not support vulkan.

stop()[source]#: Stop the carla server.

class Player(actor, alf_world, controller_ctor=None, success_reward=100.0, success_distance_thresh=5.0, max_collision_penalty=20.0, max_stuck_at_collision_seconds=5.0, stuck_at_collision_distance=1.0, max_red_light_penalty=10.0, overspeed_penalty_weight=0.0, sparse_reward=False, sparse_reward_interval=10.0, allow_negative_distance_reward=True, min_speed=5.0, additional_time=0.0, with_gnss_sensor=True, with_imu_sensor=True, with_camera_sensor=True, with_radar_sensor=True, with_bev_sensor=False, with_dynamic_object_sensor=False, data_collection_mode=False, with_red_light_sensor=False, with_obstacle_sensor=False, terminate_upon_infraction='', render_waypoints=True)[source]#

Bases: object

Player is a vehicle with some sensors.

An episode terminates if it reaches one of the following situations:

1. the vehicle arrives at the goal.
2. the time exceeds route_length / min_speed + additional_time.
3. it gets stuck because of a collision.

At each step, the reward is given based on the following components:

1. Arriving goal: success_reward
2. Moving in the navigation direction: the number of meters moved. This moving reward can be either dense or sparse depending on the argument sparse_reward.
3. Negative reward caused by collision: -min(max_collision_penalty, max(episode_reward, 0))

Currently, the player has these sensors: CollisionSensor, GnssSensor, IMUSensor, CameraSensor, BEV_sensor, LaneInvasionSensor, RadarSensor, NavigationSensor. See the documentation of these classes for the definition of the data generated by these sensors.

Parameters

actor (carla.Actor) – the carla actor object
alf_world (World) – the world containing the player
controller_ctor (Callable|None) – if provided, will be called as controller_ctor(vehicle, step_time) to create a vehicle controller. It will be used to process the action and generate the control.
success_reward (float) – the reward for arriving at the goal location.
success_distance_thresh (float) – success is achieved if the current location is within this distance of the goal.
max_collision_penalty (float) – the maximum penalty (i.e. negative reward) for collision. We don’t want the collision penalty to be too large if the player cannot even get enough positive moving reward. So the penalty is capped at Player.PENALTY_RATE_COLLISION * max(0., episode_reward). Note that this reward is only given once at the first step of contiguous collisions.
max_stuck_at_collision_seconds (float) – the episode will end and is considered a failure if the car is stuck at the collision for this many seconds.
stuck_at_collision_distance (float) – the car is considered as being stuck at the collision if it is within this distance of the first collision location.
max_red_light_penalty (float) – the maximum penalty (i.e. negative reward) for red light violation. We don’t want the red light penalty to be too large if the player cannot even get enough positive moving reward. So the penalty is capped at Player.PENALTY_RATE_RED_LIGHT * max(0., episode_reward). Note that this reward is only given once at the first step of contiguous red light violation.
overspeed_penalty_weight (float) – if > 0, a penalty proportional to the overspeed magnitude will be applied, multiplied by the step time (seconds each step of simulation represents) to make the penalty invariant to it, and then multiplied by the weight of overspeed_penalty_weight. A negative value is the same as 0.
sparse_reward (bool) – If False, the distance reward is given at every step based on how much it moves along the navigation route. If True, the distance reward is only given after moving sparse_reward_interval.
sparse_reward_interval (float) – the sparse reward is given after approximately every such distance along the route has been driven.
allow_negative_distance_reward (bool) – whether to allow negative distance reward. If True, the agent will receive positive reward for moving ahead along the route, and negative reward for moving back along the route. If False, the agent still receives positive reward for moving ahead along the route, but will not receive negative reward for moving back along the route. Instead, the negative distance will be accumulated to the future distance reward. This may ease the learning if the right behavior is to temporarily go back along the route in order, for example, to avoid an obstacle.
min_speed (float) – unit is m/s. Failure if route_length / min_speed + additional_time seconds passed
additional_time (float) – additional time (unit is second) provided to the agent in each episode. This is useful especially for the episodes with short route_lengths (e.g. < 50m), as it takes some time for the car to be able to move (because of initial spawning phase with z > 0 and acceleration phase).
with_gnss_sensor (bool) – whether to use GnssSensor.
with_imu_sensor (bool) – whether to use IMUSensor.
with_camera_sensor (bool) – whether to use CameraSensor.
with_radar_sensor (bool) – whether to use RadarSensor.
with_bev_sensor (bool) – whether to use BEVSensor.
data_collection_mode (bool) – if True, will use Rule-based agents to control the Players. This can be used for purposes such as collecting data.
with_red_light_sensor (bool) – whether to use RedlightSensor.
with_obstacle_sensor (bool) – whether to use ObstacleDetectionSensor.
terminate_upon_infraction (str) – whether to terminate the episode based on the specified mode (“collision”, “redlight”, “all”, “”), when the agent has the corresponding infractions. If “”, no infraction-based termination is activated.
render_waypoints (bool) – whether to render (interpolated) waypoints in the generated video during rendering. Note that it is only used for visualization and has no impacts on the perception data.
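Two of the rules above lend themselves to a worked example: the episode time limit route_length / min_speed + additional_time, and the collision penalty cap. The sketch below follows the max_collision_penalty parameter description; the function names are hypothetical and the way the two caps combine (taking the smaller one) is an assumption.

```python
PENALTY_RATE_COLLISION = 0.5  # value listed for Player.PENALTY_RATE_COLLISION


def episode_time_limit(route_length, min_speed=5.0, additional_time=0.0):
    """Seconds after which the episode counts as a failure."""
    return route_length / min_speed + additional_time


def collision_penalty(episode_reward, max_collision_penalty=20.0):
    """Negative reward for a collision, capped by the reward earned so far."""
    cap = PENALTY_RATE_COLLISION * max(0.0, episode_reward)
    # Assumption: the effective penalty is the smaller of the two caps.
    return -min(max_collision_penalty, cap)
```

With the defaults, a 100 m route allows 20 seconds, and a player that has accumulated 10 units of reward is penalized by at most 5.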

PENALTY_RATE_COLLISION = 0.5#

PENALTY_RATE_RED_LIGHT = 0.3#

REWARD_COLLISION = 2#

REWARD_DIMENSION = 6#

REWARD_DISTANCE = 1#

REWARD_OVERALL = 0#

REWARD_OVERSPEED = 5#

REWARD_RED_LIGHT = 4#

REWARD_SUCCESS = 3#
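The REWARD_* constants read as indices into a reward vector of length REWARD_DIMENSION. The sketch below shows one way such a vector could be assembled; make_reward_vector is hypothetical, and treating the overall entry as the sum of the other components is an assumption not stated in this documentation.

```python
REWARD_OVERALL = 0
REWARD_DISTANCE = 1
REWARD_COLLISION = 2
REWARD_SUCCESS = 3
REWARD_RED_LIGHT = 4
REWARD_OVERSPEED = 5
REWARD_DIMENSION = 6


def make_reward_vector(distance=0.0, collision=0.0, success=0.0,
                       red_light=0.0, overspeed=0.0):
    """Place each reward component at its documented index."""
    reward = [0.0] * REWARD_DIMENSION
    reward[REWARD_DISTANCE] = distance
    reward[REWARD_COLLISION] = collision
    reward[REWARD_SUCCESS] = success
    reward[REWARD_RED_LIGHT] = red_light
    reward[REWARD_OVERSPEED] = overspeed
    # Assumption: the overall entry is the sum of the individual components.
    reward[REWARD_OVERALL] = sum(reward[1:])
    return reward


reward = make_reward_vector(distance=1.5, collision=-0.5)
```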

act(action)[source]#

Generate the carla command for taking the given action.

Use carla.Client.apply_batch_sync() to actually apply the returned commands.

Parameters: action (nested np.ndarray) –
Returns
Return type: list[carla.command]

action_desc()[source]#

Get the description about the action.

Returns: each str corresponds to one TensorSpec from action_spec().
Return type: nested str

action_spec()[source]#

Get the action spec.

If controller is provided at __init__(), the action_spec is given by controller.

Otherwise, the action is a 4-D vector of [throttle, steer, brake, reverse], where throttle is in [-1.0, 1.0] (negative value is same as zero), steer is in [-1.0, 1.0], brake is in [-1.0, 1.0] (negative value is same as zero), and reverse is interpreted as a boolean value with values greater than 0.5 corresponding to True.

Returns
Return type: nested BoundedTensorSpec
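The interpretation of the default 4-D action can be sketched as follows. decode_action is a hypothetical helper that only mirrors the description above; it does not build a carla.VehicleControl.

```python
def decode_action(action):
    """Interpret [throttle, steer, brake, reverse] as described above."""
    throttle, steer, brake, reverse = action
    return {
        "throttle": max(0.0, throttle),  # negative value is same as zero
        "steer": steer,                  # already in [-1.0, 1.0]
        "brake": max(0.0, brake),        # negative value is same as zero
        "reverse": reverse > 0.5,        # treated as a boolean
    }


control = decode_action([-0.3, 0.25, 0.8, 0.9])
```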

destroy()[source]#

Get the commands for destroying the player.

Use carla.Client.apply_batch_sync() to actually destroy the sensor.

Returns
Return type: list[carla.command]

get_current_time_step(current_frame)[source]#

Get the current time step for the player.

Parameters: current_frame (int) – current simulation frame no.
Returns: all elements are np.ndarray or np.number.
Return type: TimeStep

get_overspeed_amount()[source]#

Get the difference between the actor’s speed and the speed limit, lower bounded by 0.

Returns: 0 if the actor’s _speed_limit is None or its speed is lower than the speed limit; otherwise the amount by which the actor’s speed exceeds the speed limit.
Return type: float
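The overspeed amount, together with the overspeed_penalty_weight rule from the Player parameters, can be sketched as below. Both function names are hypothetical, and the sign convention (penalty returned as a negative reward) is an assumption.

```python
def overspeed_amount(speed, speed_limit):
    """Speed above the limit in m/s, lower bounded by 0."""
    if speed_limit is None:
        return 0.0
    return max(0.0, speed - speed_limit)


def overspeed_penalty(speed, speed_limit, step_time=0.05,
                      overspeed_penalty_weight=0.0):
    """Penalty scaled by step time and weight; negative weights act as 0."""
    weight = max(0.0, overspeed_penalty_weight)
    return -overspeed_amount(speed, speed_limit) * step_time * weight
```

Multiplying by step_time keeps the accumulated penalty per second of overspeeding independent of the simulation step size.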

info_spec()[source]#: Get the info spec.

observation_desc()[source]#

Get the description about the observation.

Returns: each str corresponds to one TensorSpec from observation_spec().
Return type: nested str

observation_spec()[source]#

Get the observation spec.

Returns
Return type: nested TensorSpec

render(mode)[source]#

Render the simulation.

Parameters: mode (str) – one of [‘rgb_array’, ‘human’]
Returns: None if mode is ‘human’; the image of shape [height, width, channels] if mode is ‘rgb_array’.
Return type: None or np.ndarray

reset()[source]#

Reset the player location and goal.

Use carla.Client.apply_batch_sync() to actually reset.

Returns
Return type: list[carla.command]

reward_spec()[source]#: Get the reward spec.

update_speed_limit(dis_threshold=10)[source]#

Update the speed limit of the actor according to the active speed limit sign. The speed limit is updated when passing by a speed limit sign.

Parameters: dis_threshold (float) – the distance in meters within which to consider the speed limit sign as active. The one closest to the actor in the active set will be used as the current speed limit. If a negative value is provided, all speed limit signs are taken into consideration for determining the closest one.
Returns: speed limit in m/s
Return type: float
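The selection rule for the active speed limit sign can be sketched as follows. The data layout (a list of (distance, limit) pairs), the inclusive threshold comparison, and keeping the current limit when no sign is active are assumptions made for illustration.

```python
def select_speed_limit(signs, current_limit, dis_threshold=10.0):
    """Pick the limit of the closest active sign.

    signs: list of (distance_in_meters, limit_in_m_per_s) pairs.
    """
    if dis_threshold < 0:
        active = list(signs)  # negative threshold: consider every sign
    else:
        active = [s for s in signs if s[0] <= dis_threshold]
    if not active:
        return current_limit  # assumption: keep the previous limit
    return min(active, key=lambda s: s[0])[1]


signs = [(25.0, 8.0), (6.0, 13.0), (9.0, 20.0)]
near = select_speed_limit(signs, current_limit=None)
none_active = select_speed_limit([(25.0, 8.0)], current_limit=13.0)
any_sign = select_speed_limit([(25.0, 8.0)], current_limit=13.0, dis_threshold=-1)
```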

class WeatherParameters(cloudiness=0, precipitation=0, precipitation_deposits=0, wind_intensity=0, fog_density=0, fog_distance=0)[source]#

Bases: object

A class for a set of weather related parameters. Currently it contains all the weather fields from carla.WeatherParameters except for sun_azimuth_angle and sun_altitude_angle, which are controlled separately by day_length in a more realistic way.

get_weather_fields()[source]#

Get the list of configurable weather fields

Returns: A list of strings, each as the name of a configurable field

adjust_weather_parameters(weather_param, delta)[source]#

Adjust the parameters of weather_param according to the fields in WeatherParameters. The value is adjusted by adding the field value of delta to weather_param.

Parameters

weather_param (carla.WeatherParameters) – a carla.WeatherParameters instance containing the parameters to be adjusted
delta (WeatherParameters) – an instance of WeatherParameters with the value of each field representing the amount to be adjusted

Returns

The input weather_param instance with adjusted field values.
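The field-wise adjustment can be sketched with plain Python objects. WeatherDelta and adjust_weather are hypothetical stand-ins for WeatherParameters and adjust_weather_parameters, and a SimpleNamespace replaces carla.WeatherParameters so the example runs without CARLA installed.

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace


@dataclass
class WeatherDelta:
    """Stand-in with the same fields as WeatherParameters above."""
    cloudiness: float = 0
    precipitation: float = 0
    precipitation_deposits: float = 0
    wind_intensity: float = 0
    fog_density: float = 0
    fog_distance: float = 0


def adjust_weather(weather_param, delta):
    """Add each field of delta to the matching field of weather_param."""
    for f in fields(delta):
        current = getattr(weather_param, f.name)
        setattr(weather_param, f.name, current + getattr(delta, f.name))
    return weather_param  # the same instance, modified in place


weather = SimpleNamespace(cloudiness=10.0, precipitation=0.0,
                          precipitation_deposits=0.0, wind_intensity=5.0,
                          fog_density=0.0, fog_distance=0.0,
                          sun_altitude_angle=45.0)
adjusted = adjust_weather(weather, WeatherDelta(cloudiness=20.0,
                                                wind_intensity=-5.0))
```

Fields outside the configurable set, such as sun_altitude_angle, are left untouched.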

extract_weather_parameters(weather_param)[source]#: Extract the parameters according to the fields in WeatherParameters and use them to construct an instance of WeatherParameters.

is_available()[source]#

load(map_name, batch_size, wrappers=[])[source]#

Load CarlaEnvironment

Parameters

map_name (str) – name of the map. Currently available maps are: ‘Town01’, ‘Town02’, ‘Town03’, ‘Town04’, ‘Town05’, ‘Town06’, ‘Town07’, and ‘Town10HD’
batch_size (int) – the number of vehicles in the simulation.
wrappers (list[AlfEnvironmentBaseWrapper]) – environment wrappers

Returns

CarlaEnvironment

alf.environments.suite_dmc#

is_available()[source]#: Check if the required environment is installed.

load(environment_name='cheetah:run', from_pixels=True, image_size=100, env_id=None, discount=1.0, visualize_reward=False, max_episode_steps=1000, control_timestep=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#

Load a MuJoCo environment.

For installation of DMControl, see https://github.com/deepmind/dm_control. For installation of MuJoCo210, see https://mujoco.org.

Parameters

environment_name (str) – this string must have the format “domain_name:task_name”, where “domain_name” is defined by DM control as the physical model name, and “task_name” is an instance of the model with a particular MDP structure.
from_pixels (boolean) – Output image if set to True.
image_size (int) – The height and width of the output image from the environment.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
visualize_reward – if True, then the rendered frame will have a highlighted color when the agent achieves a reward.
max_episode_steps (int) – The maximum episode step in the environment.
control_timestep (float) – the time duration between two agent actions. If this is greater than the agent’s primitive physics timestep, then multiple physics simulation steps might be performed between two actions. The difference between multi-physics steps and “action repeats”/FrameSkip is that the intermediate physics step won’t need to render an observation (which might save time if rendering is costly). However, this also means that unlike “action repeats”/FrameSkip which accumulates rewards of several repeated steps, only a single-step reward is obtained after all the physics simulation steps are done. The total number of physics simulation steps in an episode is control_timestep / physics_timestep * frame_skip * max_episode_steps. If None, the default control timestep defined by DM control suite will be used.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment. There will be an AlfEnvironmentDMC2GYMWrapper added before any alf_wrappers.

Returns

A wrapped AlfEnvironment
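Two details of load() can be checked with simple arithmetic: the “domain_name:task_name” format and the total number of physics steps per episode. Both helpers below are hypothetical; frame_skip is taken from the formula in the control_timestep description and is not an argument of load().

```python
def parse_environment_name(environment_name):
    """Split "domain_name:task_name" into its two parts."""
    domain_name, task_name = environment_name.split(":")
    return domain_name, task_name


def total_physics_steps(control_timestep, physics_timestep, frame_skip,
                        max_episode_steps):
    """control_timestep / physics_timestep * frame_skip * max_episode_steps."""
    steps = control_timestep / physics_timestep * frame_skip * max_episode_steps
    return round(steps)


domain, task = parse_environment_name("cheetah:run")
n_steps = total_physics_steps(control_timestep=0.02, physics_timestep=0.01,
                              frame_skip=1, max_episode_steps=1000)
```

Here each agent action spans two physics steps, so a 1000-step episode runs 2000 physics steps.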

alf.environments.suite_dmlab#

class DeepmindLabEnv(scene, action_repeat=4, observation='RGB_INTERLEAVED', config={}, renderer='hardware')[source]#

Bases: gym.core.Env

Create a deepmind_lab env

Parameters

scene (str) – script for the deepmind_lab env. See available script: https://github.com/deepmind/lab/tree/master/game_scripts/levels
action_repeat (int) – the interval at which the agent experiences the game
observation (str) – observation format. See doc about the available observations: https://github.com/deepmind/lab/blob/master/docs/users/python_api.md
config (dict) – config for env
renderer (str) – ‘software’ or ‘hardware’. If set to ‘hardware’, EGL or GLX is used for rendering. Make sure you have GPU if you use ‘hardware’.

close()[source]#

Override close in your subclass to perform any necessary cleanup.

Environments will automatically close() themselves when garbage collected or when the program exits.

metadata = {'render.modes': ['rgb_array']}#

render(mode='rgb_array', close=False)[source]#

Renders the environment.

The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:

human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).

Note

Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.

Parameters: mode (str) – the mode to render with

Example:

class MyEnv(Env):
    metadata = {'render.modes': ['human', 'rgb_array']}

    def render(self, mode='human'):
        if mode == 'rgb_array':
            return np.array(...)  # return RGB frame suitable for video
        elif mode == 'human':
            ...  # pop up a window and render
        else:
            super(MyEnv, self).render(mode=mode)  # just raise an exception

reset()[source]#

Resets the state of the environment and returns an initial observation.

Returns: the initial observation.
Return type: observation (object)

seed(seed=None)[source]#

Sets the seed for this env’s random number generator(s).

Note

Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.

Returns

The list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.

Return type

list<bigint>

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns: a tuple (observation, reward, done, info):
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

action_discretize(action_spec, look_left_right_pixels_per_frame=(-20, 20), look_down_up_pixels_per_frame=(-10, 10), strafe_left_right=(-1, 1), move_back_forward=(-1, 1), fire=(), jump=(1), crouch=(1), **kwargs)[source]#

Discretize actions from action_spec.

TODO: action combinations

Maps all valid action values to discrete actions.

Original deepmind lab environment action_spec:

[{'max': 512, 'min': -512, 'name': 'LOOK_LEFT_RIGHT_PIXELS_PER_FRAME'},
{'max': 512, 'min': -512, 'name': 'LOOK_DOWN_UP_PIXELS_PER_FRAME'},
{'max': 1, 'min': -1, 'name': 'STRAFE_LEFT_RIGHT'},
{'max': 1, 'min': -1, 'name': 'MOVE_BACK_FORWARD'},
{'max': 1, 'min': 0, 'name': 'FIRE'},
{'max': 1, 'min': 0, 'name': 'JUMP'},
{'max': 1, 'min': 0, 'name': 'CROUCH'}]

and discretized actions:

0  -> [20,0,0,0,0,0,0] (look left 20 pixels),
1  -> [-20,0,0,0,0,0,0] (look right 20 pixels),
...,
m  -> [0,0,0,-1,0,0,0] (move back),
m+1-> [0,0,0,1,0,0,0] (move forward) ,
...,
n  -> [0,0,0,0,1,1,0] (jump and fire),
...

See SuiteDMLabTest.test_action_discretize in suite_dmlab_test.py for examples.

Parameters

action_spec (list(dict)) – action spec
look_left_right_pixels_per_frame (iterable|str) – look left or look right pixels
look_down_up_pixels_per_frame (iterable|str) – look down or look up pixels
strafe_left_right (iterable|str) – strafe left or strafe right
move_back_forward (iterable|str) – move back or move forward
fire (iterable|str) – fire values
jump (iterable|str) – jump values
crouch (iterable|str) – crouch values
kwargs (dict) – other config for actions

Returns

discrete actions

Return type

actions (list[numpy.array])
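The mapping described above can be illustrated with a minimal sketch. This is not the ALF implementation: the helper name discretize_single_axis is hypothetical, it uses plain lists instead of numpy arrays, it does not produce combined actions such as "jump and fire", and the ordering of the resulting actions is an assumption that may differ from what action_discretize returns.

```python
# Hypothetical helper (not part of ALF): builds one discrete action per
# (dimension, value) pair, leaving every other dimension at 0.
DMLAB_ACTION_SPEC = [
    {'max': 512, 'min': -512, 'name': 'LOOK_LEFT_RIGHT_PIXELS_PER_FRAME'},
    {'max': 512, 'min': -512, 'name': 'LOOK_DOWN_UP_PIXELS_PER_FRAME'},
    {'max': 1, 'min': -1, 'name': 'STRAFE_LEFT_RIGHT'},
    {'max': 1, 'min': -1, 'name': 'MOVE_BACK_FORWARD'},
    {'max': 1, 'min': 0, 'name': 'FIRE'},
    {'max': 1, 'min': 0, 'name': 'JUMP'},
    {'max': 1, 'min': 0, 'name': 'CROUCH'},
]


def discretize_single_axis(action_spec, values_by_name):
    """Return a list of action vectors, one per allowed value per dimension."""
    actions = []
    for index, spec in enumerate(action_spec):
        for value in values_by_name.get(spec['name'], ()):
            # Reject values outside the bounds declared by the spec.
            if not spec['min'] <= value <= spec['max']:
                raise ValueError(
                    "%s=%s is outside [%s, %s]" %
                    (spec['name'], value, spec['min'], spec['max']))
            vector = [0] * len(action_spec)
            vector[index] = value
            actions.append(vector)
    return actions


actions = discretize_single_axis(
    DMLAB_ACTION_SPEC,
    {
        'LOOK_LEFT_RIGHT_PIXELS_PER_FRAME': (-20, 20),
        'MOVE_BACK_FORWARD': (-1, 1),
        'JUMP': (1, ),
    })
```

A discrete policy then only needs to output an index into actions; the environment wrapper looks up the corresponding vector before calling the underlying simulator.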

is_available()[source]#

load(scene, env_id=None, discount=1.0, frame_skip=4, gym_env_wrappers=(), alf_env_wrappers=(), wrap_with_process=False, max_episode_steps=None)[source]#

Load deepmind lab envs.

Parameters

scene (str) – script for the deepmind_lab env. See the available scripts: https://github.com/deepmind/lab/tree/master/game_scripts/levels
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
frame_skip (int) – the frequency at which the agent experiences the game
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
wrap_with_process (bool) – Whether to wrap the env in a process
max_episode_steps (int) – max episode step limit

Returns

An AlfEnvironment instance.

alf.environments.suite_go#

GoEnvironment.

class GoBoard(batch_size, height, width, max_num_moves, num_previous_boards=10)[source]#

Bases: object

This implements a Go board.

This class only takes care of how the board changes when a valid move is given. Other Go rules are handled by GoEnvironment.

We maintain the following data and incrementally update them:

_board: the current board of shape [B, H, W]. At each position, 0 means it is empty, -1 means a stone of player 0, 1 means a stone of player 1. The board is padded with 2 on four sides to make the handling of boundary simpler.
_cc_id: the connected component (CC) which each position belongs to. The shape is [B, H, W].
_cc_qi: the qi (liberty) of each CC. The shape is [B, max_num_ccs].
_num_ccs: the number of CCs.

Note that this qi is different from the common definition of qi. For example, in the following board, the qi of the connected component “o” is 4 in our data structure because position (1, 1) is adjacent to both (1, 0) and (0, 1) and is therefore counted twice towards the qi of “o”. Under the common definition, the liberty of “o” is 3. We calculate qi this way so that the code can be simplified.

  0123
 ------
0|oo  |
1|o   |
2|    |
3|    |
 ------
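The difference between the two counts can be checked with a small sketch. The helper count_qi is hypothetical and works on a single unbatched board stored as nested lists; GoBoard itself maintains these counts incrementally on batched tensors.

```python
def count_qi(board, group):
    """Return (common_liberties, qi_with_multiplicity) for a group of stones.

    board: list of rows, 0 for empty and non-zero for a stone.
    group: iterable of (y, x) positions forming one connected component.
    """
    height, width = len(board), len(board[0])
    distinct_empty = set()
    qi_with_multiplicity = 0
    for y, x in group:
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width and board[ny][nx] == 0:
                # Common definition: each empty neighbor counts once.
                distinct_empty.add((ny, nx))
                # Definition used here: count once per adjacent stone.
                qi_with_multiplicity += 1
    return len(distinct_empty), qi_with_multiplicity


# The board drawn above: "o" stones at (0, 0), (0, 1) and (1, 0).
example_board = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
example_group = [(0, 0), (0, 1), (1, 0)]
common, multiplicity = count_qi(example_board, example_group)
```

Both counts reach 0 at the same time, which is why the multiplicity-based count is enough for detecting captures.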

Parameters

batch_size (int) – the number of parallel boards
height (int) – height of each board
width (int) – width of each board
max_num_moves (int) – maximum number of moves allowed
num_previous_boards (int) – the number of previous board situations to store. They are used by classify_all_moves() to check whether a move would lead to a board situation that is the same as one of these previous board situations.

calc_area(board_indices=None)[source]#

Calculate the area of each player.

In order for a position to be considered to be owned by a player, it has to be either the player’s stone or a position that cannot be reached by the opponent’s stones. With this definition of area, players have to play until all the dead stones have been taken out. This shouldn’t change how the game is played. This follows the so-called Tromp-Taylor rules.

Parameters: board_indices (Tensor) – int64 Tensor to indicate the boards
Returns: area for player 0 and player 1
Return type: tuple (Tensor, Tensor)
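As an illustration of this area definition, here is a minimal flood-fill sketch for a single unbatched board. The function tromp_taylor_area is hypothetical and is not how calc_area is implemented (which operates on batched tensors); it only follows the rule stated above, using the same encoding as _board (-1 for player 0, 1 for player 1, 0 for empty).

```python
def tromp_taylor_area(board):
    """Return (area of player 0, area of player 1) for one board."""
    height, width = len(board), len(board[0])
    area = {-1: 0, 1: 0}
    seen = set()
    for y in range(height):
        for x in range(width):
            if board[y][x] != 0:
                # A stone always counts for its owner.
                area[board[y][x]] += 1
                continue
            if (y, x) in seen:
                continue
            # Flood-fill the empty region containing (y, x) and record
            # which players' stones border it.
            region_size, borders, stack = 0, set(), [(y, x)]
            seen.add((y, x))
            while stack:
                cy, cx = stack.pop()
                region_size += 1
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = cy + dy, cx + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if board[ny][nx] == 0:
                        if (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                    else:
                        borders.add(board[ny][nx])
            # The region is owned only if exactly one player reaches it.
            if len(borders) == 1:
                area[borders.pop()] += region_size
    return area[-1], area[1]


small_board = [
    [0, -1, 0],
    [-1, -1, 1],
    [0, 1, 1],
]
small_area = tromp_taylor_area(small_board)
```

In small_board, the corner (0, 0) touches only player 0's stones and is counted for player 0, while (0, 2) and (2, 0) touch both players' stones and are counted for neither.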

calc_area_simple(board_indices=None)[source]#

Calculate the area of each player.

In order for a position to be considered to be owned by a player, it has to be either the player’s stone or fully surrounded by the player’s stones. With this definition of area, players have to play until the board is full except for eyes consisting of a single position. This shouldn’t change how the game is played.

Parameters: board_indices (Tensor) – int64 Tensor to indicate the boards
Returns: area for player 0 and player 1
Return type: tuple (Tensor, Tensor)

classify_all_moves(player, board_indices=None)[source]#

Classify all the moves on the board.

This function will examine all possible moves except PASS and annotate them using 3 boolean attributes: occupied, suicidal, and repeated.

Parameters

player (Tensor) – int8 Tensor to indicate which player to consider.
board_indices (Tensor) – int64 Tensor to indicate the boards

Returns

a tuple (occupied, suicidal, repeated). Each one is a bool Tensor of shape [B, height, width]:
occupied: occupied[b, y, x] indicates whether a move at (y, x) overlaps with an existing stone on board[b]
suicidal: suicidal[b, y, x] indicates whether a move at (y, x) is a suicidal move for player[b] on board[b]
repeated: repeated[b, y, x] indicates whether a move at (y, x) by player[b] will result in a board that is the same as one of the previous boards of board[b]

Return type

tuple

get_board(board_indices=None)[source]#

Get the current board.

Parameters: board_indices (Tensor) – int64 Tensor to indicate the boards
Returns: int8 Tensor of the shape [B, height, width].
Return type: Tensor

reset_board(board_indices=None)[source]#

Reset the board to initial condition.

Parameters: board_indices (Tensor) – int64 Tensor to indicate the boards

update(board_indices, y, x, player)[source]#

Update the board for given move at (y, x).

It assumes the move is at an empty location.

Parameters

board_indices (Tensor) – int64 Tensor to indicate which boards to update.
y (Tensor) – int64 Tensor of the same shape as board_indices to indicate the y coordinate of the move
x (Tensor) – int64 Tensor of the same shape as board_indices to indicate the x coordinate of the move
player (Tensor) – int8 Tensor of the same shape as board_indices to indicate which player makes the move

Returns

bool Tensor with the same size as board_indices. It indicates whether the move for each board is suicidal (i.e., making the qi of the player 0). Note that a suicidal move may change the board because all the stones of the player which are connected to the suicidal move will be removed.

Return type

Tensor

class GoEnvironment(batch_size, height=19, width=19, winning_thresh=7.5, allow_suicidal_move=False, reward_shaping=False, human_player=None)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Go environment.

The game plays until one of the following events happens:

Both players pass. In this case, the area of each player will be calculated and the reward is 1 if player 0 wins, -1 if player 1 wins. When calculating the area, in order for a position to be considered to be owned by a player, it has to be either the player’s stone or a position that cannot be reached by the opponent’s stones. With this definition of area, players have to play until all the dead stones have been taken out. This shouldn’t change how the game is played. This follows the so-called Tromp-Taylor rules.
An invalid move. The opponent will get the reward, which means that if player 0 makes an invalid move, the reward is -1. If player 1 makes an invalid move, the reward is 1. There are two types of invalid moves: (a) a move to a position which is already occupied; (b) a move which leads to a board exactly the same as the previous board.
The total number of moves exceeds max_num_moves. This is treated as both players passing. max_num_moves is set to 2 * height * width.

The observation is an OrderedDict containing three fields:

board: a [batch_size, 1, height, width] int8 Tensor, with 0 indicating empty location, -1 indicating a stone of player 0 and 1 indicating a stone of player 1
to_play: a [batch_size] int8 Tensor indicating who is going to make the next move. Its value is either 0 or 1
prev_action: a [batch_size] int64 Tensor indicating the action taken by the previous player. This is pass action for the first step.

The action is an int64 scalar. If it is smaller than height*width, it means to play the stone at (action // width, action % width). If it is equal to height * width, it means to pass for this round.
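The action encoding above can be decoded with a few lines of plain Python. The helper decode_action is hypothetical and handles a single scalar action; the environment itself works on batched int64 tensors.

```python
def decode_action(action, height=19, width=19):
    """Return (y, x) for a stone placement, or None for a pass."""
    if not 0 <= action <= height * width:
        raise ValueError("action %d is out of range" % action)
    if action == height * width:
        return None  # the last index is the pass action
    # Row-major layout: y = action // width, x = action % width.
    return divmod(action, width)
```

For a 19x19 board, indices 0 to 360 address board positions and 361 is the pass action.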

Parameters

batch_size (int) – the number of parallel boards
height (int) – height of each board
width (int) – width of each board
winning_thresh (float) – player 0 wins if area0 - area1 > winning_thresh, loses if area0 - area1 < winning_thresh, otherwise it is a draw.
allow_suicidal_move (bool) – whether suicidal move is allowed.
reward_shaping (bool) – if True, instead of using +1,-1 as reward, use alf.math.softsign(area0 - area1 - winning_thresh) as reward to encourage capturing more area.
human_player (int|None) – 0, 1 or None
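The interaction between winning_thresh and reward_shaping can be sketched for a single game as follows. This is an illustration rather than the environment's code: final_reward is a hypothetical helper, and softsign is assumed to be the standard x / (1 + |x|), which may not match alf.math.softsign exactly.

```python
def softsign(x):
    # Assumed definition; check alf.math.softsign for the actual one.
    return x / (1.0 + abs(x))


def final_reward(area0, area1, winning_thresh=7.5, reward_shaping=False):
    """Reward from player 0's point of view once both players have passed."""
    margin = area0 - area1 - winning_thresh
    if reward_shaping:
        # Smooth reward in (-1, 1) that grows with the winning margin.
        return softsign(margin)
    if margin > 0:
        return 1.0
    if margin < 0:
        return -1.0
    return 0.0  # draw
```

With the default threshold of 7.5 and integer areas the margin is never exactly zero, so a draw can only occur with other threshold values.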

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then overwrite this property to True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

env_info_spec()[source]#

Defines the env_info provided by the environment.

metadata = {'render.modes': ['human', 'rgb_array'], 'video.frames_per_second': 1}#

observation_desc()[source]#

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode)[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

load(name='', batch_size=1)[source]#

Load GoEnvironment.

Parameters

name (str) – not used
batch_size (int) – the number of parallel boards

Returns

GoEnvironment

alf.environments.suite_gym#

load(environment_name, env_id=None, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=(), image_channel_first=True)[source]#

Loads the selected environment and wraps it with the specified wrappers.

Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.

Parameters

environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to the default step limit defined in the environment’s spec. No limit is applied if set to 0 or if there is no max_episode_steps set in the environment’s spec.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
image_channel_first (bool) – whether to transpose image channels to the first dimension.

Returns

An AlfEnvironment instance.

wrap_env(gym_env, env_id=None, discount=1.0, max_episode_steps=0, gym_env_wrappers=(), time_limit_wrapper=<class 'alf.environments.alf_wrappers.TimeLimit'>, normalize_action=True, clip_action=True, alf_env_wrappers=(), image_channel_first=True, auto_reset=True)[source]#

Wraps given gym environment with AlfGymWrapper.

Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.

Also note that all gym wrappers assume images are ‘channel_last’ by default, while PyTorch only supports ‘channel_first’ image inputs. To enable this transpose, ‘image_channel_first’ is set as True by default. gym_wrappers.ImageChannelFirst is applied after all gym_env_wrappers and before the AlfGymWrapper.

Parameters

gym_env (gym.Env) – An instance of OpenAI gym environment.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – Used to create a TimeLimitWrapper. No limit is applied if set to 0. Usually set to gym_spec.max_episode_steps as done in load. Note that a TimeLimit wrapper will be applied as the last Gym wrapper, so if you also use the FrameSkip Gym wrapper, then the actual max length of an episode will be skip*max_episode_steps.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
time_limit_wrapper (AlfEnvironmentBaseWrapper) – Wrapper that accepts (env, max_episode_steps) params to enforce a TimeLimit. Usually this should be left as the default, alf_wrappers.TimeLimit.
normalize_action (bool) – if True, will scale continuous actions to [-1, 1] to be better used by algorithms that compute entropies.
clip_action (bool) – If True, will clip continuous action to its bound specified by action_spec. If normalize_action is also True, this clipping happens after the normalization (i.e., clips to [-1, 1]).
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
image_channel_first (bool) – whether to transpose image channels to the first dimension. PyTorch only supports channel_first image inputs.
auto_reset (bool) – If True (default), reset the environment automatically after a terminal state is reached.
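One plausible reading of how normalize_action and clip_action combine is sketched below for a single scalar action. The helper to_env_action is hypothetical and the affine mapping is an assumption; it only follows the order stated above (clip to [-1, 1] first when both flags are set), and the actual wrappers may differ in detail.

```python
def to_env_action(action, low, high, normalize_action=True, clip_action=True):
    """Map an agent action to the bounds [low, high] of the environment."""
    if normalize_action:
        # The agent acts in [-1, 1]; clip there, then rescale to [low, high].
        if clip_action:
            action = max(-1.0, min(1.0, action))
        return low + (action + 1.0) * 0.5 * (high - low)
    if clip_action:
        # Without normalization, clip directly to the environment bounds.
        action = max(low, min(high, action))
    return action
```

For an environment with bounds [0, 10], a normalized action of 0.0 maps to 5.0, and an out-of-range action of 2.0 is clipped to 1.0 before being mapped to 10.0.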

Returns

An AlfEnvironment instance.

alf.environments.suite_highway#

Suite for loading highway environments. Installation: pip install git+https://github.com/eleurent/highway-env

class ActionScalarization(env)[source]#

Bases: gym.core.Wrapper

Convert the action to a scalar if the current action space is MetaDiscreteAction and the type of the input action is np.ndarray.

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns: a tuple (observation, reward, done, info):
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

class FlattenObservation(env, fields=None)[source]#

Bases: alf.environments.gym_wrappers.BaseObservationWrapper

Flatten the 2D observations into a 1D vector

Parameters

env (gym.Env) – the gym environment
fields (list[str]) – fields to which the transformation is applied. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is transformed.
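The "A.B.C" path convention can be illustrated with a small sketch on nested dicts. Both transform_fields and flatten are hypothetical helpers using plain lists; the real wrapper works on gym observations and spaces.

```python
import copy


def flatten(rows):
    """Flatten a 2D list into a 1D list."""
    return [value for row in rows for value in row]


def transform_fields(observation, fields, fn):
    """Apply fn to each field path in a nested dict observation.

    If fields is None, the observation is assumed to be non-nested and
    fn is applied to it directly.
    """
    if fields is None:
        return fn(observation)
    result = copy.deepcopy(observation)  # leave the input untouched
    for path in fields:
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node[key]  # walk down one level per path component
        node[leaf] = fn(node[leaf])
    return result


nested_obs = {"A": {"B": {"C": [[1, 2], [3, 4]]}}, "D": [[5]]}
flat_obs = transform_fields(nested_obs, ["A.B.C"], flatten)
```

Only the field named by the path is transformed; other fields such as "D" are passed through unchanged.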

transform_observation(observation)[source]#

Transform observation

Subclass should implement this to perform transformation

Parameters: observation (ndarray) – observation to be transformed
Returns: transformed observation

transform_space(observation_space)[source]#

Transform space

Subclass should implement this to perform transformation

Parameters: observation_space (gym.Space) – space to be transformed
Returns: transformed space

class RemoveActionEnvInfo(env)[source]#

Bases: gym.core.Wrapper

Remove action from EnvInfo if it exists.

step(action)[source]#

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Parameters: action (object) – an action provided by the agent
Returns: a tuple (observation, reward, done, info):
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

is_available()[source]#

load(environment_name, env_id=None, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=(), env_config=None)[source]#

Loads the selected environment and wraps it with the specified wrappers.

Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.

Parameters

environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None or 0 the max_episode_steps will be set to the default step limit defined in the environment. Otherwise max_episode_steps will be set to the smaller value of the two.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
env_config (dict|None) – a dictionary for configuring some aspects of the environment. If None, the default configuration will be used. Please refer to the default_env_config below for an example config and the doc for more details: https://highway-env.readthedocs.io/en/latest/user_guide.html

Returns

An AlfEnvironment instance.

alf.environments.suite_mario#

is_available()[source]#

load(game, env_id=None, state=None, discount=1.0, wrap_with_process=False, frame_skip=4, record=False, crop=True, gym_env_wrappers=(), alf_env_wrappers=(), max_episode_steps=4500)[source]#

Loads the selected mario game and wraps it.

Parameters

game (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
state (str) – game state (level)
wrap_with_process (bool) – Whether to wrap the env in a process
discount (float) – Discount to use for the environment.
frame_skip (int) – the frequency at which the agent experiences the game
record – Record the gameplay, see retro.retro_env.RetroEnv.record. False for not recording; otherwise record to the current working directory or the specified directory.
crop (bool) – whether to crop frame to fixed size
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
max_episode_steps (int) – max episode step limit

Returns

An AlfEnvironment instance.

alf.environments.suite_metadrive#

class AlfMetaDriveWrapper(metadrive_env, env_id=0)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Wrapper over the MetaDrive autonomous driving environment. You will need to have metadrive installed as a dependency to use this.

Constructor of AlfMetaDriveWrapper.

Parameters

metadrive_env (MetaDriveEnv) – the original MetaDrive environment being wrapped. The MetaDrive environment should be properly configured on its own before being wrapped.
env_id (int) – the ID of this environment when it appears as part of a batched environment.

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then overwrite this property to True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.

env_info_spec()[source]#

Defines the env_info provided by the environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode)[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

seed(seed=None)[source]#

Reset the underlying MetaDrive environment with a specified seed.

MetaDrive uses a slightly different mechanism for seeds. Upon construction of a MetaDrive environment, the user needs to specify a seed range [start_seed, start_seed + scenario_num]. When being forced to reset with a specific seed, that seed must be within the predefined range.

Parameters: seed (Optional[int]) – the seed that the environment will be reset with. If it is specified as None, a random seed within the range will be selected by the underlying MetaDrive environment.

load(env_name='Vectorized', env_id=0, traffic_density=0.1, start_seed=3844, scenario_num=5000, decision_repeat=5, map_spec=4, crash_penalty=5.0, speed_reward_weight=0.1, success_reward=10.0, time_limit=1200)[source]#

Load the MetaDrive environment and wrap it with AlfMetaDriveWrapper.

Parameters

env_name (str) – Used to specify whether the environment produces observations in vectorized form or raster (Bird Eye View) form. The user is only allowed to specify “Vectorized” or “BirdEye”.
env_id (int) – (optional) ID of the environment.
traffic_density (float) – number of traffic vehicles per 10 meters per lane.
start_seed (int) – random seed of the first map.
scenario_num (int) – specifies the range of the scenario seeds together with start_seed. When being reset, a seed will be picked randomly from [start_seed, start_seed + scenario_num]. Note that even with the same seed, the generated map can vary as there are other randomness such as “random lane number”.
decision_repeat (int) – how many times for the simulation engine to repeat the applied action to the vehicles. The minimal simulation interval physics_world_step_size is 0.02 s. Therefore each RL step will last decision_repeat * 0.02 s in the simulation world.
map_spec (Union[int, str]) – A string or int used as the key to generate the map in an easy way. For example, config[“map”] = 3 means generating a map containing 3 blocks, while config[“map”] = “SCrRX” means the first block is Straight, and the following blocks are Circular, InRamp, OutRamp and Intersection. The characters are the unique IDs of the different block types, so a string determines the block type sequence. A detailed list of block types can be found at https://metadrive-simulator.readthedocs.io/en/latest/config_system.html
crash_penalty (float) – the immediate penalty when the car hits the road boundary, cars or other objects. It should be a positive number.
speed_reward_weight (float) – at each step, the incentive reward for being at a high speed is this weight * the speed in km/h.
success_reward (float) – the reward given (at most once per episode) when the ego car reaches the destination.
time_limit (int) – the environment will terminate an episode if it goes beyond this number of steps.
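The timing relationship described for decision_repeat can be worked through numerically. The helper names below are hypothetical, and the episode-length calculation assumes that time_limit counts RL steps, which the description above does not state explicitly.

```python
# Minimal simulation interval stated in the decision_repeat description.
PHYSICS_WORLD_STEP_SIZE = 0.02  # seconds


def rl_step_duration(decision_repeat=5):
    """Simulated seconds covered by one RL step."""
    return decision_repeat * PHYSICS_WORLD_STEP_SIZE


def max_episode_duration(time_limit=1200, decision_repeat=5):
    """Simulated seconds before the time limit ends an episode.

    Assumes time_limit is measured in RL steps.
    """
    return time_limit * rl_step_duration(decision_repeat)
```

With the defaults of load(), each RL step covers about 0.1 s of simulated time, so under the stated assumption an episode is cut off after roughly 120 s of simulated driving.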

alf.environments.suite_procgen#

load(env_name, batch_size=1)[source]#

Load the Procgen environment

Parameters

env_name (str) – the name of the procgen environment, such as ‘coinrun’, ‘bossfight’, etc.
batch_size (int) – the number of parallel environments to run simultaneously.

alf.environments.suite_robotics#

alf.environments.suite_safety_gym#

alf.environments.suite_simple#

Suite for simple environments defined by ALF

load(game, env_id=None, env_args={}, discount=1.0, frame_skip=None, frame_stack=None, gym_env_wrappers=(), alf_env_wrappers=(), max_episode_steps=0)[source]#

Loads the specified simple game and wraps it.

Parameters

game – name for the environment to load. The game should have been defined in the sub-directory ./simple/.
env_args (dict) – extra args for creating the game.
discount (float) – discount to use for the environment.
frame_skip (int) – the time interval at which the agent experiences the game.
frame_stack (int) – the number of latest frames to stack as the observation input.
gym_env_wrappers (list) – list of gym env wrappers.
alf_env_wrappers (list) – list of ALF env wrappers.
max_episode_steps (int) – max number of steps for an episode.
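The idea behind frame_stack can be sketched with a fixed-length buffer. FrameStacker is a hypothetical class, not the wrapper ALF applies, and filling the buffer with copies of the first frame on reset is an assumption about how the initial observation is padded.

```python
from collections import deque


class FrameStacker:
    """Keeps the latest frame_stack frames as the observation."""

    def __init__(self, frame_stack):
        self._frames = deque(maxlen=frame_stack)

    def reset(self, first_frame):
        # Assumed padding: repeat the first frame to fill the stack.
        self._frames.clear()
        for _ in range(self._frames.maxlen):
            self._frames.append(first_frame)
        return list(self._frames)

    def push(self, frame):
        # The deque drops the oldest frame automatically once full.
        self._frames.append(frame)
        return list(self._frames)
```

Frames can be any object here; with image observations the returned list would typically be concatenated along the channel dimension.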

Returns

An AlfEnvironment instance.

alf.environments.suite_socialbot#

is_available()[source]#

load(environment_name, env_id=None, port=None, wrap_with_process=False, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#

Loads the selected environment and wraps it with the specified wrappers.

Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.

Parameters

environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
port (int) – Port used for the environment
wrap_with_process (bool) – Whether to wrap the environment in a new process
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to the default step limit defined in the environment’s spec. No limit is applied if set to 0 or if there is no timestep_limit set in the environment’s spec.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.

Returns

An AlfEnvironment instance.

alf.environments.suite_tic_tac_toe#

class TicTacToeEnvironment(batch_size)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

A simple 3x3 board game.

Two players, X and O, take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal line is the winner.

The reward is +1 if player 0 wins, -1 if player 1 wins and 0 for a draw. An invalid move gives the reward to the opponent.
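The winning condition and reward convention can be sketched for a single board. The helper game_reward and its board encoding (a row-major list of 9 cells holding None, 0 or 1) are hypothetical; the environment itself uses batched tensors and also handles invalid moves, which this sketch omits.

```python
# All eight winning lines on a 3x3 board, as row-major cell indices.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def game_reward(board):
    """Return +1 / -1 / 0 when the game is over, or None if it continues.

    board: list of 9 cells; None for empty, 0 or 1 for the player index.
    """
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return 1 if board[a] == 0 else -1
    if all(cell is not None for cell in board):
        return 0  # full board without a line: draw
    return None
```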

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then overwrite this property to True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

env_info_spec()[source]#

Defines the env_info provided by the environment.

observation_desc()[source]#

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode)[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

load(name='', batch_size=1)[source]#

Load TicTacToeEnvironment

Parameters

name (str) – not used
batch_size (int) – the number of games in the simulation.

alf.environments.suite_unittest#

Environments for unittest.

class ActionType(value)#

Bases: enum.Enum

An enumeration.

Continuous = 2#

Discrete = 1#

class MixedPolicyUnittestEnv(batch_size, episode_length, obs_dim=1)[source]#

Bases: alf.environments.suite_unittest.UnittestEnv

Environment for testing a mixed policy.

Given the agent’s (discrete, continuous) action pair (a_d, a_c), if a_d == (a_c > 0.5), the agent receives a reward of 1; otherwise it receives 0.
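The reward rule can be written out for a single action pair as follows. The function mixed_policy_reward is a hypothetical scalar illustration; the environment computes this on batched tensors.

```python
def mixed_policy_reward(a_d, a_c):
    """Return 1.0 when the discrete action agrees with (a_c > 0.5)."""
    # int() makes the comparison between 0/1 and the boolean explicit.
    return 1.0 if a_d == int(a_c > 0.5) else 0.0
```

The agent is rewarded only when its two action heads are consistent: a_d = 1 paired with a_c above 0.5, or a_d = 0 paired with a_c at or below 0.5.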

Initializes the environment.

Parameters

batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
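
The reward rule for the mixed policy can be written out as a small function. This is a sketch under assumptions: it works on plain Python scalars rather than batched tensors, it assumes a_d takes the values 0 or 1, and the name mixed_policy_reward is hypothetical.

```python
def mixed_policy_reward(a_d, a_c):
    """Return 1.0 when the discrete action equals the indicator (a_c > 0.5), else 0.0.

    Assumes a_d is 0 or 1 and a_c is a float (scalar sketch, not batched).
    """
    return 1.0 if bool(a_d) == (a_c > 0.5) else 0.0
```

Note that the comparison is strict, so a_c == 0.5 pairs with a_d == 0.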

class PolicyUnittestEnv(batch_size, episode_length, obs_dim=1, action_type=<ActionType.Discrete: 1>, reward_dim=1)[source]#

Bases: alf.environments.suite_unittest.UnittestEnv

Environment for testing policy.

The agent receives 1 - diff(action, observation) as reward.

Initializes the environment.

Parameters

batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
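
As an illustration of the reward 1 - diff(action, observation), the sketch below assumes that diff is the absolute difference between the two values. That reading is an assumption, as is the scalar (unbatched) form; the name policy_reward is hypothetical.

```python
def policy_reward(action, observation):
    """Assumed reading of the documented rule: reward = 1 - |action - observation|."""
    return 1.0 - abs(float(action) - float(observation))
```

With this reading, an action equal to the observation earns the maximum reward of 1, and the reward falls off linearly as the action moves away from it.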

class RNNPolicyUnittestEnv(batch_size, episode_length, gap=3, action_type=<ActionType.Discrete: 1>, obs_dim=1)[source]#

Bases: alf.environments.suite_unittest.UnittestEnv

Environment for testing RNN policy.

The agent receives reward 1 after the initial gap steps if its action matches the observation given at the first step.

Initializes the environment.

Parameters

batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
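
The delayed-reward rule for the RNN test can be sketched as below. Several details are assumptions made for illustration: step is taken to count steps since the first observation, the gap boundary is treated as exclusive, and the action is compared directly with the first observation even though the real encoding may differ. The name rnn_policy_reward is hypothetical.

```python
def rnn_policy_reward(step, gap, action, first_observation):
    """Reward 1.0 only after the initial gap, and only if the action matches.

    step: number of steps since the first observation (assumed indexing).
    gap: number of initial steps during which no reward is given.
    """
    if step <= gap:
        # Still inside the initial gap: no reward regardless of the action.
        return 0.0
    return 1.0 if action == first_observation else 0.0
```

Because the observation that determines the reward is only shown at the first step, a policy has to carry it in its recurrent state across the gap to be rewarded.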

class UnittestEnv(batch_size, episode_length, obs_dim=1, action_type=<ActionType.Discrete: 1>, reward_dim=1)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Abstract base for unittest environment.

Every episode ends in episode_length steps (including the LAST step). The observation is one dimensional. The action is binary {0, 1} when action_type is ActionType.Discrete, and a float value in the range (0.0, 1.0) when action_type is ActionType.Continuous.

Initializes the environment.

Parameters

batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
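
One way to read "ends in episode_length steps (including the LAST step)" is sketched below. It assumes that the FIRST step also counts toward episode_length, which is not stated above; the function name and the string labels are hypothetical.

```python
def episode_step_types(episode_length):
    """List the step types of one episode, assuming FIRST and LAST both count."""
    if episode_length < 2:
        raise ValueError("episode_length must be at least 2 to hold FIRST and LAST")
    return ["FIRST"] + ["MID"] * (episode_length - 2) + ["LAST"]
```

Under this assumption, episode_length=4 yields FIRST, MID, MID, LAST, after which the next step starts a new episode.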

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

env_info_spec()[source]#

Defines the env_info provided by the environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so a default implementation is provided that returns a scalar spec.

Returns: alf.TensorSpec

class ValueUnittestEnv(batch_size, episode_length, obs_dim=1, action_type=<ActionType.Discrete: 1>, reward_dim=1)[source]#

Bases: alf.environments.suite_unittest.UnittestEnv

Environment for testing value estimation.

Every episode ends in episode_length steps. It always gives a reward of 1 at each step.

Initializes the environment.

Parameters

batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType

alf.environments.thread_environment#

Runs a single environment in a separate thread.

class ThreadEnvironment(env_constructor)[source]#

Bases: alf.environments.alf_environment.AlfEnvironment

Creates and steps a single environment in a separate thread.

Creates a ThreadEnvironment.

Parameters: env_constructor (Callable) – constructor for the OpenAI Gym environment

action_spec()[source]#

Defines the actions that should be provided to step().

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

property batch_size#

The batch size of the environment.

Returns: The batch size of the environment, or 1 if the environment is not batched.
Raises: RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.

property batched#

Whether the environment is batched or not.

If the environment supports batched observations and actions, then override this property to return True.

When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.

Returns: A boolean indicating whether the environment is batched or not.

close()[source]#

Frees any resources used by the environment.

Implement this method for an environment backed by an external process.

This method can be used directly:

env = Env(...)
# Use env.
env.close()

or via a context manager:

with Env(...) as env:
    # Use env.
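
The two usage styles above are equivalent when the context manager simply calls close() on exit. The sketch below shows that relationship with a hypothetical SketchEnv class; how the ALF base class wires this up is not shown in this reference, so treat the __enter__/__exit__ bodies as an assumption.

```python
class SketchEnv:
    """Hypothetical environment holding a resource that must be released."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release external resources; safe to call more than once.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        # Returning False lets any exception from the with-body propagate.
        return False


with SketchEnv() as env:
    inside = env.closed  # False while the block is running
after = env.closed  # True once the block has exited
```

The context-manager form is the safer of the two because close() still runs if the body raises.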

env_info_spec()[source]#

Defines the env_info provided by the environment.

observation_spec()[source]#

Defines the observations provided by the environment.

May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.

Returns: nested TensorSpec

render(mode='rgb_array')[source]#

Renders the environment.

Parameters: mode – One of [‘rgb_array’, ‘human’]. Renders to a NumPy array, or brings up a window where the environment can be visualized.
Returns: An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
Raises: NotImplementedError – If the environment does not support rendering.

reward_spec()[source]#

Defines the reward provided by the environment.

The reward of most environments is a scalar, so a default implementation is provided that returns a scalar spec.

Returns: alf.TensorSpec

seed(seed)[source]#

Seeds the environment.

Parameters: seed (int) – Value to use as seed for the environment.

alf.environments.utils#

class UnwrappedEnvChecker[source]#

Bases: object

A class for checking if there is already an unwrapped env in the current process. For some games, if the check is True, then we should stop creating more envs (multiple envs cannot coexist in a process).

See suite_socialbot.py for an example usage of this class.

check()[source]#

check_and_update(wrap_with_process)[source]#

Combines self.check() and self.update().

update(wrap_with_process)[source]#

Update the flag.

Parameters: wrap_with_process (bool) – if False, an env is being created without being wrapped by a subprocess.
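
The interplay of check(), update() and check_and_update() can be sketched as a single boolean flag. SketchUnwrappedEnvChecker is a hypothetical re-creation written for this example; in particular, raising RuntimeError on a failed check is an assumption, and the real class may signal the failure differently.

```python
class SketchUnwrappedEnvChecker:
    """Hypothetical sketch: remembers whether an unwrapped env exists in this process."""

    def __init__(self):
        self._unwrapped_env_in_process = False

    def check(self):
        # Fail if an unwrapped env was already created in this process.
        if self._unwrapped_env_in_process:
            raise RuntimeError(
                "An unwrapped env already exists in this process; "
                "no more envs can be created.")

    def update(self, wrap_with_process):
        # An env created without a subprocess wrapper sets the flag for good.
        if not wrap_with_process:
            self._unwrapped_env_in_process = True

    def check_and_update(self, wrap_with_process):
        self.check()
        self.update(wrap_with_process)
```

A loader would call check_and_update(wrap_with_process) once per environment it creates: any number of process-wrapped environments pass, and the first unwrapped one blocks all later creations.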

create_environment(env_name='CartPole-v0', env_load_fn=<function load>, eval_env_load_fn=None, for_evaluation=False, num_parallel_environments=30, batch_size_per_env=1, nonparallel=False, flatten=True, start_serially=True, num_spare_envs=0, parallel_environment_ctor=<class 'alf.environments.fast_parallel_environment.FastParallelEnvironment'>, seed=None, batched_wrappers=())[source]#

Create a batched environment.

Parameters

env_name (str|list[str]) – env name. If it is a list, MultitaskWrapper will be used to create multi-task environments. Each one of them consists of the environments listed in env_name.
env_load_fn (Callable) – callable that creates an environment. If env_load_fn has attribute batched and it is True, env_load_fn(env_name, batch_size=num_parallel_environments) will be used to create the batched environment. Otherwise, a ParallelAlfEnvironment will be created.
eval_env_load_fn (Callable) – callable that creates an environment for evaluation. If None, use env_load_fn. This argument is useful for cases when the evaluation environment is different from the training environment.
for_evaluation (bool) – whether to create an environment for evaluation (if True) or for training (if False). If True, eval_env_load_fn will be used for creating the environment if provided. Otherwise, env_load_fn will be used.
num_parallel_environments (int) – num of parallel environments
batch_size_per_env (int) – if >1, will create num_parallel_environments/batch_size_per_env ProcessEnvironment instances. Each of these ProcessEnvironment instances holds batch_size_per_env environments. If each underlying environment of ProcessEnvironment is itself batched, batch_size_per_env will be used as the batch size for them. Otherwise BatchEnvironmentWrapper will be used to instruct each process to run the underlying environments sequentially on operations such as step(). The potential benefit of using batch_size_per_env>1 is to reduce the number of processes being used, or to take advantage of the batched nature of the underlying environment.
nonparallel (bool) – force creation of a single env in the current process. Used for correctly exposing game gin confs to tensorboard.
start_serially (bool) – whether to start environments serially (if True) or in parallel (if False).
flatten (bool) – whether to use flattened actions and time_steps during communication to reduce overhead.
num_spare_envs (int) – number of spare parallel environments to speed up reset. Useful when a reset is much slower than a regular step.
parallel_environment_ctor (Callable) – used to construct the parallel environment. Available constructors are: fast_parallel_environment.FastParallelEnvironment and parallel_environment.ParallelAlfEnvironment.
seed (None|int) – random number seed for environment. A random seed is used if None.
batched_wrappers (Iterable) – a list of wrappers which can wrap batched AlfEnvironment.
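
How env_load_fn, num_parallel_environments and batch_size_per_env interact can be summarized with a small planning function. This is a sketch only: plan_layout, plain_load and batched_load are hypothetical names, nonparallel and num_spare_envs are ignored, and requiring num_parallel_environments to be divisible by batch_size_per_env is an assumption made for the example.

```python
def plan_layout(env_load_fn, num_parallel_environments=30, batch_size_per_env=1):
    """Hypothetical helper describing which kind of batched environment results."""
    if getattr(env_load_fn, "batched", False):
        # The loader builds one natively batched environment.
        return {"kind": "batched", "batch_size": num_parallel_environments}
    if num_parallel_environments % batch_size_per_env != 0:
        # Assumption for this sketch: the split into processes must be exact.
        raise ValueError(
            "num_parallel_environments must be divisible by batch_size_per_env")
    return {
        "kind": "parallel",
        "num_processes": num_parallel_environments // batch_size_per_env,
        "envs_per_process": batch_size_per_env,
    }


def plain_load(env_name):
    """Stand-in loader without a batched attribute."""
    return env_name


def batched_load(env_name, batch_size=1):
    """Stand-in loader that declares itself batched via a function attribute."""
    return (env_name, batch_size)


batched_load.batched = True
```

For example, 30 parallel environments with batch_size_per_env=5 need only 6 processes, each stepping 5 environments.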

Return type: AlfEnvironment

load_with_random_max_episode_steps(env_name, env_load_fn=<function load>, min_steps=200, max_steps=250)[source]#

Create environment with random max_episode_steps in range [min_steps, max_steps].

Parameters

env_name (str) – env name
env_load_fn (Callable) – callable that creates an environment
min_steps (int) – minimum value of the random range
max_steps (int) – maximum value of the random range

Return type: AlfEnvironment
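
The behavior of load_with_random_max_episode_steps can be approximated as follows. This is a sketch under assumptions: it supposes that env_load_fn accepts a max_episode_steps keyword and that the value is drawn uniformly from the inclusive range; fake_load and the sketch_ prefix are hypothetical names used so the example runs without ALF.

```python
import random


def sketch_load_with_random_max_episode_steps(env_name,
                                              env_load_fn,
                                              min_steps=200,
                                              max_steps=250):
    """Draw max_episode_steps from [min_steps, max_steps] and load the env."""
    # random.randint includes both endpoints.
    max_episode_steps = random.randint(min_steps, max_steps)
    # Assumption: the loader takes a max_episode_steps keyword argument.
    return env_load_fn(env_name, max_episode_steps=max_episode_steps)


def fake_load(env_name, max_episode_steps=None):
    """Stand-in loader that just records its arguments."""
    return {"env_name": env_name, "max_episode_steps": max_episode_steps}
```

Randomizing the episode limit per environment keeps parallel environments from all resetting on the same step.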