alf.environments#
alf.environments.alf_environment#
ALF RL Environment API.
- Adapted from TF-Agents Environment API as seen in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/py_environment.py
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/tf_environment.py
- class AlfEnvironment[source]#
Bases: object
Abstract base class for ALF RL environments.
Observations and valid actions are described with TensorSpec, defined in the specs module.
The current_time_step() method returns the current time_step, resetting the environment if necessary.
The step(action) method applies the action and returns the new time_step. This method will also reset the environment if needed and ignore the action in that case.
The reset() method returns the time_step that results from an environment reset and is guaranteed to have step_type=ts.FIRST.
The reset() method is only needed for explicit resets. In general, the environment will reset automatically when needed, for example, when no episode was started or when it reaches a step after the end of the episode (i.e. step_type=ts.LAST).
If the environment can run multiple steps at the same time, taking a batched set of actions and returning a batched set of observations, it should override the property batched to return True.
Example for collecting an episode:
    env = AlfEnvironment()
    # reset() creates the initial time_step and resets the environment.
    time_step = env.reset()
    while not time_step.is_last():
        action_step = policy.action(time_step)
        time_step = env.step(action_step.action)
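The automatic-reset behaviour described above can be illustrated with a small self-contained sketch. It does not use ALF: `ToyTimeStep`, `ToyCountingEnv` and the integer step types are hypothetical stand-ins, and the real `AlfEnvironment` may differ in detail.

```python
from collections import namedtuple

# Hypothetical stand-ins for ALF's StepType and TimeStep (not the real classes).
FIRST, MID, LAST = 0, 1, 2
ToyTimeStep = namedtuple("ToyTimeStep", ["step_type", "observation"])


class ToyCountingEnv:
    """Adds each action to a counter; the episode ends at ``episode_length``."""

    def __init__(self, episode_length=3):
        self._episode_length = episode_length
        self._current = None  # cached result of the last reset()/step()

    def reset(self):
        self._count = 0
        self._current = ToyTimeStep(FIRST, self._count)
        return self._current

    def current_time_step(self):
        # Reset lazily if no episode has been started yet.
        if self._current is None:
            return self.reset()
        return self._current

    def step(self, action):
        # No episode yet, or the previous step ended one: reset and ignore action.
        if self._current is None or self._current.step_type == LAST:
            return self.reset()
        self._count += action
        done = self._count >= self._episode_length
        self._current = ToyTimeStep(LAST if done else MID, self._count)
        return self._current


env = ToyCountingEnv(episode_length=2)
types = [env.step(1).step_type for _ in range(4)]
```

Here the first and fourth calls to `step()` trigger a reset, so their actions are ignored and they return a `FIRST` step.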
- abstract action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then override this property to return True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
    env = Env(...)
    # Use env.
    env.close()
or via a context manager:
    with Env(...) as env:
        # Use env.
- get_info()[source]#
Returns the environment info returned on the last step.
- Returns
Info returned by last call to step(). None by default.
- Raises
NotImplementedError – If the environment does not use info.
- property num_tasks#
Number of tasks supported by this environment.
- abstract observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reset()[source]#
Starts a new sequence and returns the first TimeStep of this sequence.
Note: Subclasses cannot override this directly. Subclasses implement _reset(), which will be called by this method. The output of _reset() will be cached and made available through current_time_step().
- Returns
- Return type
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- seed(seed)[source]#
Seeds the environment.
- Parameters
seed (int) – Value to use as seed for the environment.
- step(action)[source]#
Updates the environment according to the action and returns a TimeStep.
If the environment returned a TimeStep with StepType.LAST at the previous step, the implementation of _step in the environment should call reset to start a new sequence and ignore action.
This method will start a new sequence if called after the environment has been constructed and reset has not been called. In this case action will be ignored.
Note: Subclasses cannot override this directly. Subclasses implement _step(), which will be called by this method. The output of _step() will be cached and made available through current_time_step().
- Parameters
action (nested Tensor) – input actions.
- Returns
- Return type
- property task_names#
The name of each task.
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
alf.environments.alf_gym3_wrapper#
Wrapper providing an AlfEnvironment adapter for Gym3 environments.
Gym3 provides a unified interface for reinforcement learning environments that improves upon the gym interface and includes vectorization (i.e. natively supported batched environments).
Gym3 has a different set of considerations, which lead to different design choices compared to gym. See the following link to learn about those design choices.
https://github.com/openai/gym3/blob/master/docs/design.md
- class AlfGym3Wrapper(gym3_env, image_channel_first=True, ignored_info_keys=[], support_force_reset=False, render_activator=None, frame_extractor=None)[source]#
Bases: alf.environments.alf_environment.AlfEnvironment
An adapter to make Gym3 environments follow Alf’s convention.
Although Gym3 provides an official gym wrapper, we decided not to base the Alf adapter upon that gym wrapper because:
In terms of performance and resources, relying on the natively supported batched (vectorized) environments from Gym3 is much more memory-efficient than creating many Gym3 instances in subprocesses in batch mode.
Gym3 has a different interface from gym for indicating the last step and first step of an episode.
Gym3 has different interfaces from gym for rendering and recording.
Gym3 normally does not provide support for resetting the environment.
In this adapter, all of the above are considered and patched to achieve compatibility with AlfEnvironment.
Normally you are not expected to call AlfGym3Wrapper directly. Instead, the load() functions for various Gym3-based environments are preferred. For example, suite_procgen.load() is used to construct procgen environments, which are themselves Gym3-based environments.
NOTE: TimeLimit is currently not applicable to Gym3 environments as they do not offer a reset() interface.
Construct an adapted instance for the input Gym3 environment
- Parameters
gym3_env (Env) – the input environment, which should be an instance of a class that derives from gym3.Env
image_channel_first (bool) – when set to True, the image-based (3-channel) observation will be permuted so that the channel dimension comes first.
ignored_info_keys (List[str]) – a list of keys in the env info that should not be included in the env info of the TimeStep. This is useful when some large but unneeded information is stored in the env info of the underlying Gym3 environment, and ignoring it is crucial for better performance.
support_force_reset (bool) – Gym3 environments do not support force reset in general. However, some environments such as procgen allow sending action -1 to reset the environments. Set this to True to enable such behavior.
render_activator (Optional[Callable[[], Env]]) – when set to None, it indicates that this environment does not support rendering. Otherwise it should be a function that re-creates a Gym3 environment with rendering enabled. See render() for details.
frame_extractor (Optional[Callable[[Env], Any]]) – when set to None, it indicates that this environment does not support recording. Otherwise it should be a function that extracts the rendered frame for recording from the environment.
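As a rough illustration of what image_channel_first and ignored_info_keys do, the sketch below works on plain shapes and dicts. The helper names `to_channel_first` and `filter_info` are hypothetical and are not part of AlfGym3Wrapper; the assumption is that only 3-channel, rank-3 observations are permuted.

```python
def to_channel_first(shape):
    """Permute an (H, W, 3) image shape to (3, H, W); other shapes pass through."""
    if len(shape) == 3 and shape[-1] == 3:
        height, width, channels = shape
        return (channels, height, width)
    return tuple(shape)


def filter_info(info, ignored_info_keys=()):
    """Return a copy of the env-info dict without the ignored keys."""
    ignored = set(ignored_info_keys)
    return {key: value for key, value in info.items() if key not in ignored}


image_shape = to_channel_first((64, 48, 3))
info = filter_info({"rgb": "large frame", "level": 2}, ["rgb"])
```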
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then override this property to return True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode)[source]#
Enables rendering by re-activating the environment.
- Parameters
mode (str) – A string indicating the rendering mode. This is to make it compatible with Gym environments’ rendering interface. For AlfGym3Wrapper, it returns the RGB array image if mode is specified as rgb_array, and None for other modes.
alf.environments.alf_gym_wrapper#
Wrapper providing an AlfEnvironment adapter for GYM environments.
- Adapted from TF-Agents Environment API as seen in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/suite_gym.py
- class AlfGymWrapper(gym_env, env_id=None, discount=1.0, auto_reset=True, simplify_box_bounds=True)[source]#
Bases: alf.environments.alf_environment.AlfEnvironment
Base wrapper implementing the AlfEnvironmentBaseWrapper interface for Gym envs.
Action and observation specs are automatically generated from the action and observation spaces. See the base class AlfEnvironment for details.
- Parameters
gym_env (gym.Env) – An instance of OpenAI gym environment.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
auto_reset (bool) – whether or not to reset the environment when done.
simplify_box_bounds (bool) – whether or not to simplify redundant arrays to values for spec bounds.
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
    env = Env(...)
    # Use env.
    env.close()
or via a context manager:
    with Env(...) as env:
        # Use env.
- property done#
- property gym#
Return the gym environment.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- seed(seed)[source]#
Seeds the environment.
- Parameters
seed (int) – Value to use as seed for the environment.
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
- tensor_spec_from_gym_space(space, simplify_box_bounds=True, float_dtype=<class 'numpy.float32'>)[source]#
Construct tensor spec from gym space.
- Parameters
space (gym.Space) – An instance of OpenAI gym Space.
simplify_box_bounds (bool) – if True, will try to simplify redundant arrays to make logging and debugging less verbose when printed out.
float_dtype (np.float32 | np.float64 | None) – the dtype to be used for the floating numbers. If None, it will use dtypes of gym spaces.
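A minimal sketch of what simplifying a bound could look like, assuming that "simplify redundant arrays" means collapsing a bound whose entries are all equal into a single value. `simplify_bound` is a hypothetical helper written for illustration; it is not the function used by tensor_spec_from_gym_space.

```python
def simplify_bound(bound):
    """Return a single value if every entry of ``bound`` is identical.

    Otherwise the bound is returned unchanged as a list.
    """
    values = list(bound)
    if values and all(value == values[0] for value in values):
        return values[0]
    return values


low = simplify_bound([-1.0, -1.0, -1.0])   # all entries equal
high = simplify_bound([1.0, 2.0, 1.0])     # mixed entries are kept as-is
```

With this convention a printed spec shows one number per bound instead of a full array whenever the bound is uniform.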
alf.environments.alf_wrappers#
Wrappers for ALF environments.
- Adapted from TF-Agents Environment API as seen in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/wrappers.py
- class ActionObservationWrapper(env)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Add prev_action to observation.
The new observation is:
    { 'observation': original_observation, 'prev_action': prev_action }
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
- class AlfEnvironmentBaseWrapper(env)[source]#
Bases: alf.environments.alf_environment.AlfEnvironment
AlfEnvironment wrapper that forwards calls to the given environment.
Create an ALF environment base wrapper.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
- Returns
A wrapped AlfEnvironment
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then override this property to return True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
    env = Env(...)
    # Use env.
    env.close()
or via a context manager:
    with Env(...) as env:
        # Use env.
- get_info()[source]#
Returns the environment info returned on the last step.
- Returns
Info returned by last call to step(). None by default.
- Raises
NotImplementedError – If the environment does not use info.
- property num_tasks#
Number of tasks supported by this environment.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- seed(seed)[source]#
Seeds the environment.
- Parameters
seed (int) – Value to use as seed for the environment.
- property task_names#
The name of each task.
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
- class AtariTerminalOnLifeLossWrapper(env)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Wrapper to change discount to 0 upon life loss for Atari.
This can potentially make it easier for the learning agent to recognize the significance of losing a life.
Some papers report results with this enabled (e.g. arXiv:2111.00210).
- Parameters
env – ALF env to be wrapped
- class BatchEnvironmentWrapper(envs)[source]#
Bases: alf.environments.alf_environment.AlfEnvironment
Wrapper to make a list of non-batched environments into a batched environment.
Note that the individual environments in envs are executed sequentially during one step() or reset().
- Parameters
envs (List[AlfEnvironment]) – a list of unbatched AlfEnvironment.
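The sequential execution noted above can be sketched as follows. `OffsetEnv` and `batched_step` are hypothetical names used only for this illustration; a real BatchEnvironmentWrapper works on TimeStep structures and tensors rather than plain tuples and lists.

```python
class OffsetEnv:
    """Hypothetical unbatched env: step() returns (observation, reward)."""

    def __init__(self, offset):
        self._offset = offset

    def step(self, action):
        return action + self._offset, 1.0


def batched_step(envs, actions):
    """Step each env in turn and regroup the results per field."""
    results = [env.step(action) for env, action in zip(envs, actions)]
    observations, rewards = zip(*results)
    # The first dimension of every output now indexes the environment.
    return list(observations), list(rewards)


observations, rewards = batched_step([OffsetEnv(0), OffsetEnv(10)], [1, 2])
```

Because the loop is sequential, the cost of one batched step is the sum of the individual step costs.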
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then override this property to return True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
    env = Env(...)
    # Use env.
    env.close()
or via a context manager:
    with Env(...) as env:
        # Use env.
- property metadata#
- property num_tasks#
Number of tasks supported by this environment.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode)[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise returns nothing and renders directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- seed(seed)[source]#
Seeds the environment.
- Parameters
seed (int) – Value to use as seed for the environment.
- property task_names#
The name of each task.
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
- class BatchedTensorWrapper(env)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Wrapper that converts non-batched numpy-based I/O to batched tensors.
Create an ALF environment base wrapper.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
- Returns
A wrapped AlfEnvironment
- class CurriculumWrapper(env, progress_favor=10.0, current_score_update_rate=0.001, past_score_update_rate=0.0005, warmup_period=100)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
A wrapper to provide automatic curriculum task selection.
The probability of a task being chosen is based on its recent progress in terms of episode reward. A task will be chosen more often if its episode reward increases faster than other tasks.
The progress of a task is defined as the difference between its current score and its past score divided by the average episode length for that task.
- Parameters
env (AlfEnvironment) – environment to be wrapped. It needs to be batched.
progress_favor (float) – how much more likely to choose the environment with the fastest progress than the ones with no progress. If progress_favor is 1, all tasks are sampled uniformly.
current_score_update_rate (float) – the rate for updating the current score
past_score_update_rate (float) – the rate for updating the past score
warmup_period (int) – gradually increase progress_favor from 1 to progress_favor during the first num_tasks * warmup_period episodes
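One way to turn the progress definition above into sampling probabilities is sketched below. The exponential weighting `progress_favor ** (progress / max_progress)` and the clipping of negative progress to zero are assumptions chosen to match the stated properties (the fastest task is progress_favor times more likely than a task with no progress, and progress_favor=1 gives uniform sampling); the actual CurriculumWrapper may compute this differently.

```python
def task_probabilities(current_scores, past_scores, episode_lengths,
                       progress_favor=10.0):
    """Hypothetical mapping from per-task progress to sampling probabilities."""
    # Progress: score improvement normalized by average episode length.
    progress = [
        max(current - past, 0.0) / length
        for current, past, length in zip(current_scores, past_scores,
                                         episode_lengths)
    ]
    fastest = max(progress)
    if fastest <= 0.0:
        # No task is improving: fall back to uniform sampling.
        return [1.0 / len(progress)] * len(progress)
    weights = [progress_favor ** (p / fastest) for p in progress]
    total = sum(weights)
    return [w / total for w in weights]


probs = task_probabilities([1.0, 0.0], [0.0, 0.0], [10.0, 10.0],
                           progress_favor=10.0)
uniform = task_probabilities([1.0, 0.0], [0.0, 0.0], [10.0, 10.0],
                             progress_favor=1.0)
```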
- class DiscreteActionWrapper(env, actions_num)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Discretize each continuous action dim into several evenly distributed values. Currently only supports an unnested action spec with a rank-1 shape.
This wrapper can be used in both batch env mode (tensors) and individual env mode (numpy array).
- Parameters
env (AlfEnvironment) – ALF env to be wrapped
actions_num (int) – number of values to discretize each action dim into
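The discretization can be sketched as below, assuming the evenly distributed values include both bounds of each action dimension. `discretization_values` and `to_continuous` are hypothetical helpers, not DiscreteActionWrapper methods.

```python
def discretization_values(low, high, actions_num):
    """Return ``actions_num`` evenly spaced values covering [low, high]."""
    if actions_num < 2:
        raise ValueError("actions_num must be at least 2")
    step = (high - low) / (actions_num - 1)
    return [low + i * step for i in range(actions_num)]


def to_continuous(discrete_action, lows, highs, actions_num):
    """Map one discrete index per action dim back to a continuous action."""
    return [
        discretization_values(low, high, actions_num)[index]
        for index, low, high in zip(discrete_action, lows, highs)
    ]


values = discretization_values(-1.0, 1.0, 5)
action = to_continuous([0, 4], lows=[-1.0, 0.0], highs=[1.0, 2.0], actions_num=5)
```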
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
- class GoalReplayEnvWrapper(env)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Adds a goal to the observation, used for HER (Hindsight Experience Replay).
- Sources:
[1] Hindsight Experience Replay. https://arxiv.org/abs/1707.01495.
To use this wrapper, create an environment-specific version by inheriting this class.
Create a wrapper to add a goal to the observation.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
- Raises
ValueError – If environment observation is not a dict
- abstract get_goal_from_trajectory(trajectory)[source]#
Extracts the goal from a given trajectory.
- Parameters
trajectory – An instance of Trajectory.
- Returns
Environment specific goal
- Raises
NotImplementedError – function should be implemented in child class.
- abstract get_trajectory_with_goal(trajectory, goal)[source]#
Generates a new trajectory assuming the given goal was the actual target.
One example is updating a “distance-to-goal” field in the observation. Note that relevant state information must be recovered or re-calculated from the given trajectory.
- Parameters
trajectory – An instance of Trajectory.
goal – Environment specific goal
- Returns
Updated instance of Trajectory
- Raises
NotImplementedError – function should be implemented in child class.
- class MultitaskWrapper(envs, task_names, env_id=None)[source]#
Bases: alf.environments.alf_environment.AlfEnvironment
Multitask environment based on a list of environments.
All the environments need to have the same observation_spec, action_spec, reward_spec and info_spec. The action_spec of the new environment becomes:
    { 'task_id': TensorSpec((), maximum=num_envs - 1, dtype='int64'), 'action': original_action_spec }
‘task_id’ is used to specify which task to run for the current step. Note that the current implementation does not prevent switching tasks in the middle of an episode.
- Parameters
envs (list[AlfEnvironment]) – a list of environments. Each one represents a different task.
task_names (list[str]) – the names of each task.
env_id (int) – (optional) ID of the environment.
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- static load(load_fn, environment_name, env_id=None, **kwargs)[source]#
- Parameters
load_fn (Callable) – function used to construct the environment for each task. It will be called as load_fn(env_name, **kwargs)
environment_name (list[str]) – list of environment names
env_id (int) – (optional) ID of the environment.
kwargs – arguments passed to load_fn
- property num_tasks#
Number of tasks supported by this environment.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- seed(seed)[source]#
Seeds the environment.
- Parameters
seed (int) – Value to use as seed for the environment.
- property task_names#
The name of each task.
- class NonEpisodicAgent(env, discount=1.0)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Make the agent non-episodic by replacing all termination time steps with a non-zero discount (essentially the same type as returned by the TimeLimit wrapper).
This wrapper could be useful for purely intrinsically motivated agents, as suggested in the following paper:
EXPLORATION BY RANDOM NETWORK DISTILLATION, Burda et al. 2019,
“… We argue that this is a natural way to do exploration in simulated environments, since the agent’s intrinsic return should be related to all the novel states that it could find in the future, regardless of whether they all occur in one episode or are spread over several.
… If Alice is modelled as an episodic reinforcement learning agent, then her future return will be exactly zero if she gets a game over, which might make her overly risk averse. The real cost of a game over to Alice is the opportunity cost incurred by having to play through the game from the beginning.”
NOTE: For PURE intrinsic-motivated agents only. If you use both extrinsic and intrinsic rewards, then DO NOT use this wrapper! Without the episodic setting, the agent could exploit extrinsic rewards by intentionally dying to get easy early rewards in the game.
- Example usage:
    suite_mario.load.env_wrappers=(@NonEpisodicAgent, )
    suite_gym.load.env_wrappers=(@NonEpisodicAgent, )
Create a NonEpisodicAgent wrapper.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
discount (float) – discount of the environment.
- class NormalizedActionWrapper(env)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Normalize actions into [-1,1].
The reason why we’d like to normalize the actions, even though our action distribution networks can do this, is because we want to set target entropy independent of action ranges for algorithms like SAC.
This wrapper can be used only for individual envs (numpy array) or a batched env (tensor).
- Parameters
env (AlfEnvironment) – ALF env to be wrapped
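With actions normalized into [-1, 1], the wrapper has to map each agent action back to the original bounds before stepping the wrapped environment. A minimal sketch of such a linear mapping follows; `denormalize_action` is a hypothetical helper and the linear form is an assumption, not necessarily the exact computation in NormalizedActionWrapper.

```python
def denormalize_action(action, lows, highs):
    """Linearly map each entry of ``action`` from [-1, 1] to [low, high]."""
    return [
        low + (a + 1.0) * 0.5 * (high - low)
        for a, low, high in zip(action, lows, highs)
    ]


original = denormalize_action([-1.0, 0.0, 1.0],
                              lows=[0.0, 0.0, 0.0],
                              highs=[10.0, 10.0, 10.0])
```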
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
- class PerformanceProfiler(env, process_profile_fn, process_steps)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Use cProfile to profile env execution.
Create a PerformanceProfiler that uses cProfile to profile env execution.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
process_profile_fn (Callable) – A callback that accepts a Profile object. After process_profile_fn is called, profile information is reset.
process_steps (int) – The frequency with which process_profile_fn is called. The counter is incremented each time step is called (not reset); every process_steps steps, process_profile_fn is called and the profiler is reset.
- property duration#
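The interaction between process_steps and process_profile_fn can be sketched with the standard cProfile module. `StepProfiler` is a hypothetical stand-in that profiles a plain callable instead of an AlfEnvironment, so it only illustrates the counting and reset logic described in the parameters above.

```python
import cProfile


class StepProfiler:
    """Profiles ``step_fn`` and reports every ``process_steps`` calls."""

    def __init__(self, step_fn, process_profile_fn, process_steps):
        self._step_fn = step_fn
        self._process_profile_fn = process_profile_fn
        self._process_steps = process_steps
        self._profile = cProfile.Profile()
        self._num_steps = 0

    def step(self, action):
        self._profile.enable()
        try:
            result = self._step_fn(action)
        finally:
            self._profile.disable()
        self._num_steps += 1
        if self._num_steps % self._process_steps == 0:
            # Hand the collected profile to the callback, then start afresh.
            self._process_profile_fn(self._profile)
            self._profile = cProfile.Profile()
        return result


reports = []
profiler = StepProfiler(lambda a: a * 2, reports.append, process_steps=3)
outputs = [profiler.step(i) for i in range(7)]
```

A callback could, for example, pass the received Profile object to `pstats.Stats` to print or save the statistics.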
- class RandomFirstEpisodeLength(env, random_length_range, num_episodes=1)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
Randomize the length of the first episode.
The motivation is to make the observations less correlated for the environments that have fixed episode length.
- Example usage:
    RandomFirstEpisodeLength.random_length_range=200
    suite_gym.load.alf_env_wrappers=(@RandomFirstEpisodeLength, )
Create a RandomFirstEpisodeLength wrapper.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
random_length_range (int) – the randomized episode length is drawn from [1, random_length_range].
num_episodes (int) – randomize the episode length for the first num_episodes episodes.
- class ScalarRewardWrapper(env, reward_weights=None)[source]#
Bases: alf.environments.alf_wrappers.AlfEnvironmentBaseWrapper
A wrapper that converts a vector reward to a scalar reward by averaging reward dims with a weight vector.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to be wrapped.
reward_weights (list[float] | tuple[float]) – a list/tuple of weights for the rewards; if None, then the first dimension will be 1 and the other dimensions will be 0s.
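The weighting described above can be sketched in plain Python. This is a minimal illustration assuming a simple weighted sum; whether the wrapper also normalizes the weights is not stated here, and scalarize_reward is a hypothetical helper, not an ALF function.

```python
def scalarize_reward(reward, reward_weights=None):
    """Combine a vector reward into a scalar with a weight vector.

    If reward_weights is None, the first dimension gets weight 1 and
    all other dimensions get weight 0, as described for the wrapper.
    """
    if reward_weights is None:
        reward_weights = [1.0] + [0.0] * (len(reward) - 1)
    if len(reward_weights) != len(reward):
        raise ValueError("reward and reward_weights must have equal length")
    return sum(w * r for w, r in zip(reward_weights, reward))


vector_reward = [2.0, 5.0, -1.0]
default_scalar = scalarize_reward(vector_reward)
weighted_scalar = scalarize_reward(vector_reward, (0.5, 0.5, 0.0))
```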
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
Bases:
alf.environments.alf_wrappers.AlfEnvironmentBaseWrapperAdding temporally correlated noise to actions. Reference:
Swamy et al. Causal Imitation Learning under Temporally Correlated Noise, arXiv:2202.01312
Create a Temporally Correlated Noise wrapper, which adds temporally correlated noise to the action before interacting with the environment:
noisy_action = action + past_noise_weight * past_noise + current_noise
- Parameters
sigma (float) – standard deviation of the noise.
past_noise_weight (float) – the weight for the noise from the past when adding into the action for the current time step.
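The noise formula above can be sketched for a scalar action as follows. This is only an illustration: it assumes the noise is zero-mean Gaussian and that past_noise is the current_noise sampled at the previous step, neither of which is confirmed here. The class name TCNoise is hypothetical.

```python
import random


class TCNoise:
    """Sketch of temporally correlated noise for a scalar action."""

    def __init__(self, sigma, past_noise_weight, seed=None):
        self._sigma = sigma
        self._past_noise_weight = past_noise_weight
        self._rng = random.Random(seed)
        self._past_noise = 0.0  # assumption: no past noise at the first step

    def apply(self, action):
        current_noise = self._rng.gauss(0.0, self._sigma)
        noisy_action = (action + self._past_noise_weight * self._past_noise +
                        current_noise)
        # Assumption: the noise drawn now becomes past_noise at the next step.
        self._past_noise = current_noise
        return noisy_action


noise = TCNoise(sigma=0.1, past_noise_weight=0.5, seed=0)
noisy_actions = [noise.apply(0.0) for _ in range(3)]
```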
- class TimeLimit(env, duration)[source]#
Bases:
alf.environments.alf_wrappers.AlfEnvironmentBaseWrapperEnd episodes after specified number of steps.
Create a TimeLimit ALF environment.
- Parameters
env (AlfEnvironment) – An AlfEnvironment instance to wrap.
duration (int) – time limit, usually set to be the max_episode_steps of the environment.
- property duration#
alf.environments.carla_controller#
- class PIDController(K_P, K_I, K_D, dt, integration_time_window=0.5)[source]#
Bases:
objectPID controller.
See https://en.wikipedia.org/wiki/PID_controller for reference
- Parameters
K_P (float) – coefficient for the proportional term
K_I (float) – coefficient for the integral term
K_D (float) – coefficient for the derivative term
dt (float) – time interval in seconds for each step
integration_time_window (float) – the window for the integral in terms of seconds. The integration is implemented as an exponentially weighted sum over the past errors where the weight is decayed by 1 - dt/integration_time_window every step.
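The decayed integral described for integration_time_window can be sketched as below. This is a minimal illustration and not the ALF implementation: the class name SimplePID, the multiplication of each error by dt, and the finite-difference derivative are assumptions.

```python
class SimplePID:
    """Sketch of a PID controller with an exponentially decayed integral."""

    def __init__(self, K_P, K_I, K_D, dt, integration_time_window=0.5):
        self._K_P, self._K_I, self._K_D = K_P, K_I, K_D
        self._dt = dt
        # Past errors are down-weighted by this factor at every step.
        self._decay = 1.0 - dt / integration_time_window
        self._integral = 0.0
        self._prev_error = 0.0

    def step(self, error):
        self._integral = self._decay * self._integral + error * self._dt
        derivative = (error - self._prev_error) / self._dt
        self._prev_error = error
        return (self._K_P * error + self._K_I * self._integral +
                self._K_D * derivative)


# Integral-only controller: the second output shows the decay at work.
pid = SimplePID(K_P=0.0, K_I=1.0, K_D=0.0, dt=0.1)
first = pid.step(1.0)
second = pid.step(1.0)
```

With dt=0.1 and the default window of 0.5 seconds, the decay factor is 0.8, so an error from five steps back contributes about a third of its original weight.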
- class VehicleController(vehicle, step_time, max_speed=5.56, max_throttle=0.75, max_steering=0.8, max_brake=0.3, s_P=3.6, s_I=0.18, s_D=0, d_P=1.95, d_I=0.07, d_D=0.2)[source]#
Bases:
objectA simple vehicle controller using PID controller.
The defaults are from https://github.com/carla-simulator/carla/blob/master/PythonAPI/carla/agents/navigation/local_planner.py. Note that the max_speed and gain parameters for speed are originally specified for speed in the unit of km/h. Since here we use m/s, we have converted them as follows as our default values:
max_speed = (20 km/h) / 3.6 = 5.56 m/s
s_P = (1.0 h/km) * 3.6 = 3.6 s/m
s_I = (0.05 h/km) * 3.6 = 0.18 s/m
s_D = (0 h/km) * 3.6 = 0 s/m
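The same conversion can be reproduced in a few lines. The helper names below are hypothetical and only restate the arithmetic above: speeds are divided by 3.6, while gains expressed per km/h are multiplied by 3.6.

```python
KMH_PER_MS = 3.6  # 1 m/s equals 3.6 km/h


def kmh_to_ms(speed_kmh):
    """Convert a speed from km/h to m/s."""
    return speed_kmh / KMH_PER_MS


def gain_per_kmh_to_per_ms(gain):
    """Convert a gain in h/km (per km/h of error) to s/m (per m/s of error)."""
    return gain * KMH_PER_MS


max_speed = round(kmh_to_ms(20.0), 2)
s_P = gain_per_kmh_to_per_ms(1.0)
s_I = gain_per_kmh_to_per_ms(0.05)
s_D = gain_per_kmh_to_per_ms(0.0)
```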
- Parameters
vehicle (carla.Actor) – the actor for vehicle
step_time (float) – time interval in seconds for each step
max_speed (float) – maximal speed in m/s. Defaults to 5.56 m/s, which is about 20 km/h.
max_throttle (float) – maximal throttle
max_steering (float) – maximal steering
max_brake (float) – maximal brake
s_P (float) – coefficient of the proportional term for the speed controller, with the unit as s/m
s_I (float) – coefficient of the integral term for the speed controller, with the unit as s/m
s_D (float) – coefficient of the derivative term for the speed controller, with the unit as s/m
d_P (float) – coefficient of the proportional term for the direction controller
d_I (float) – coefficient of the integral term for the direction controller
d_D (float) – coefficient of the derivative term for the direction controller
- act(action)[source]#
Generate carla.VehicleControl based on action.
- Parameters
action (np.ndarray) – 3-D vector representing action
- Returns
carla.VehicleControl
- action_desc()[source]#
Get the description about the action.
- Returns
the description about the action
- Return type
str
- action_spec()[source]#
Get the action spec.
The action is a 3-D vector of [speed, direction, reverse], where speed is in [-1.0, 1.0], with a negative value meaning zero speed and 1.0 corresponding to the maximally allowed speed as provided by the max_speed argument for __init__(); direction is the relative direction that the vehicle is facing, with 0 being front, -0.5 being left and 0.5 being right; and reverse is interpreted as a boolean value, with values greater than 0.5 corresponding to True to indicate going backward.
- Returns
alf.BoundedTensorSpec
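To make the action layout concrete, here is a small sketch that decodes a [speed, direction, reverse] vector. The function decode_action is hypothetical, and the linear mapping from the speed component to a target speed in m/s is an assumption; only the clipping of negative values and the 0.5 threshold come from the description above.

```python
def decode_action(action, max_speed=5.56):
    """Decode [speed, direction, reverse] into readable quantities.

    Returns (target_speed_in_m_per_s, relative_direction, going_backward).
    """
    speed, direction, reverse = action
    # Negative speed values mean zero speed; 1.0 maps to max_speed.
    target_speed = max(float(speed), 0.0) * max_speed
    going_backward = reverse > 0.5
    return target_speed, float(direction), going_backward


stopped_right_reverse = decode_action([-0.3, 0.5, 0.9])
full_speed_forward = decode_action([1.0, 0.0, 0.0])
```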
alf.environments.carla_sensors#
- class BEVSensor(parent_actor, alf_world, navigation_sensor, image_height_in_pixels=200, image_width_in_pixels=200, pixels_per_meter=5, observation_mode='rgb', pixels_ev_to_bottom=50, history_idx=[- 16, - 11, - 6, - 1], max_history_len=20, vehicle_bbox_factor=1.0, walker_bbox_factor=2.0)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseBEVSensor. Adapted from https://github.com/zhejz/carla-roach/blob/main/carla_gym/core/obs_manager/birdview/chauffeurnet.py
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) – the world object keeping all relevant data and some utility functions (e.g., _get_traffic_light_waypoints)
navigation_sensor (str) – the navigation sensor associated with the parent_actor
image_height_in_pixels (int) – number of pixels for the height of rendered BEV image.
image_width_in_pixels (int) – number of pixels for the width of rendered BEV image.
pixels_per_meter (int) – how many pixels in the BEV image correspond to one meter in the world coordinate
observation_mode (str) – a string indicating the observation mode for the BEV image.
- If “rgb”, the sensor will return an encoded rgb image as sensor readings.
- If “mask”, it will use a multi-channel mask image as the sensor readings.
- If “bitmap”, it will use a multi-channel mask representation, and encode the mask tensor with bit representation. In this case, a proper decoder might be needed for the bitmap before being used for training.
pixels_ev_to_bottom (int) – the number of pixels of the ego vehicle (ev) to the bottom of the BEV image.
history_idx (list[int]) – a list of numbers representing the indices of the history information to be rendered for non-ego vehicles. For example, we can set history_idx=[-1] to keep only the most recent observation or history_idx=[-11, -1] for both the latest one and the one 10 steps earlier.
max_history_len (int) – max number of history length preserved
vehicle_bbox_factor (float) – a factor to scale the vehicle bounding boxes
walker_bbox_factor (float) – a factor to scale the walker bounding boxes
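The interaction between history_idx and max_history_len can be illustrated with a bounded buffer. This sketch assumes the history behaves like a fixed-length queue indexed from the end, which matches the negative indices in the signature but is not confirmed as the actual implementation; select_history is a hypothetical helper.

```python
from collections import deque


def select_history(history, history_idx):
    """Pick the history entries addressed by (negative) indices."""
    return [history[i] for i in history_idx]


# Keep at most max_history_len=20 entries; older ones are dropped.
history = deque(maxlen=20)
for frame in range(30):
    history.append(frame)

selected = select_history(history, [-16, -11, -6, -1])
print(selected)
```

Note that every index in history_idx has to stay within max_history_len, otherwise the lookup would fall outside the buffer.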
- generate_observation_masks()[source]#
Generate all the masks required for rendering the BEV observation.
- Returns
- Return type
a dictionary containing masks for different elements in the scene
- get_current_observation(current_frame)[source]#
Get the current observation.
The observation is a [C, H, W] array with C=3 if self._use_rgb_image is True. Otherwise, it is a multi-channel mask image including road_mask, route_mask and lane_mask for the first 3 channels, and 3 * len(self._history_idx) channels for vehicle_mask, walker_mask and traffic light mask.
- Parameters
current_frame (int) – not used.
- Returns
BEV image
- observation_desc()[source]#
Get the description about the observation of this sensor.
- Returns
each str corresponds to one TensorSpec from observation_spec().
- Return type
nested str
- class CameraSensor(parent_actor, sensor_type='sensor.camera.rgb', xyz=(1.6, 0.0, 1.7), pyr=(0.0, 0.0, 0.0), attachment_type='rigid', fov=90.0, fstop=1.4, gamma=2.2, image_size_x=640, image_size_y=480, iso=1200.0)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseCameraSensor.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
sensor_type (str) – ‘sensor.camera.rgb’, ‘sensor.camera.depth’, ‘sensor.camera.semantic_segmentation’
attachment_type (str) – There are two types of attachment. ‘rigid’: the object follows its parent position strictly. ‘spring_arm’: the object expands or retracts depending on camera situation.
xyz (tuple[float]) – the attachment position (x, y, z) relative to the parent_actor.
pyr (tuple[float]) – the attachment rotation (pitch, yaw, roll) in degrees.
fov (float) – horizontal field of view in degrees.
image_size_x (int) – image width in pixels.
image_size_y (int) – image height in pixels.
gamma (float) – target gamma value of the camera.
iso (float) – the camera sensor sensitivity.
- get_current_observation(current_frame)[source]#
- Parameters
current_frame (int) – not used.
- Returns
- The shape is [num_channels, image_size_y, image_size_x],
where num_channels is 3 for rgb sensor, and 1 for other sensors.
- Return type
np.ndarray
- observation_desc()[source]#
Get the description about the observation of this sensor.
- Returns
each str corresponds to one TensorSpec from observation_spec().
- Return type
nested str
- class CollisionSensor(parent_actor, max_num_collisions=4, include_collision_location=False)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseCollisionSensor for getting collision signal.
It gets the impulses and optionally the locations for the collisions during the last tick.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
max_num_collisions (int) – maximal number of collisions to be included
include_collision_location (bool) – whether to include collision location into the observation. If True, will include the position (x, y, z) of the other actor relative to the ego actor (parent_actor).
- get_current_observation(current_frame)[source]#
Get the current observation.
- Parameters
current_frame (int) – current frame no. CollisionSensor may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted.
- Returns
Impulses from collisions during the last tick. Each impulse is a 3-D vector. At most max_num_collisions collisions are used. The result is padded with zeros if there are fewer than max_num_collisions collisions. If include_collision_location is True, the observation will have the shape [max_num_collisions, 2, 3], obtained by stacking the impulses and the corresponding collision locations (in ego coordinates) along dim-1.
- Return type
np.ndarray
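The truncate-and-pad behaviour of the returned impulses can be sketched with plain lists. The helper pad_impulses is hypothetical and covers only the impulse part, without the optional collision locations; which collisions are kept when there are too many is an assumption (the first ones here).

```python
def pad_impulses(impulses, max_num_collisions=4):
    """Return exactly max_num_collisions 3-D impulses.

    Extra impulses are dropped; missing ones are filled with zeros.
    """
    kept = [list(impulse) for impulse in impulses[:max_num_collisions]]
    missing = max_num_collisions - len(kept)
    kept.extend([0.0, 0.0, 0.0] for _ in range(missing))
    return kept


one_collision = pad_impulses([(1.0, -2.0, 0.5)])
many_collisions = pad_impulses([(float(i), 0.0, 0.0) for i in range(6)])
```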
- class DynamicObjectSensor(parent_actor, alf_world, history_idx=[- 16, - 11, - 6, - 1], object_filter='vehicle.*', max_object_number=3, with_ego_history=True, view_radius=100)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseDynamicObjectSensor. A sensor that perceives the dynamic objects around the ego agent.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) – the world object keeping all relevant data and some utility functions.
navigation_sensor (str) – the navigation sensor associated with the parent_actor
history_idx (list[int]) – a list of numbers representing the indices of the history information to be rendered for all dynamic objects. For example, we can set history_idx=[-1] to keep only the most recent observation or history_idx=[-11, -1] for both the latest one and the one 10 steps earlier.
object_filter (str) – a string representing the type of dynamic objects to be perceived, following the blueprint filter format. By default, surrounding dynamic vehicles are perceived.
max_object_number (int) – the maximum number of dynamic objects that can be perceived within one time step, including the ego vehicle if with_ego_history is True; otherwise, the maximum number of non-ego dynamic objects that can be perceived in one time step. When the number of dynamic objects is larger than max_object_number, those that are far from the ego agent will be excluded from the observation until the condition on max_object_number is satisfied.
with_ego_history (bool) – whether to include ego history.
view_radius (float) – the radius of the view/perceivable field of the sensor (meter).
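The selection rule implied by max_object_number and view_radius can be sketched as "keep the nearest objects inside the view radius". This is an illustration in 2-D with a hypothetical helper select_objects; the actual sensor may order or filter objects differently.

```python
import math


def select_objects(ego_xy, objects_xy, max_object_number=3, view_radius=100.0):
    """Keep at most max_object_number objects, nearest to the ego first."""
    with_distance = [(math.dist(ego_xy, xy), xy) for xy in objects_xy]
    # Objects outside the perceivable field are never reported.
    visible = [item for item in with_distance if item[0] <= view_radius]
    visible.sort(key=lambda item: item[0])
    return [xy for _, xy in visible[:max_object_number]]


objects = [(50.0, 0.0), (5.0, 0.0), (200.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
perceived = select_objects((0.0, 0.0), objects)
```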
- destroy()[source]#
Return the commands for destroying this sensor.
Use
carla.Client.apply_batch_sync()to actually destroy the sensor.- Returns
the commands used to destroy the sensor.
- Return type
list[carla.command]
- get_current_observation(current_frame)[source]#
Get the current observation.
- Parameters
current_frame (int) – not used.
- Returns
the current observation tensor.
- observation_desc()[source]#
Get the description about the observation of this sensor.
- Returns
each str corresponds to one TensorSpec from observation_spec().
- Return type
nested str
- observation_spec()[source]#
Get the observation spec of this sensor.
- Returns
- Return type
nested TensorSpec
- render(x_range=[- 50, 50], y_range=[- 50, 50], img_height=256, img_width=256, dpi=300, figsize=(2, 2), linewidth=4, marker_size=5)[source]#
Return the rendered RGB image of the BEV view of the dynamic objects
- Parameters
x_range (list[float]) – x range for rendering (meter)
y_range (list[float]) – y range for rendering (meter)
img_height (int) – height of the rendered image (pixel)
img_width (int) – width of the rendered image (pixel)
dpi (int) – dpi of the rendered image
figsize (tuple[int]) – figure size used in matplotlib (inches)
linewidth (int) – width of the line representing the trajectories
marker_size (int) – the size of the marker representing the latest position in the trajectory.
- class GnssSensor(parent_actor)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseGnssSensor for sensing GPS location.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
- get_current_observation(current_frame)[source]#
- Parameters
current_frame (int) – not used
- Returns
- A vector of [latitude (degrees), longitude (degrees),
altitude (meters to be confirmed)]
- Return type
np.ndarray
- class IMUSensor(parent_actor)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseIMUSensor for sensing acceleration and rotation.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
- get_current_observation(current_frame)[source]#
Get the current observation.
- Parameters
current_frame (int) – current frame no. Some sensors may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted. Note that if the sensor receives an event in the most recent frame, event.frame should be equal to current_frame - 1.
- Returns
sensor data received in the last tick.
- Return type
nested np.ndarray
- class LaneInvasionSensor(parent_actor)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseLaneInvasionSensor for detecting lane invasion.
Lane invasion cannot be directly observed by raw sensors used by real cars, so the main purpose of this sensor is to provide a training signal (e.g. reward).
TODO: not completed.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
- get_current_observation(current_frame)[source]#
Get the current observation.
- Parameters
current_frame (int) – current frame no. Some sensors may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted. Note that if the sensor receives an event in the most recent frame, event.frame should be equal to current_frame - 1.
- Returns
sensor data received in the last tick.
- Return type
nested np.ndarray
Bases:
alf.environments.carla_sensors.SensorBaseGenerating future waypoints on the route.
Note that the route is fixed (it does not change based on the current vehicle location).
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) –
Get the current observation.
The observation is an 8x3 array consisting of the positions of 8 future locations on the route.
- Parameters
current_frame (int) – not used.
- Returns
8 3-D positions of future waypoints on the route. Note that the positions are absolute coordinates. However, the Player will transform them to egocentric coordinates as the observation for Player.
- Return type
np.ndarray
Get the current navigation route based on the location.
- Parameters
future_number (int) – the number of future route waypoints. If -1, all the future waypoints on the route will be returned.
- Returns
contains the 3-D positions of future waypoints on the route. Note that the positions are absolute coordinates.
- Return type
np.ndarray
Get the index of the next waypoint.
The next waypoint is the waypoint after the nearest waypoint to the car.
- Returns
index of the next waypoint
- Return type
int
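The "waypoint after the nearest waypoint" rule can be sketched as follows. The helper next_waypoint_index is hypothetical, and clamping the result to the last waypoint at the end of the route is an assumption.

```python
import math


def next_waypoint_index(vehicle_xyz, route):
    """Return the index of the waypoint following the nearest one.

    route is a sequence of 3-D positions in route order.
    """
    nearest = min(
        range(len(route)), key=lambda i: math.dist(vehicle_xyz, route[i]))
    # Assumption: stay on the last waypoint once the route end is reached.
    return min(nearest + 1, len(route) - 1)


route = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
mid_route = next_waypoint_index((1.2, 0.0, 0.0), route)
end_of_route = next_waypoint_index((3.1, 0.0, 0.0), route)
```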
Get the coordinate of waypoint
i.- Parameters
i (int) – waypoint index
- Returns
3-D vector of location
- Return type
numpy.ndarray
The number of waypoints in the route.
Get the description about the observation of this sensor.
- Returns
each str corresponds to one TensorSpec from observation_spec().
- Return type
nested str
Get the observation spec of this sensor.
- Returns
- Return type
nested TensorSpec
Set the navigation destination.
- Parameters
destination (carla.Location) –
- Returns
The total length of the route in meters, starting from the current vehicle location to the destination.
- class NumpyLaneMarking(color, lane_change, type, width)#
Bases:
tupleCreate new instance of NumpyLaneMarking(color, lane_change, type, width)
- color#
Alias for field number 0
- lane_change#
Alias for field number 1
- type#
Alias for field number 2
- width#
Alias for field number 3
- class NumpyWaypoint(id, location, rotation, road_id, section_id, lane_id, is_junction, lane_width, lane_change, lane_type, right_lane_marking, left_lane_marking)#
Bases:
tupleCreate new instance of NumpyWaypoint(id, location, rotation, road_id, section_id, lane_id, is_junction, lane_width, lane_change, lane_type, right_lane_marking, left_lane_marking)
- id#
Alias for field number 0
- is_junction#
Alias for field number 6
- lane_change#
Alias for field number 8
- lane_id#
Alias for field number 5
- lane_type#
Alias for field number 9
- lane_width#
Alias for field number 7
- left_lane_marking#
Alias for field number 11
- location#
Alias for field number 1
- right_lane_marking#
Alias for field number 10
- road_id#
Alias for field number 3
- rotation#
Alias for field number 2
- section_id#
Alias for field number 4
- class ObstacleDetectionSensor(parent_actor, xyz=(2.0, 0.0, 1.7), pyr=(0.0, 0.0, 0.0), distance=250, hit_radius=1, only_dynamics=False, debug_message=False)[source]#
Bases:
alf.environments.carla_sensors.SensorBase
ObstacleDetectionSensor. A sensor that detects the frontal obstacle and uses the distance as the observation. It registers an event every time the parent actor has an obstacle ahead. In order to anticipate obstacles, the sensor creates a capsular shape ahead of the parent vehicle and uses it to check for collisions (https://carla.readthedocs.io/en/latest/ref_sensors/#obstacle-detector). This detection technique is also known as sphere tracing.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor.
xyz (tuple[float]) – the attachment position (x, y, z) relative to the parent_actor. This value should be set properly to put the sensor on the windshield of the actor to avoid detection of collision with the actor itself. A default value of (2.0, 0., 1.7) is provided for typical sedan vehicles. For another type of vehicle that is much larger, a larger x value should be used.
pyr (tuple[float]) – the attachment rotation (pitch, yaw, roll) in degrees.
distance (float) – the distance within which obstacles are considered for detection.
hit_radius (float) – radius of the trace in sphere tracing.
only_dynamics (bool) – If True, the trace will only take dynamic objects into consideration; otherwise, it will also consider static objects.
debug_message (bool) – If True, will log the debug message.
- get_current_observation(current_frame)[source]#
Get the current observation.
- Parameters
current_frame (int) – current frame number.
- Returns
1-D vector containing the distance to the frontal obstacle.
- Return type
np.ndarray
- class RadarSensor(parent_actor, xyz=(2.8, 0.0, 1.0), pyr=(5.0, 0.0, 0.0), max_num_detections=200)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseRadarSensor for detecting obstacles.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor.
xyz (tuple[float]) – the attachment position (x, y, z) relative to the parent_actor.
pyr (tuple[float]) – the attachment rotation (pitch, yaw, roll) in degrees.
max_num_detections (int) – maximal number of detection points.
- get_current_observation(current_frame)[source]#
- Parameters
current_frame (int) – current frame no. RadarSensor may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted.
- Returns
- A set of detected points. Each detected point is a 4-D
vector of [vel, altitude, azimuth, depth], where vel is the velocity of the detected object towards the sensor in m/s, altitude is the altitude angle of the detection in radians, azimuth is the azimuth angle of the detection in radians, and depth is the distance from the sensor to the detection in meters.
- Return type
np.ndarray
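A detection in [vel, altitude, azimuth, depth] form can be turned into a Cartesian point with a standard spherical-to-Cartesian conversion. The sketch below is an illustration only: the helper detection_to_xyz is hypothetical, and the axis convention (x forward, azimuth rotating towards y, altitude rotating towards z) is an assumption that should be checked against the simulator's sensor frame.

```python
import math


def detection_to_xyz(detection):
    """Convert [vel, altitude, azimuth, depth] to (x, y, z) in meters.

    Angles are in radians; vel is ignored by the conversion.
    """
    _vel, altitude, azimuth, depth = detection
    horizontal = depth * math.cos(altitude)  # projection onto the x-y plane
    x = horizontal * math.cos(azimuth)
    y = horizontal * math.sin(azimuth)
    z = depth * math.sin(altitude)
    return (x, y, z)


straight_ahead = detection_to_xyz([-3.0, 0.0, 0.0, 10.0])
to_the_side = detection_to_xyz([0.0, 0.0, math.pi / 2, 10.0])
```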
- class RedlightSensor(parent_actor, player)[source]#
Bases:
alf.environments.carla_sensors.SensorBaseProvide a scalar value representing the distance to the redlight that affects the current
Player.- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
alf_world (World) –
- get_current_observation(red_light_dist)[source]#
Get the current observation.
A scalar value representing the distance to the redlight.
- Parameters
current_frame (int) – not used.
- Returns
1-D array representing the distance to the redlight that affects the current
Player.- Return type
np.ndarray
- class SensorBase(parent_actor)[source]#
Bases:
abc.ABC
Base class for sensors.
- Parameters
parent_actor (carla.Actor) – the parent actor of this sensor
- destroy()[source]#
Return the commands for destroying this sensor.
Use
carla.Client.apply_batch_sync()to actually destroy the sensor.- Returns
the commands used to destroy the sensor.
- Return type
list[carla.command]
- abstract get_current_observation(current_frame)[source]#
Get the current observation.
- Parameters
current_frame (int) – current frame no. Some sensors may not receive any data in the most recent tick. current_frame will be compared against the frame no. of the last received data to make sure that the data is correctly interpreted. Note that if the sensor receives an event in the most recent frame, event.frame should be equal to current_frame - 1.
- Returns
sensor data received in the last tick.
- Return type
nested np.ndarray
- class World(world, route_resolution=1.0)[source]#
Bases:
objectKeeping data for the world.
- Parameters
world (carla.World) – the carla world instance
route_resolution (float) – the resolution in meters for planned route
- DEFAULT_ENCOUNTERED_RED_LIGHT_DISTANCE = 10000000000.0#
- RED_LIGHT_ENFORCE_DISTANCE = 15#
- get_active_speed_limit(actor, dis_threshold=1.0)[source]#
Get active speed limit for the actor.
- Parameters
actor (carla.Actor) – the vehicle actor
dis_threshold (float) – the distance within which to consider the speed limit sign as active. The one closest to the actor in the active set will be used as the current speed limit. If a negative value is provided, all speed limit signs are taken into considerations for determining the closest one.
- Returns
- the value of the speed limit in m/s if there is a speed limit sign within the distance of dis_threshold
- None if there is no active speed limit sign
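The rule for picking the active speed limit can be sketched in 2-D as below. The helper active_speed_limit and the (position, limit) representation of signs are hypothetical; only the thresholding, the closest-sign choice, and the meaning of a negative dis_threshold come from the description above.

```python
import math


def active_speed_limit(actor_xy, signs, dis_threshold=1.0):
    """Return the limit (m/s) of the closest active sign, or None.

    signs is a list of ((x, y), limit_in_m_per_s) pairs.
    """
    candidates = [(math.dist(actor_xy, pos), limit) for pos, limit in signs]
    if dis_threshold >= 0:
        # Only signs within dis_threshold are active.
        candidates = [c for c in candidates if c[0] <= dis_threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[0])[1]


signs = [((0.5, 0.0), 8.3), ((30.0, 0.0), 13.9)]
near_sign = active_speed_limit((0.0, 0.0), signs)
no_sign = active_speed_limit((10.0, 0.0), signs)
any_distance = active_speed_limit((25.0, 0.0), signs, dis_threshold=-1.0)
```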
- get_actor_location(aid)[source]#
Get the latest location of the actor.
The reason for using this instead of calling carla.Actor.get_location() directly is that the location of actors may not have been updated before world.tick().
- Parameters
aid (int) – actor id
- Returns
- Return type
carla.Location
- is_running_red_light(actor)[source]#
Whether actor is running red light.
Adapted from RunningRedLightTest.update() in https://github.com/carla-simulator/scenario_runner/blob/master/srunner/scenariomanager/scenarioatomics/atomic_criteria.py
- Parameters
actor (carla.Actor) – the vehicle actor
- Returns
violated red light id if running red light, None otherwise
encountered red light id if encountering one, None otherwise
distance to the encountered red light if encountering one, DEFAULT_ENCOUNTERED_RED_LIGHT_DISTANCE otherwise
- property route_resolution#
The sampling resolution of route.
- trace_route(origin, destination)[source]#
Find the route from
origintodestination.- Parameters
origin (carla.Location) –
destination (carla.Location) –
- Returns
list[tuple(carla.Waypoint, RoadOption)]
- get_scaled_image_size(height, width)[source]#
Compute properly scaled image size.
The scaled image height and width are calculated based on the minimum and maximum allowed sizes for rendering, while keeping the aspect ratio of the image unchanged. If both the height and width are within the bound, no scaling is applied.
- Returns
scaled_height (int): scaled image height
scaled_width (int): scaled image width
- Return type
tuple
alf.environments.carla_spectator#
A utility to watch the vehicles in a simulation.
A typical scenario is that you have an on-going Carla training session and you want to see what the training vehicles are doing. You can use this utility to do this:
python carla_spectator --port 2000 --host localhost
If you only have one training session going on, the port is 2000 by default. Otherwise, you can use ps aux | grep Carla to find the value of --carla-rpc-port and use it in place of 2000.
After carla_spectator starts, you can use TAB key to switch to different vehicles and ESC key to quit the program.
alf.environments.dmc_gym_wrapper#
Wrap dm_control environment with a Gym interface.
Adapted and simplified from https://github.com/denisyarats/dmc2gym
- class DMCGYMWrapper(domain_name, task_name, visualize_reward=True, from_pixels=False, height=84, width=84, camera_id=0, control_timestep=None)[source]#
Bases:
gym.core.Env
A Gym env that wraps a dm_control environment.
- Parameters
domain_name (str) – the domain name, which corresponds to the physical robot
task_name (str) – a specific task under a domain, which corresponds to a particular MDP structure
visualize_reward (bool) – if True, then the rendered frame will have a highlighted color when the agent achieves a reward.
from_pixels (bool) – if True, the observation will be raw pixels; otherwise use the internal state vector as the observation.
height (int) – image observation height
width (int) – image observation width
camera_id (int) – which camera to render; a MuJoCo xml file can define multiple cameras with different views
control_timestep (Optional[float]) – the time duration between two agent actions. If this is greater than the agent’s primitive physics timestep, then multiple physics simulation steps might be performed between two actions. If None, the default control timestep defined by DM control suite will be used.
- property action_space#
- property observation_space#
- render(mode='rgb_array', height=None, width=None, camera_id=0)[source]#
Render an RGB image. Copied from https://github.com/denisyarats/dmc2gym
- reset()[source]#
Resets the state of the environment and returns an initial observation.
- Returns
observation (object): the initial observation.
- seed(seed)[source]#
Sets the seed for this env’s random number generator(s).
Note
Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.
- Returns
- Returns the list of seeds used in this env’s random
number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
- Return type
list<bigint>
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
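The reset/step contract described above can be exercised with a stand-in environment. The CountdownEnv class below is a hypothetical stub that only mimics the (observation, reward, done, info) return shape; it is not DMCGYMWrapper and needs neither gym nor dm_control.

```python
class CountdownEnv:
    """Hypothetical stub whose episode ends after a fixed number of steps."""

    def __init__(self, episode_length=3):
        self._episode_length = episode_length
        self._steps_left = 0

    def reset(self):
        self._steps_left = self._episode_length
        return self._steps_left  # the initial observation

    def step(self, action):
        self._steps_left -= 1
        done = self._steps_left == 0
        return self._steps_left, 1.0, done, {}


def run_episode(env):
    """Run one episode; the caller is responsible for calling reset()."""
    env.reset()
    total_reward, done = 0.0, False
    while not done:
        _observation, reward, done, _info = env.step(action=0)
        total_reward += reward
    return total_reward


env = CountdownEnv(episode_length=3)
first_return = run_episode(env)
second_return = run_episode(env)  # reset() is called again inside
```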
alf.environments.fast_parallel_environment#
- class FastParallelEnvironment(env_constructors, start_serially=True, blocking=False, flatten=True, num_spare_envs_for_reload=0)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironmentBatch together environments and simulate them in external processes.
The environments are created in external processes by calling the provided callables. This can be an environment class, or a function creating the environment and potentially wrapping it. The environments can be different but must use the same action and observation specs.
Different from parallel_environment.ParallelAlfEnvironment, FastParallelEnvironment uses shared memory to transfer TimeStep from each process environment to the main process.
Terminology:
- main process: the process where ParallelEnvironment is created
- client process: the process running the actual individual environment created using env_constructors
Design:
FastParallelEnvironment uses _penv.ParallelEnvironment (implemented in C++) to coordinate step() and reset(). Each ProcessEnvironment maintains one _penv.ProcessEnvironmentCaller in the main process and one _penv.ProcessEnvironment in the client process.
In the client process, _penv.ProcessEnvironment.worker() runs in a loop to wait for jobs from either _penv.ParallelEnvironment or _penv.ProcessEnvironmentCaller.
There are 4 types of job:
- step: step the environment. Sent from _penv.ParallelEnvironment. The result is communicated back using shared memory.
- reset: reset the environment. Sent from _penv.ParallelEnvironment. The result is communicated back using shared memory.
- close: close the environment. Sent from _penv.ProcessEnvironmentCaller. This will cause the worker to finish and quit the process.
- call: access other methods of the environment. Sent from _penv.ProcessEnvironmentCaller. This takes advantage of the pipe mechanism used by ParallelAlfEnvironment, and is achieved by calling call_handler to communicate through a Python pipe. The original pipe mechanism is kept for these other types of communication because it is not easy to handle communication of unknown size using shared memory.
- Parameters
env_constructors (list[Callable]) – a list of callable environment creators.
start_serially (bool) – whether to start environments serially or in parallel.
blocking (bool) – not used. Kept for the same interface as ParallelAlfEnvironment.
flatten (bool) – not used. Kept for the same interface as ParallelAlfEnvironment.
num_spare_envs_for_reload (int) – if positive, these environments will be maintained in a separate queue and be used to handle slow env resets. The batch_size is len(env_constructors) - num_spare_envs_for_reload.
- Raises
ValueError – If the action or observation specs don’t match.
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
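As a rough illustration of the batch dimension, the sketch below builds batched arrays from per-environment shapes. The shapes and `batch_size` value are hypothetical and only stand in for what an observation_spec and action_spec might describe.

```python
import numpy as np

obs_shape = (84, 84, 3)  # hypothetical per-environment observation shape
action_shape = ()        # hypothetical scalar action per environment
batch_size = 4

# The batch dimension is prepended; it is not part of the spec shapes.
batched_obs = np.zeros((batch_size,) + obs_shape, dtype=np.uint8)
batched_actions = np.zeros((batch_size,) + action_shape, dtype=np.int64)
```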
- property envs#
The list of individual environments.
- property metadata#
- property num_spare_envs_for_reload#
- property num_tasks#
Number of tasks supported by this environment.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- property task_names#
The name of each task.
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
alf.environments.gym_wrappers#
Wrappers for gym (numpy) environments.
- class BaseObservationWrapper(env, fields=None)[source]#
Bases:
gym.core.ObservationWrapper
Base observation wrapper.
BaseObservationWrapper provides basic functions and a generic interface for transformation.
The key interface functions are:
1. transform_space(): transform space.
2. transform_observation(): transform observation.
- Parameters
env (gym.Env) – the gym environment
fields (list[str]) – fields to which the transformation is applied. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is transformed.
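The multi-level path convention can be sketched with plain nested dicts. This is an illustration only, not the wrapper's code; `apply_to_field` and the example observation are hypothetical.

```python
def apply_to_field(observation, field, func):
    """Apply func to the sub-observation addressed by a path such as "A.B.C".

    If field is None, the observation is treated as non-nested and func is
    applied to it directly.
    """
    if field is None:
        return func(observation)
    head, _, tail = field.partition(".")
    result = dict(observation)  # shallow copy so the input is left untouched
    result[head] = apply_to_field(observation[head], tail or None, func)
    return result


obs = {"camera": {"rgb": 10, "depth": 20}, "speed": 3}
out = apply_to_field(obs, "camera.rgb", lambda x: x * 2)
```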
- class ContinuousActionClip(env, min_v=-1000000000.0, max_v=1000000000.0)[source]#
Bases:
gym.core.ActionWrapper
Clip continuous actions according to the action space.
Note that any action outside of the bounds specified by action_space will be clipped to the bounds before passing to the underlying environment.
Create a ContinuousActionClip gym wrapper.
- Parameters
env (gym.Env) – A Gym env instance to wrap
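The clipping behaviour described above amounts to an element-wise clamp. A minimal sketch, assuming hypothetical bounds `low` and `high` taken from an action space:

```python
import numpy as np

low = np.array([-1.0, -2.0])   # hypothetical lower bounds of the action space
high = np.array([1.0, 2.0])    # hypothetical upper bounds of the action space


def clip_action(action, low, high):
    """Clamp each action component into [low, high]."""
    return np.clip(action, low, high)


clipped = clip_action(np.array([3.0, -0.5]), low, high)
```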
- class ContinuousActionMapping(env, low, high)[source]#
Bases:
gym.core.ActionWrapper
Map continuous actions to a desired action space, while keeping discrete actions unchanged.
- Parameters
env (gym.Env) – Gym env to be wrapped
low (float) – the action lower bound to map to.
high (float) – the action higher bound to map to.
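One way to picture the mapping is an affine rescaling from the exposed range [low, high] to the bounds of the wrapped environment. The sketch below assumes that interpretation; `map_action`, `env_low`, and `env_high` are hypothetical names, and the wrapper's actual arithmetic may differ.

```python
import numpy as np


def map_action(action, low, high, env_low, env_high):
    """Affinely map an action from [low, high] to [env_low, env_high]."""
    fraction = (np.asarray(action, dtype=np.float64) - low) / (high - low)
    return env_low + fraction * (env_high - env_low)


mapped = map_action([-1.0, 0.0, 1.0], low=-1.0, high=1.0,
                    env_low=0.0, env_high=10.0)
```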
- class DMAtariPreprocessing(env, frame_skip=4, noop_max=30, screen_size=84, gray_scale=True)[source]#
Bases:
gym.core.Wrapper
Derived from tf_agents AtariPreprocessing. Three differences:
1. Random number of NOOPs after reset.
2. FIRE after a reset or a lost life. This is for the purpose of evaluation with greedy prediction without getting stuck in the early training stage.
3. A lost life doesn’t result in a terminal state.
NOTE: Some implementations force the time step that loses a life to have a zero value (i.e., mark a ‘terminal’ state) to help bootstrap value functions, while only resetting the env when all lives are used (done==True). In this case, the episodic score is still summed over all lives.
Our implementation only marks a terminal state when all lives are used (done==True). Learning is more difficult in our case because the time horizon is longer.
To see a complete list of atari wrappers used by DeepMind, see https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/atari_wrappers.py Also see OpenAI Gym’s implementation (not completely the same): https://github.com/openai/gym/blob/master/gym/wrappers/atari_preprocessing.py
(This wrapper does not handle framestacking. It can be paired with FrameStack. See atari.gin for an example.)
Constructor for an Atari 2600 preprocessor.
- Parameters
env (gym.Env) – the environment whose observations are preprocessed.
frame_skip (int) – the frequency at which the agent experiences the game.
noop_max (int) – the maximum number of no-op actions after resetting the env
screen_size (int) – size of a resized Atari 2600 frame.
gray_scale (bool) –
- reset()[source]#
Resets the environment.
- Returns
the initial observation emitted by the environment.
- Return type
observation (np.array)
- step(action)[source]#
Applies the given action in the environment.
Remarks:
- If a terminal state (episode end) is reached, this may execute fewer than self.frame_skip steps in the environment.
- Furthermore, in this case the returned observation may not contain valid image data and should be ignored.
- Parameters
action (int) – The action to be executed.
- Returns
observation (np.array): the observation following the action.
reward (float): the reward following the action.
game_over (bool): whether the environment has reached a terminal state. This is true when an episode is over.
info: Gym API’s info data structure.
- class EpisodicRandomFrameCrop(env, cropping_fraction=0.8, channel_order='channels_last', share_cropping=True, fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Create a frame cropping wrapper that augments the data distribution by randomly cropping the image frame according to the specified fraction. Each episode has a randomized cropping location which is consistent over the episode.
- Parameters
env (Env) – the gym environment
cropping_fraction – the portion of the original image to crop (keep)
channel_order (str) – The ordering of the dimensions in the input images from the env, should be either “channels_last” or “channels_first”.
share_cropping (bool) – if there are multiple image fields, whether they share the same cropping position at each time step. This might be useful if there are multiple images with the same camera intrinsics, e.g., RGB + depth.
fields (Optional[List[str]]) – fields to be cropped. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is cropped.
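A minimal sketch of per-episode random cropping for channels_last frames follows. It assumes the crop size is the frame size scaled by cropping_fraction and that the offset is drawn uniformly once per episode; `sample_crop_offset` and `crop` are hypothetical helpers, not part of the wrapper.

```python
import numpy as np


def sample_crop_offset(rng, height, width, cropping_fraction):
    """Draw one crop window; intended to be called once per episode."""
    crop_h = int(height * cropping_fraction)
    crop_w = int(width * cropping_fraction)
    sy = int(rng.integers(0, height - crop_h + 1))
    sx = int(rng.integers(0, width - crop_w + 1))
    return sy, sx, crop_h, crop_w


def crop(frame, sy, sx, crop_h, crop_w):
    """Crop a channels_last (H, W, C) frame."""
    return frame[sy:sy + crop_h, sx:sx + crop_w]


rng = np.random.default_rng(0)
frames = [np.zeros((100, 100, 3), dtype=np.uint8) for _ in range(3)]
offset = sample_crop_offset(rng, 100, 100, 0.8)  # fixed for the whole episode
cropped = [crop(f, *offset) for f in frames]
```

Reusing the same `offset` for every frame is what keeps the cropping location consistent within an episode.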
- class FrameCrop(env, sx=0, sy=0, width=84, height=84, channel_order='channels_last', fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Create a FrameCrop instance.
- Parameters
env (gym.Env) – the gym environment
sx (int) – start position along the horizontal direction (x-axis)
sy (int) – start position along the vertical direction (y-axis)
width (int) – crop width
height (int) – crop height
- class FrameFlip(env, ud_flip_prob=0.5, lr_flip_prob=0.5, channel_order='channels_last', fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Create a frame flipping wrapper that randomly flips the image fields either vertically or horizontally. For each episode, all fields will have the SAME flipping operation.
The probability of each flipping result, with udp = ud_flip_prob and lrp = lr_flip_prob:
- identical: (1 - udp) * (1 - lrp)
- ud_flip: udp * (1 - lrp)
- lr_flip: (1 - udp) * lrp
- rotate180: udp * lrp
This wrapper is usually used for data augmentation.
- Parameters
env (Env) – the gym environment
ud_flip_prob (float) – the probability of flipping the original image up-down.
lr_flip_prob (float) – the probability of flipping left-right, tested after the up-down flip.
channel_order (str) – The ordering of the dimensions in the input images from the env, should be either “channels_last” or “channels_first”.
fields (Optional[List[str]]) – fields to be flipped. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is flipped.
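The four outcome probabilities follow from two independent coin flips. The sketch below computes them and applies the flips to a small 2-D array; `flip_probabilities` and `flip` are hypothetical helpers written for illustration.

```python
import numpy as np


def flip_probabilities(udp, lrp):
    """Probability of each outcome for independent up-down / left-right flips."""
    return {
        "identical": (1 - udp) * (1 - lrp),
        "ud_flip": udp * (1 - lrp),
        "lr_flip": (1 - udp) * lrp,
        "rotate180": udp * lrp,
    }


def flip(frame, ud, lr):
    """Flip along the height axis (ud) and/or the width axis (lr)."""
    if ud:
        frame = frame[::-1, :]
    if lr:
        frame = frame[:, ::-1]
    return frame


probs = flip_probabilities(0.5, 0.5)
frame = np.arange(6).reshape(2, 3)
rotated = flip(frame, ud=True, lr=True)
```

Applying both flips is equivalent to a 180-degree rotation, which is why that outcome is named rotate180.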
- reset(**kargs)[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- class FrameGrayScale(env, fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Gray scale image observation.
Create a FrameGrayScale instance
- Parameters
env (gym.Env) – the gym environment
fields (list[str]) – fields to be gray scaled. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is gray scaled.
- class FrameResize(env, width=84, height=84, fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Create a FrameResize instance.
- Parameters
env (gym.Env) – the gym environment
width (int) – resize width
height (int) – resize height
fields (list[str]) – fields to be resized. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is resized.
- class FrameSkip(env, skip)[source]#
Bases:
gym.core.Wrapper
Repeat the same action n times and return the last observation and accumulated reward.
Create a FrameSkip object
- Parameters
env (gym.Env) – the gym environment
skip (int) – repeat each action for skip frames (skip=1 means no skip)
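The repeat-and-accumulate behaviour can be sketched without gym. `ToyEnv` and `skip_step` are hypothetical, and stopping early when the episode ends is an assumption of this sketch rather than documented behaviour.

```python
class ToyEnv:
    """Hypothetical env: reward 1 per step, episode ends after 5 steps."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5, {}


def skip_step(env, action, skip):
    """Repeat action up to skip times; return last obs and summed reward."""
    total_reward = 0.0
    for _ in range(skip):
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:  # assumption: stop repeating once the episode has ended
            break
    return obs, total_reward, done, info


env = ToyEnv()
first = skip_step(env, action=0, skip=4)
second = skip_step(env, action=0, skip=4)
```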
- reset(**kwargs)[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- class FrameStack(env, stack_size=4, channel_order='channels_last', fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Stack previous stack_size frames, applied to Gym env.
This is deprecated. Please use alf.algorithms.data_transformer.FrameStacker, which is more memory-efficient.
Create a FrameStack object.
- Parameters
env (gym.Env) – gym environment.
stack_size (int) – number of frames to stack.
channel_order (str) – The ordering of the dimensions in the input images from the env, should be one of channels_last or channels_first.
fields (list[str]) – fields to be stacked. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is stacked.
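Frame stacking for channels_last images can be sketched with a bounded deque. `FrameBuffer` is a hypothetical helper; filling the buffer with the first frame on reset and concatenating along the last axis are assumptions of this sketch.

```python
from collections import deque

import numpy as np


class FrameBuffer:
    """Keep the most recent stack_size frames and expose them stacked."""

    def __init__(self, stack_size):
        self._frames = deque(maxlen=stack_size)

    def reset(self, frame):
        # Assumption: the first frame is repeated to fill the buffer.
        for _ in range(self._frames.maxlen):
            self._frames.append(frame)
        return self.stacked()

    def push(self, frame):
        self._frames.append(frame)  # the oldest frame is dropped automatically
        return self.stacked()

    def stacked(self):
        # channels_last: concatenate along the channel axis.
        return np.concatenate(list(self._frames), axis=-1)


buf = FrameBuffer(stack_size=4)
obs0 = buf.reset(np.zeros((84, 84, 1), dtype=np.uint8))
obs1 = buf.push(np.ones((84, 84, 1), dtype=np.uint8))
```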
- reset()[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- class ImageChannelFirst(env, fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Make images in observations channel_first.
- Parameters
env (gym.Env) – the gym environment
fields (list[str]) – fields to which the transformation is applied. A field str is a multi-level path denoted by “A.B.C”. If None, then the non-nested observation is transformed.
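For a single image, converting from channels_last (HWC) to channels_first (CHW) is an axis permutation. A minimal sketch with a hypothetical frame size:

```python
import numpy as np

hwc = np.zeros((84, 96, 3), dtype=np.uint8)  # hypothetical height, width, channels
chw = np.transpose(hwc, (2, 0, 1))           # move the channel axis to the front
```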
- class NonEpisodicEnv(env)[source]#
Bases:
gym.core.Wrapper
Make a gym environment non-episodic by always setting done=False.
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- class NormalizedAction(env)[source]#
Bases:
alf.environments.gym_wrappers.ContinuousActionMapping
Normalize actions to [-1, 1]. This normalized action space is friendly to algorithms that compute action entropy, e.g., SAC.
- Parameters
env (gym.Env) – Gym env to be wrapped
low (float) – the action lower bound to map to.
high (float) – the action higher bound to map to.
- transform_space(observation_space, field, func)[source]#
Transform the child space in observation_space indicated by field using func
- Parameters
observation_space (gym.Space) – space to be transformed
field (str) – field of the space to be transformed, given as a multi-level path denoted by “A.B.C”. If None, then the non-nested observation_space is transformed.
func (Callable) – transform function. The function will be called as func(observation_space, level) and should return new observation_space.
- Returns
transformed space
alf.environments.mario_wrappers#
- class FrameFormat(env, data_format='channels_last')[source]#
Bases:
gym.core.Wrapper
Format frame to specified data_format.
- Parameters
data_format – Data format for the frame: channels_first for CHW and channels_last for HWC.
- reset()[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- class LimitedDiscreteActions(env, all_buttons)[source]#
Bases:
gym.core.ActionWrapper
Wrap mario environment and make it use discrete actions. Map available button combinations to discrete actions, e.g.:
0 -> None
1 -> UP
2 -> DOWN
…
k -> A
…
m -> A + LEFT
…
n -> B + UP
…
- BUTTONS = {'A', 'B'}#
- SHOULDERS = {'L', 'R'}#
- class MarioXReward(env)[source]#
Bases:
gym.core.Wrapper
Wrap mario environment and use X-axis coordinate increment as reward.
    if initial or upgrade_to_new_level:
        reward, max_x = 0, 0
    else:
        current_x = xscrollHi * 256 + xscrollLo
        reward = current_x - max_x if current_x > max_x else 0
        max_x = current_x if current_x > max_x else max_x
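The reward rule above can be written as a small pure function. `x_reward` and its `new_level` flag are hypothetical names; the two scroll arguments correspond to xscrollHi and xscrollLo in the rule.

```python
def x_reward(xscroll_hi, xscroll_lo, max_x, new_level=False):
    """Return (reward, new_max_x) following the rule described above."""
    if new_level:  # initial step or upgrade to a new level
        return 0, 0
    current_x = xscroll_hi * 256 + xscroll_lo
    if current_x > max_x:
        return current_x - max_x, current_x
    return 0, max_x


forward = x_reward(1, 4, max_x=200)     # moved past the previous maximum
backward = x_reward(0, 100, max_x=260)  # moved back: no reward, max unchanged
```

Only progress beyond the furthest x position reached so far is rewarded, so moving back and forth over the same ground earns nothing.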
- reset()[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
alf.environments.parallel_environment#
Runs multiple environments in parallel processes and steps them in batch.
- Adapted from TF-Agents Environment API as seen in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/parallel_py_environment.py
- class ParallelAlfEnvironment(env_constructors, start_serially=True, blocking=False, flatten=True, num_spare_envs_for_reload=0)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Batch together environments and simulate them in external processes.
The environments are created in external processes by calling the provided callables. This can be an environment class, or a function creating the environment and potentially wrapping it. The environments can be different but must use the same action and observation specs.
The returned environment should not access global variables.
- Parameters
env_constructors (list[Callable]) – a list of callable environment creators.
start_serially (bool) – whether to start environments serially or in parallel.
blocking (bool) – whether to step environments one after another.
flatten (bool) – whether to use flattened actions and time_steps during communication to reduce overhead.
num_spare_envs_for_reload (int) – if positive, these environments will be maintained in a separate queue and be used to handle slow env resets. The batch_size is len(env_constructors) - num_spare_envs_for_reload.
- Raises
ValueError – If the action or observation specs don’t match.
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- property envs#
The list of individual environments.
- property metadata#
- property num_spare_envs_for_reload#
- property num_tasks#
Number of tasks supported by this environment.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation that returns a scalar spec.
- Returns
alf.TensorSpec
- property task_names#
The name of each task.
- time_step_spec()[source]#
Describes the TimeStep fields returned by step().
Override this method to define an environment that uses non-standard values for any of the items returned by step(). For example, an environment with tensor-valued rewards.
- Returns
A TimeStep namedtuple containing (possibly nested) TensorSpec defining the step_type, reward, discount, observation, prev_action, and end_id.
alf.environments.process_environment#
Step a single env in a separate process for lock-free parallelism.
- Adapted from TF-Agents Environment API as seen in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/parallel_py_environment.py
- class ProcessEnvironment(env_constructor, env_id=None, flatten=False, fast=False, num_envs=0, name='')[source]#
Bases:
object
Step environment in a separate process for lock-free parallelism.
The environment is created in an external process by calling the provided callable. This can be an environment class, or a function creating the environment and potentially wrapping it. The returned environment should not access global variables.
- Parameters
env_constructor (Callable) – callable environment creator.
env_id (torch.int32) – ID of the env
flatten (bool) – whether to assume flattened actions and time_steps during communication to avoid overhead.
fast (bool) – whether created by FastParallelEnvironment or not.
num_envs (int) – number of environments in the FastParallelEnvironment. Only used if fast is True.
name (str) – name of the FastParallelEnvironment. Only used if fast is True.
- call(name, *args, **kwargs)[source]#
Asynchronously call a method of the external environment.
- Parameters
name (str) – Name of the method to call.
*args – Positional arguments to forward to the method.
**kwargs – Keyword arguments to forward to the method.
- Returns
Promise object that blocks and provides the return value when called.
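The promise pattern described here (a callable that blocks until the result is available) can be sketched with the standard library. This uses a thread pool instead of a separate process and pipe, and `SlowEnv` and the free function `call` are hypothetical stand-ins, not the class's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


class SlowEnv:
    """Hypothetical environment with an extra method to call remotely."""

    def get_name(self, prefix):
        return prefix + "-env"


def call(executor, env, name, *args, **kwargs):
    """Start the method call asynchronously and return a blocking promise."""
    future = executor.submit(getattr(env, name), *args, **kwargs)
    # future.result is a callable: invoking it blocks until the value is ready.
    return future.result


with ThreadPoolExecutor(max_workers=1) as executor:
    promise = call(executor, SlowEnv(), "get_name", "toy")
    value = promise()
```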
- render(mode='human')[source]#
Render the environment.
- Parameters
mode (str) – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reset(blocking=True)[source]#
Reset the environment.
- Parameters
blocking (bool) – Whether to wait for the result.
- Returns
New observation when blocking, otherwise callable that returns the new observation.
- start(wait_to_start=True)[source]#
Start the process.
- Parameters
wait_to_start (bool) – Whether the call should wait for an env initialization.
alf.environments.random_alf_environment#
An environment that generates random observations.
- Adapted from TF-Agents Environment API as seen in:
https://github.com/tensorflow/agents/blob/master/tf_agents/environments/random_py_environment.py
- class RandomAlfEnvironment(observation_spec, action_spec, env_id=None, episode_end_probability=0.1, discount=1.0, reward_fn=None, batch_size=None, seed=42, render_size=(2, 2, 3), min_duration=0, max_duration=None, use_tensor_time_step=False)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Randomly generates observations following the given observation_spec.
If an action_spec is provided it validates that the actions used to step the environment fall within the defined spec.
Initializes the environment.
- Parameters
observation_spec (nested TensorSpec) – tensor spec for observations
action_spec (nested TensorSpec) – tensor spec for actions.
env_id (int) – (optional) ID of the environment.
episode_end_probability (float) – Probability an episode will end when the environment is stepped.
discount (float) – Discount to set in time_steps.
reward_fn (Callable) – Callable that takes in step_type, action, and observation(s), and returns a tensor of rewards.
batch_size (int) – (Optional) Number of observations generated per call. If this value is not None, then all actions are expected to have an additional major axis of size batch_size, and all outputs will have an additional major axis of size batch_size.
seed (int) – Seed to use for rng used in observation generation.
render_size (tuple of ints) – Size of the random render image to return when calling render.
min_duration (int) – Number of steps at the beginning of the episode during which the episode can not terminate.
max_duration (int) – Optional number of steps after which the episode terminates regardless of the termination probability.
use_tensor_time_step (bool) – convert all quantities in time_step to torch.tensor if True. Otherwise use numpy data types.
- Raises
ValueError – If batch_size argument is not None and does not match the shapes of discount or reward.
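The interaction of episode_end_probability, min_duration, and max_duration can be sketched as a single termination check. `episode_ends` is a hypothetical helper; the exact boundary conditions (for example whether the comparison is `<` or `<=`) are assumptions and may differ from the class.

```python
import random


def episode_ends(step_count, rng, episode_end_probability=0.1,
                 min_duration=0, max_duration=None):
    """Decide whether the episode terminates at this step."""
    if step_count < min_duration:
        return False  # too early: the episode cannot terminate yet
    if max_duration is not None and step_count >= max_duration:
        return True   # forced termination regardless of the probability
    return rng.random() < episode_end_probability


rng = random.Random(42)
too_early = episode_ends(0, rng, episode_end_probability=1.0, min_duration=3)
forced = episode_ends(5, rng, episode_end_probability=0.0, max_duration=5)
never = episode_ends(2, rng, episode_end_probability=0.0, max_duration=5)
```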
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
alf.environments.suite_babyai#
- class BabyAIWrapper(env, max_instruction_length=80, mode='sent')[source]#
Bases:
gym.core.Wrapper
A wrapper for BabyAI environment.
BabyAI environment is introduced in Chevalier-Boisvert et al. BabyAI: First Steps Towards Grounded Language Learning With a Human In the Loop.
It can be downloaded from https://github.com/mila-iqia/babyai
- Parameters
gym_env (gym.Env) – An instance of OpenAI gym environment.
max_instruction_length (int) – the maximum number of words of an instruction.
mode (str) – one of (‘sent’, ‘word’, ‘char’). If ‘sent’, the whole instruction (word ID array) is given in the observation at every step. If ‘word’, the word IDs are given in the observation sequentially, one word ID per step. A zero is given for every step after all the word IDs have been given. If ‘char’, similar to ‘word’, but only one character is given at each step. For ‘char’ mode, we assume that the unicode of each character is within [0, 127].
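The three modes can be illustrated with a toy encoder. The vocabulary, the `encode` function, reserving ID 0 for padding, and padding ‘sent’ output to max_instruction_length are all assumptions made for this sketch, not the wrapper's actual code.

```python
VOCAB = ["go", "to", "the", "red", "ball"]  # hypothetical tiny vocabulary
# Assumption: ID 0 is reserved for "no word", so real words start at 1.
WORD_TO_ID = {w: i + 1 for i, w in enumerate(VOCAB)}


def encode(instruction, mode, step, max_instruction_length=8):
    """Return the instruction part of the observation at a given step."""
    word_ids = [WORD_TO_ID[w] for w in instruction.split()]
    if mode == "sent":
        # Whole instruction at every step, zero-padded to a fixed length.
        padded = word_ids + [0] * (max_instruction_length - len(word_ids))
        return padded[:max_instruction_length]
    if mode == "word":
        # One word ID per step, then zeros.
        return word_ids[step] if step < len(word_ids) else 0
    if mode == "char":
        # One character code per step, then zeros.
        return ord(instruction[step]) if step < len(instruction) else 0
    raise ValueError(f"unknown mode: {mode}")


text = "go to the red ball"
sent_obs = encode(text, "sent", step=0)
word_obs = [encode(text, "word", step=s) for s in range(7)]
char_obs = encode(text, "char", step=0)
```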
- VOCAB = ['then', 'after', 'you', 'and', 'go', 'to', 'pick', 'up', 'open', 'put', 'next', 'door', 'ball', 'box', 'key', 'on', 'your', 'left', 'right', 'in', 'front', 'of', 'you', 'behind', 'red', 'green', 'blue', 'purple', 'yellow', 'grey', 'the', 'a']#
- VOCAB_SIZE = 33#
- reset(**kwargs)[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- load(environment_name, env_id=None, max_instruction_length=80, mode='sent', discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#
Loads the selected environment and wraps it with the specified wrappers.
Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.
- Parameters
environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
max_instruction_length (int) – the maximum number of words of an instruction.
mode (str) – one of (‘sent’, ‘word’, ‘char’). If ‘sent’, the whole instruction (word ID array) is given in the observation at every step. If ‘word’, the word IDs are given in the observation sequentially, one word ID per step. A zero is given for every step after all the word IDs have been given. If ‘char’, similar to ‘word’, but only one character is given at each step.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to the default step limit defined in the environment’s spec. No limit is applied if set to 0 or if there is no max_episode_steps set in the environment’s spec.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
- Returns
An AlfEnvironment instance.
alf.environments.suite_bsuite#
- class BSuiteWrapper(env)[source]#
Bases:
bsuite.utils.gym_wrapper.GymFromDMEnv
A wrapper for Bsuite environment.
The BSuite environment is introduced in Osband et al. Behaviour Suite for Reinforcement Learning.
It can be accessed on https://github.com/deepmind/bsuite
- Parameters
gym_env (gym.Env) – An instance of OpenAI gym environment.
- property observation_space: gym.spaces.box.Box#
- Return type
Box
- reset()[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- load(environment_name='cartpole_swingup/0', env_id=None, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#
Loads the selected environment and wraps it with the specified wrappers.
Note that by default a TimeLimit wrapper is used in wrap_env to limit episode lengths to the default benchmarks defined by the registered environments.
- Parameters
environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to zero as not all bsuite environments specify max episode lengths. No limit is applied if set to 0.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
- Returns
An AlfEnvironment instance.
alf.environments.suite_carla#
CarlaEnvironment suite.
To use this, there are two ways:
Run the code within docker image horizonrobotics/alf:0.0.3-carla. Both Docker and Nvidia-Docker2 need to be installed.
Install carla:
wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/CARLA_0.9.9.tar.gz
mkdir carla
tar zxf CARLA_0.9.9.tar.gz -C carla
cd carla/Import
wget https://carla-releases.s3.eu-west-3.amazonaws.com/Linux/AdditionalMaps_0.9.9.tar.gz
cd ..
./ImportAssets.sh
easy_install PythonAPI/carla/dist/carla-0.9.9-py3.7-linux-x86_64.egg
pip install networkx==2.2
Make sure you are using python3.7
- class CarlaEnvironment(batch_size, map_name, vehicle_filter='vehicle.*', walker_filter='walker.pedestrian.*', num_other_vehicles=0, num_walkers=0, percentage_walkers_running=0.1, percentage_walkers_crossing=0.1, global_distance_to_leading_vehicle=2.0, use_hybrid_physics_mode=True, safe=True, day_length=0.0, max_weather_length=0, weather_transition_ratio=0.1, step_time=0.05)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Carla simulation environment.
In order to use it, you need to either download a valid docker image or a Carla package.
- Parameters
batch_size (int) – the number of learning vehicles.
map_name (str) – the name of the map (e.g. “Town01”)
vehicle_filter (str) – the filter for getting the blueprints for training vehicles. The filter for other vehicles will always be obtained using ‘vehicle.*’.
walker_filter (str) – the filter for getting walker blueprints.
num_other_vehicles (int) – the number of autopilot vehicles
num_walkers (int) – the number of walkers
global_distance_to_leading_vehicle (float) – the autopiloted vehicles will try to keep such distance from other vehicles.
percentage_walkers_running (float) – percent of running walkers
percentage_walkers_crossing (float) – percent of walkers walking across the road.
use_hybrid_physics_mode (bool) – If true, the autopiloted vehicle will not use physics for simulation if it is far from other vehicles.
safe (bool) – avoid spawning vehicles prone to accidents.
day_length (float) – number of seconds of a day. If 0, the time of the day will not change.
max_weather_length (float) – the maximum number of seconds each weather setting will last. The actual duration (actual_weather_length) of each randomized weather setting is randomly sampled from [0.25 * max_weather_length, max_weather_length]. If max_weather_length is set to 0, the weather won’t change. Otherwise, weather randomization is turned on and a new set of parameters is sampled each time actual_weather_length is reached for the current weather. Note that sun_azimuth_angle and sun_altitude_angle are excluded from weather randomization; they are controlled separately by day_length in a more realistic way.
weather_transition_ratio (float) – the ratio between the length of the weather transition part and the actual lasting time of the new weather including the transition phase. It has no effect if max_weather_length is 0.
step_time (float) – how many seconds each step of simulation represents.
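The weather timing parameters above can be illustrated with a small sketch. It assumes uniform sampling over the stated interval (the text only says "randomly sampled"), and the function name `sample_weather_schedule` is hypothetical, not part of ALF.

```python
import random


def sample_weather_schedule(max_weather_length, weather_transition_ratio,
                            rng=random):
    """Return (actual_weather_length, transition_length) in seconds.

    Returns None when max_weather_length is 0, i.e. the weather is fixed.
    Uniform sampling is an assumption made for this sketch.
    """
    if max_weather_length == 0:
        return None
    actual = rng.uniform(0.25 * max_weather_length, max_weather_length)
    # The transition phase is a fraction of the total lasting time.
    transition = weather_transition_ratio * actual
    return actual, transition


schedule = sample_weather_schedule(100.0, 0.1, random.Random(0))
```

With `max_weather_length=100` and `weather_transition_ratio=0.1`, each sampled weather lasts between 25 and 100 seconds, of which one tenth is spent transitioning.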
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
    env = Env(...)
    # Use env.
    env.close()
or via a context manager:
    with Env(...) as env:
        # Use env.
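For a subclass author, the two usage patterns of close() can be supported with the standard context-manager protocol. The following is a minimal sketch; `DummyEnv` is a hypothetical stand-in, not an ALF class, and it does not claim to mirror how AlfEnvironment implements this internally.

```python
class DummyEnv:
    """Hypothetical environment backed by an external resource."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Free external resources here (e.g. stop a simulator process).
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the block


with DummyEnv() as env:
    pass  # use env here
```

Leaving the `with` block calls close() even if an exception is raised inside it, which is the main reason to prefer the context-manager form.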
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode)[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- reward_spec()[source]#
Defines the reward provided by the environment.
The reward of most environments is a scalar, so we provide a default implementation which returns a scalar spec.
- Returns
alf.TensorSpec
- vehicles_with_functioning_lights = ['vehicle.audi.tt', 'vehicle.chevrolet.impala', 'vehicle.dodge_charger.police', 'vehicle.audi.etron', 'vehicle.lincoln.mkz2017', 'vehicle.mustang.mustang', 'vehicle.tesla.model3', 'vehicle.volkswagen.t2']#
- class CarlaServer(rpc_port=2000, streaming_port=2001, docker_image='horizonrobotics/alf:0.0.6-carla0.9.9', quality_level='Low', carla_root='/home/carla', use_opengl=True)[source]#
Bases:
object
CarlaServer for doing the simulation.
- Parameters
rpc_port (int) – port for RPC
streaming_port (int) – port for data streaming
docker_image (str) – If provided, will use the docker image to start the Carla server. Some valid images are “carlasim/carla:0.9.9” and “horizonrobotics/alf:0.0.3-carla”
quality_level (str) – one of [‘Low’, ‘Epic’]. See the explanation at https://carla.readthedocs.io/en/latest/adv_rendering_options/#graphics-quality
carla_root (str) – directory where CarlaUE4.sh is located. The default value is correct when using the docker image. If not using the docker image, make sure you provide the correct path. This is the directory where you unzipped the file you downloaded from https://github.com/carla-simulator/carla/releases/tag/0.9.9.
use_opengl (bool) – the default graphics engine of Carla is Vulkan, which is supposed to be better than OpenGL. However, Vulkan is not always available: it may not be installed, or the NVIDIA driver may not support Vulkan.
- class Player(actor, alf_world, controller_ctor=None, success_reward=100.0, success_distance_thresh=5.0, max_collision_penalty=20.0, max_stuck_at_collision_seconds=5.0, stuck_at_collision_distance=1.0, max_red_light_penalty=10.0, overspeed_penalty_weight=0.0, sparse_reward=False, sparse_reward_interval=10.0, allow_negative_distance_reward=True, min_speed=5.0, additional_time=0.0, with_gnss_sensor=True, with_imu_sensor=True, with_camera_sensor=True, with_radar_sensor=True, with_bev_sensor=False, with_dynamic_object_sensor=False, data_collection_mode=False, with_red_light_sensor=False, with_obstacle_sensor=False, terminate_upon_infraction='', render_waypoints=True)[source]#
Bases:
object
Player is a vehicle with some sensors.
An episode terminates if it reaches one of the following situations:
1. the vehicle arrives at the goal.
2. the time exceeds route_length / min_speed + additional_time.
3. it gets stuck because of a collision.
At each step, the reward is given based on the following components:
1. Arriving at the goal: success_reward
2. Moving in the navigation direction: the number of meters moved. This moving reward can be either dense or sparse depending on the argument sparse_reward.
3. Negative reward caused by collision: -min(max_collision_penalty, max(episode_reward, 0))
Currently, the player has these sensors: CollisionSensor, GnssSensor, IMUSensor, CameraSensor, BEV_sensor, LaneInvasionSensor, RadarSensor, NavigationSensor. See the documentation for these classes for the definition of the data generated by these sensors.
- Parameters
actor (carla.Actor) – the carla actor object
alf_world (World) – the world containing the player
controller_ctor (Callable|None) – if provided, will be called as controller_ctor(vehicle, step_time) to create a vehicle controller. It will be used to process the action and generate the control.
success_reward (float) – the reward for arriving at the goal location.
success_distance_thresh (float) – success is achieved if the current location is within such distance of the goal
max_collision_penalty (float) – the maximum penalty (i.e. negative reward) for collision. We don’t want the collision penalty to be too large if the player cannot even get enough positive moving reward. So the penalty is capped at Player.PENALTY_RATE_COLLISION * max(0., episode_reward). Note that this reward is only given once at the first step of contiguous collisions.
max_stuck_at_collision_seconds (float) – the episode will end and is considered a failure if the car is stuck at the collision for so many seconds.
stuck_at_collision_distance (float) – the car is considered as being stuck at the collision if it is within such distance of the first collision location.
max_red_light_penalty (float) – the maximum penalty (i.e. negative reward) for red light violation. We don’t want the red light penalty to be too large if the player cannot even get enough positive moving reward. So the penalty is capped at Player.PENALTY_RATE_RED_LIGHT * max(0., episode_reward). Note that this reward is only given once at the first step of contiguous red light violation.
overspeed_penalty_weight (float) – if > 0, a penalty proportional to the overspeed magnitude will be applied, multiplied by the step time (seconds each step of simulation represents) to make the penalty invariant to it, and then multiplied by the weight of overspeed_penalty_weight. A negative value is the same as 0.
sparse_reward (bool) – If False, the distance reward is given at every step based on how much it moves along the navigation route. If True, the distance reward is only given after moving sparse_reward_interval.
sparse_reward_interval (float) – the sparse reward is given after approximately every such distance along the route has been driven.
allow_negative_distance_reward (bool) – whether to allow negative distance reward. If True, the agent will receive positive reward for moving ahead along the route, and negative reward for moving back along the route. If False, the agent still receives positive reward for moving ahead along the route, but will not receive negative reward for moving back along the route. Instead, the negative distance will be accumulated to the future distance reward. This may ease the learning if the right behavior is to temporarily go back along the route in order, for example, to avoid an obstacle.
min_speed (float) – unit is m/s. The episode fails if route_length / min_speed + additional_time seconds have passed.
additional_time (float) – additional time (unit is second) provided to the agent in each episode. This is useful especially for the episodes with short route_lengths (e.g. < 50m), as it takes some time for the car to be able to move (because of initial spawning phase with z > 0 and acceleration phase).
with_gnss_sensor (bool) – whether to use GnssSensor.
with_imu_sensor (bool) – whether to use IMUSensor.
with_camera_sensor (bool) – whether to use CameraSensor.
with_radar_sensor (bool) – whether to use RadarSensor.
with_bev_sensor (bool) – whether to use BEVSensor.
data_collection_mode (bool) – if True, will use rule-based agents to control the Players. This can be used for purposes such as collecting data.
with_red_light_sensor (bool) – whether to use RedlightSensor.
with_obstacle_sensor (bool) – whether to use ObstacleDetectionSensor.
terminate_upon_infraction (str) – whether to terminate the episode based on the specified mode (“collision”, “redlight”, “all”, “”), when the agent has the corresponding infractions. If “”, no infraction-based termination is activated.
render_waypoints (bool) – whether to render (interpolated) waypoints in the generated video during rendering. Note that it is only used for visualization and has no impacts on the perception data.
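Two of the rules described in the parameters, the episode time limit and the cap on the collision penalty, reduce to simple arithmetic. The sketch below is one reading of the text, not the actual Player code: the function names are hypothetical, and combining the cap with max_collision_penalty through `min` is an assumption. The rate 0.5 is the value listed for Player.PENALTY_RATE_COLLISION.

```python
PENALTY_RATE_COLLISION = 0.5  # value listed for Player.PENALTY_RATE_COLLISION


def episode_time_limit(route_length, min_speed=5.0, additional_time=0.0):
    """Seconds after which the episode is counted as a failure."""
    return route_length / min_speed + additional_time


def collision_penalty(episode_reward, max_collision_penalty=20.0):
    """Penalty magnitude, capped by a fraction of the reward earned so far.

    Assumption: the cap and max_collision_penalty are combined with min().
    """
    cap = PENALTY_RATE_COLLISION * max(0.0, episode_reward)
    return min(max_collision_penalty, cap)
```

For a 100 m route with the default `min_speed` of 5 m/s, the agent has 20 seconds. An agent that has earned 10 units of reward would lose at most 5 at its first collision under this reading.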
- PENALTY_RATE_COLLISION = 0.5#
- PENALTY_RATE_RED_LIGHT = 0.3#
- REWARD_COLLISION = 2#
- REWARD_DIMENSION = 6#
- REWARD_DISTANCE = 1#
- REWARD_OVERALL = 0#
- REWARD_OVERSPEED = 5#
- REWARD_RED_LIGHT = 4#
- REWARD_SUCCESS = 3#
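The REWARD_* constants read as indices into a reward vector of length REWARD_DIMENSION. The sketch below shows how such a vector could be assembled. It assumes, for illustration only, that the REWARD_OVERALL slot holds the sum of the other components; the source does not state how that slot is computed, and `make_reward_vector` is a hypothetical helper.

```python
REWARD_OVERALL = 0
REWARD_DISTANCE = 1
REWARD_COLLISION = 2
REWARD_SUCCESS = 3
REWARD_RED_LIGHT = 4
REWARD_OVERSPEED = 5
REWARD_DIMENSION = 6


def make_reward_vector(distance=0.0, collision=0.0, success=0.0,
                       red_light=0.0, overspeed=0.0):
    """Build a per-component reward vector indexed by the REWARD_* constants."""
    reward = [0.0] * REWARD_DIMENSION
    reward[REWARD_DISTANCE] = distance
    reward[REWARD_COLLISION] = collision
    reward[REWARD_SUCCESS] = success
    reward[REWARD_RED_LIGHT] = red_light
    reward[REWARD_OVERSPEED] = overspeed
    # Assumption: the overall reward is the sum of the individual components.
    reward[REWARD_OVERALL] = sum(reward[1:])
    return reward
```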
- act(action)[source]#
Generate the carla command for taking the given action.
Use carla.Client.apply_batch_sync() to actually apply the action.
- Parameters
action (nested np.ndarray) –
- Returns
- Return type
list[carla.command]
- action_desc()[source]#
Get the description about the action.
- Returns
each str corresponds to one TensorSpec from action_spec().
- Return type
nested str
- action_spec()[source]#
Get the action spec.
If controller is provided at __init__(), the action_spec is given by controller.
Otherwise, the action is a 4-D vector of [throttle, steer, brake, reverse], where throttle is in [-1.0, 1.0] (negative value is same as zero), steer is in [-1.0, 1.0], brake is in [-1.0, 1.0] (negative value is same as zero), and reverse is interpreted as a boolean value with values greater than 0.5 corresponding to True.
- Returns
- Return type
nested BoundedTensorSpec
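The default 4-D action can be decoded as described above. This is a minimal sketch using plain Python numbers; `decode_default_action` is a hypothetical helper, and the real Player works with tensors/arrays and carla control objects instead.

```python
def decode_default_action(action):
    """Interpret [throttle, steer, brake, reverse] per the rules above."""
    throttle, steer, brake, reverse = action
    return {
        "throttle": max(0.0, float(throttle)),  # negative is same as zero
        "steer": float(steer),                  # kept in [-1.0, 1.0]
        "brake": max(0.0, float(brake)),        # negative is same as zero
        "reverse": reverse > 0.5,               # boolean threshold at 0.5
    }


control = decode_default_action([-0.3, 0.25, 0.8, 0.9])
```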
- destroy()[source]#
Get the commands for destroying the player.
Use carla.Client.apply_batch_sync() to actually destroy the sensor.
- Returns
- Return type
list[carla.command]
- get_current_time_step(current_frame)[source]#
Get the current time step for the player.
- Parameters
current_frame (int) – current simulation frame no.
- Returns
all elements are np.ndarray or np.number.
- Return type
- get_overspeed_amount()[source]#
Get the difference between the actor’s speed and the speed limit, lower bounded by 0.
- Returns
0 if the actor’s _speed_limit is None or the speed is lower than the speed limit; otherwise, the amount by which the actor’s speed exceeds the speed limit.
- Return type
float
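The overspeed amount and the penalty derived from it (see overspeed_penalty_weight in the Player parameters) can be sketched as follows. The function names and the plain-float interface are assumptions for illustration; the real method reads the speed from the carla actor.

```python
def overspeed_amount(speed, speed_limit):
    """Speed above the limit in m/s, lower bounded by 0."""
    if speed_limit is None:
        return 0.0
    return max(0.0, speed - speed_limit)


def overspeed_penalty(speed, speed_limit, step_time, overspeed_penalty_weight):
    """Penalty magnitude for one simulation step."""
    weight = max(0.0, overspeed_penalty_weight)  # negative is the same as 0
    # Scaling by step_time keeps the penalty per second of driving constant.
    return weight * step_time * overspeed_amount(speed, speed_limit)
```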
- observation_desc()[source]#
Get the description about the observation.
- Returns
each str corresponds to one TensorSpec from observation_spec().
- Return type
nested str
- render(mode)[source]#
Render the simulation.
- Parameters
mode (str) – one of [‘rgb_array’, ‘human’]
- Returns
None: if mode is ‘human’
np.ndarray: the image of shape [height, width, channels] if mode is ‘rgb_array’
- Return type
None or np.ndarray
- reset()[source]#
Reset the player location and goal.
Use carla.Client.apply_batch_sync() to actually reset.
- Returns
- Return type
list[carla.command]
- update_speed_limit(dis_threshold=10)[source]#
Update the speed limit of the actor according to the active speed limit sign. The speed limit is updated when passing by a speed limit sign.
- Parameters
dis_threshold (float) – the distance in meters within which to consider the speed limit sign as active. The one closest to the actor in the active set will be used as the current speed limit. If a negative value is provided, all speed limit signs are taken into consideration for determining the closest one.
- Returns
speed limit in m/s
- Return type
float
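The selection rule for the active speed limit sign can be sketched in plain Python. Everything about the data layout here is an assumption: signs are given as hypothetical `((x, y), limit)` pairs, and the previous limit is kept when no sign is active. The real method queries the carla world.

```python
import math


def closest_active_limit(actor_xy, signs, current_limit, dis_threshold=10.0):
    """Return the limit of the closest active sign, else current_limit.

    signs: list of ((x, y), limit_in_m_per_s) pairs (hypothetical format).
    """
    best_distance, best_limit = None, current_limit
    for (sx, sy), limit in signs:
        distance = math.hypot(sx - actor_xy[0], sy - actor_xy[1])
        # A negative threshold means every sign is considered active.
        if dis_threshold >= 0 and distance > dis_threshold:
            continue
        if best_distance is None or distance < best_distance:
            best_distance, best_limit = distance, limit
    return best_limit
```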
- class WeatherParameters(cloudiness=0, precipitation=0, precipitation_deposits=0, wind_intensity=0, fog_density=0, fog_distance=0)[source]#
Bases:
object
A class for a set of weather related parameters. Currently it contains all the weather fields from carla.WeatherParameters except for sun_azimuth_angle and sun_altitude_angle, which are controlled separately by day_length in a more realistic way.
- adjust_weather_parameters(weather_param, delta)[source]#
Adjust the parameters of weather_param according to the fields in WeatherParameters. The value is adjusted by adding the field value of delta to weather_param.
- Parameters
weather_param (carla.WeatherParameters) – a carla.WeatherParameters instance containing the parameters to be adjusted
delta (WeatherParameters) – an instance of WeatherParameters with the value of each field representing the amount to be adjusted
- Returns
The input weather_param instance with adjusted field values.
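The field-wise adjustment can be sketched without carla installed. Both classes below are hypothetical stand-ins: `WeatherDelta` mirrors the six fields of WeatherParameters, and `FakeCarlaWeather` mimics only the attribute access of carla.WeatherParameters. The function name `adjust` is also illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class WeatherDelta:
    """Hypothetical stand-in mirroring the WeatherParameters fields."""
    cloudiness: float = 0
    precipitation: float = 0
    precipitation_deposits: float = 0
    wind_intensity: float = 0
    fog_density: float = 0
    fog_distance: float = 0


class FakeCarlaWeather:
    """Hypothetical stand-in for carla.WeatherParameters."""

    def __init__(self):
        for f in fields(WeatherDelta):
            setattr(self, f.name, 0.0)
        self.sun_altitude_angle = 45.0  # not part of the adjusted fields


def adjust(weather_param, delta):
    """Add each field of delta to the same field of weather_param, in place."""
    for f in fields(delta):
        current = getattr(weather_param, f.name)
        setattr(weather_param, f.name, current + getattr(delta, f.name))
    return weather_param
```

Because only the listed fields are touched, the sun angles stay under the control of day_length.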
- extract_weather_parameters(weather_param)[source]#
Extract the parameters according to the fields in WeatherParameters and use them to construct an instance of WeatherParameters.
- load(map_name, batch_size, wrappers=[])[source]#
Load CarlaEnvironment
- Parameters
map_name (str) – name of the map. Currently available maps are: ‘Town01’, ‘Town02’, ‘Town03’, ‘Town04’, ‘Town05’, ‘Town06’, ‘Town07’, and ‘Town10HD’
batch_size (int) – the number of vehicles in the simulation.
wrappers (list[AlfEnvironmentBaseWrapper]) – environment wrappers
- Returns
CarlaEnvironment
alf.environments.suite_dmc#
- load(environment_name='cheetah:run', from_pixels=True, image_size=100, env_id=None, discount=1.0, visualize_reward=False, max_episode_steps=1000, control_timestep=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#
Load a MuJoCo environment.
For installation of DMControl, see https://github.com/deepmind/dm_control. For installation of MuJoCo210, see https://mujoco.org.
- Parameters
environment_name (str) – this string must have the format “domain_name:task_name”, where “domain_name” is defined by DM control as the physical model name, and “task_name” is an instance of the model with a particular MDP structure.
from_pixels (boolean) – Output image if set to True.
image_size (int) – The height and width of the output image from the environment.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
visualize_reward – if True, then the rendered frame will have a highlighted color when the agent achieves a reward.
max_episode_steps (int) – The maximum episode step in the environment.
control_timestep (float) – the time duration between two agent actions. If this is greater than the agent’s primitive physics timestep, then multiple physics simulation steps might be performed between two actions. The difference between multi-physics steps and “action repeats”/FrameSkip is that the intermediate physics step won’t need to render an observation (which might save time if rendering is costly). However, this also means that unlike “action repeats”/FrameSkip which accumulates rewards of several repeated steps, only a single-step reward is obtained after all the physics simulation steps are done. The total number of physics simulation steps in an episode is control_timestep / physics_timestep * frame_skip * max_episode_steps. If None, the default control timestep defined by DM control suite will be used.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment. There will be an AlfEnvironmentDMC2GYMWrapper added before any alf_wrappers.
- Returns
A wrapped AlfEnvironment
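Two details from the parameters above are easy to check by hand: the “domain_name:task_name” format and the physics step count. The helpers below are hypothetical and only restate the text; `frame_skip` is taken from the formula in the control_timestep description and is not an argument of load().

```python
def parse_environment_name(environment_name):
    """Split "domain_name:task_name" into its two parts."""
    parts = environment_name.split(":")
    if len(parts) != 2 or not all(parts):
        raise ValueError(
            "expected 'domain_name:task_name', got %r" % environment_name)
    return parts[0], parts[1]


def total_physics_steps(control_timestep, physics_timestep, frame_skip,
                        max_episode_steps):
    """Physics steps per episode, following the formula in the text."""
    return control_timestep / physics_timestep * frame_skip * max_episode_steps
```

For example, a control timestep of 0.02 s over a physics timestep of 0.01 s, with no frame skipping and 1000 episode steps, gives 2000 physics steps.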
alf.environments.suite_dmlab#
- class DeepmindLabEnv(scene, action_repeat=4, observation='RGB_INTERLEAVED', config={}, renderer='hardware')[source]#
Bases:
gym.core.Env
Create a deepmind_lab env
- Parameters
scene (str) – script for the deepmind_lab env. See available scripts: https://github.com/deepmind/lab/tree/master/game_scripts/levels
action_repeat (int) – the interval at which the agent experiences the game
observation (str) – observation format. See doc about the available observations: https://github.com/deepmind/lab/blob/master/docs/users/python_api.md
config (dict) – config for env
renderer (str) – ‘software’ or ‘hardware’. If set to ‘hardware’, EGL or GLX is used for rendering. Make sure you have GPU if you use ‘hardware’.
- close()[source]#
Override close in your subclass to perform any necessary cleanup.
Environments will automatically close() themselves when garbage collected or when the program exits.
- metadata = {'render.modes': ['rgb_array']}#
- render(mode='rgb_array', close=False)[source]#
Renders the environment.
The set of supported modes varies per environment. (And some environments do not support rendering at all.) By convention, if mode is:
human: render to the current display or terminal and return nothing. Usually for human consumption.
rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.
ansi: Return a string (str) or StringIO.StringIO containing a terminal-style text representation. The text can include newlines and ANSI escape sequences (e.g. for colors).
Note
Make sure that your class’s metadata ‘render.modes’ key includes the list of supported modes. It’s recommended to call super() in implementations to use the functionality of this method.
- Parameters
mode (str) – the mode to render with
Example:
    class MyEnv(Env):
        metadata = {'render.modes': ['human', 'rgb_array']}

        def render(self, mode='human'):
            if mode == 'rgb_array':
                return np.array(...)  # return RGB frame suitable for video
            elif mode == 'human':
                ...  # pop up a window and render
            else:
                super(MyEnv, self).render(mode=mode)  # just raise an exception
- reset()[source]#
Resets the state of the environment and returns an initial observation.
- Returns
the initial observation.
- Return type
observation (object)
- seed(seed=None)[source]#
Sets the seed for this env’s random number generator(s).
Note
Some environments use multiple pseudorandom number generators. We want to capture all such seeds used in order to ensure that there aren’t accidental correlations between multiple generators.
- Returns
Returns the list of seeds used in this env’s random number generators. The first value in the list should be the “main” seed, or the value which a reproducer should pass to ‘seed’. Often, the main seed equals the provided ‘seed’, but this won’t be true if seed=None, for example.
- Return type
list<bigint>
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
A tuple (observation, reward, done, info):
observation (object): agent’s observation of the current environment.
reward (float): amount of reward returned after the previous action.
done (bool): whether the episode has ended, in which case further step() calls will return undefined results.
info (dict): auxiliary diagnostic information (helpful for debugging, and sometimes learning).
- Return type
tuple
- action_discretize(action_spec, look_left_right_pixels_per_frame=(- 20, 20), look_down_up_pixels_per_frame=(- 10, 10), strafe_left_right=(- 1, 1), move_back_forward=(- 1, 1), fire=(), jump=(1), crouch=(1), **kwargs)[source]#
Discretize action from action_spec
TODO: action combinations
Mapping all valid action values to discrete action
original deepmind lab environment action_spec:
[{'max': 512, 'min': -512, 'name': 'LOOK_LEFT_RIGHT_PIXELS_PER_FRAME'}, {'max': 512, 'min': -512, 'name': 'LOOK_DOWN_UP_PIXELS_PER_FRAME'}, {'max': 1, 'min': -1, 'name': 'STRAFE_LEFT_RIGHT'}, {'max': 1, 'min': -1, 'name': 'MOVE_BACK_FORWARD'}, {'max': 1, 'min': 0, 'name': 'FIRE'}, {'max': 1, 'min': 0, 'name': 'JUMP'}, {'max': 1, 'min': 0, 'name': 'CROUCH'}]
and discretized actions:
0 -> [20,0,0,0,0,0,0] (look left 20 pixels),
1 -> [-20,0,0,0,0,0,0] (look right 20 pixels),
...,
m -> [0,0,0,-1,0,0,0] (move back),
m+1 -> [0,0,0,1,0,0,0] (move forward),
...,
n -> [0,0,0,0,1,1,0] (jump and fire),
...
see SuiteDMLabTest.test_action_discretize in suite_dmlab_test.py for examples
- Parameters
action_spec (list(dict)) – action spec
look_left_right_pixels_per_frame (iterable|str) – look left or look right pixels
look_down_up_pixels_per_frame (iterable|str) – look down or look up pixels
strafe_left_right (iterable|str) – strafe left or strafe right
move_back_forward (iterable|str) – move back or move forward
fire (iterable|str) – fire values
jump (iterable|str) – jump values
crouch (iterable|str) – crouch values
kwargs (dict) – other config for actions
- Returns
discrete actions
- Return type
actions (list[numpy.array])
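The core idea of the discretization, one discrete action per (channel, value) pair, can be sketched as below. This is a simplified illustration, not the ALF implementation: it ignores action combinations (such as “jump and fire”) and string-valued options, returns plain lists instead of numpy arrays, and the ordering of the resulting actions is an assumption. The helper name `discretize` is hypothetical.

```python
ACTION_SPEC = [
    {'max': 512, 'min': -512, 'name': 'LOOK_LEFT_RIGHT_PIXELS_PER_FRAME'},
    {'max': 512, 'min': -512, 'name': 'LOOK_DOWN_UP_PIXELS_PER_FRAME'},
    {'max': 1, 'min': -1, 'name': 'STRAFE_LEFT_RIGHT'},
    {'max': 1, 'min': -1, 'name': 'MOVE_BACK_FORWARD'},
    {'max': 1, 'min': 0, 'name': 'FIRE'},
    {'max': 1, 'min': 0, 'name': 'JUMP'},
    {'max': 1, 'min': 0, 'name': 'CROUCH'},
]


def discretize(action_spec, **channel_values):
    """Map each (channel, value) pair to one discrete action vector."""
    names = [spec['name'] for spec in action_spec]
    actions = []
    for key, values in channel_values.items():
        index = names.index(key.upper())
        for value in values:
            vector = [0] * len(names)
            vector[index] = value  # all other channels stay at 0
            actions.append(vector)
    return actions


actions = discretize(
    ACTION_SPEC,
    look_left_right_pixels_per_frame=(-20, 20),
    move_back_forward=(-1, 1))
```

An agent then only has to choose an index into `actions`, and the environment sends the corresponding 7-element vector to DeepMind Lab.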
- load(scene, env_id=None, discount=1.0, frame_skip=4, gym_env_wrappers=(), alf_env_wrappers=(), wrap_with_process=False, max_episode_steps=None)[source]#
Load deepmind lab envs.
- Parameters
scene (str) – script for the deepmind_lab env. See the scene parameter of DeepmindLabEnv for the available scripts.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
frame_skip (int) – the frequency at which the agent experiences the game
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
wrap_with_process (bool) – Whether to wrap the env in a process
max_episode_steps (int) – max episode step limit
- Returns
An AlfEnvironment instance.
alf.environments.suite_go#
GoEnvironment.
- class GoBoard(batch_size, height, width, max_num_moves, num_previous_boards=10)[source]#
Bases:
object
This implements a Go board.
This class only takes care of how the board changes when a valid move is given. Other Go rules are handled by GoEnvironment.
We maintain the following data and incrementally update them:
_board: the current board of shape [B, H, W]. At each position, 0 means it is empty, -1 means a stone of player 0, 1 means a stone of player 1. The board is padded with 2 on four sides to make the handling of boundary simpler.
_cc_id: the connected component (CC) which each position belongs to. The shape is [B, H, W].
_cc_qi: the qi (liberty) of each CC. The shape is [B, max_num_ccs].
_num_ccs: the number of CCs.
Note that the qi is different from the common definition of qi. For example, in the following board, the qi of the connected component “o” is 4 in our data structure because position (1, 1) is counted adjacent to (1, 0) and (0, 1) and is counted twice towards the qi of “o”. Using the common definition of qi, the liberty of “o” is 3. We use this different way of calculating qi so that the code can be simplified.
  0123
 ------
0|oo  |
1|o   |
2|    |
3|    |
 ------
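The difference between the two ways of counting qi can be reproduced for the example board with a few lines of plain Python. This is only an illustration of the counting rule; the function `qi_counts` is hypothetical, assumes the board holds a single connected component, and is unrelated to the batched tensor implementation in GoBoard.

```python
def qi_counts(stones, height, width):
    """Return (qi as counted here, qi under the common definition).

    stones: set of (y, x) positions forming one connected component on an
    otherwise empty board.
    """
    per_adjacency = 0   # an empty point counts once per adjacent stone
    liberties = set()   # common definition: distinct empty neighbors
    for y, x in stones:
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            on_board = 0 <= ny < height and 0 <= nx < width
            if on_board and (ny, nx) not in stones:
                per_adjacency += 1
                liberties.add((ny, nx))
    return per_adjacency, len(liberties)


# The component "o" from the example board: (0, 0), (0, 1) and (1, 0).
counts = qi_counts({(0, 0), (0, 1), (1, 0)}, height=4, width=4)
```

For the example component this yields 4 under the per-adjacency rule and 3 under the common definition, because (1, 1) touches two stones of the component.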
- Parameters
batch_size (int) – the number of parallel boards
height (int) – height of each board
width (int) – width of each board
max_num_moves (int) – maximum number of moves allowed
num_previous_boards (int) – the number of previous board situations to store. They will be used by classify_all_moves() to check whether a move will lead to a board situation that is the same as one of these previous board situations.
- calc_area(board_indices=None)[source]#
Calculate the area of each player.
In order for a position to be considered to be owned by a player, it has to be either the player’s stone or cannot be reached by the opponent’s stones. With this definition of area, players have to play until all the dead stones have been taken out. This shouldn’t change how the game is played. These are the so-called Tromp-Taylor rules.
- Parameters
board_indices (Tensor) – int64 Tensor to indicate the boards
- Returns
area for player 0 and player 1
- Return type
tuple (Tensor, Tensor)
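The area rule described for calc_area() can be illustrated on a single small board with a flood fill. This is a plain-Python sketch for one board using the same encoding as _board (0 empty, -1 player 0, 1 player 1); the function name is hypothetical and it is not the batched tensor code used by GoBoard.

```python
def tromp_taylor_area(board):
    """Return (area of player 0, area of player 1) for one board.

    board: list of rows; 0 = empty, -1 = player 0 stone, 1 = player 1 stone.
    """
    height, width = len(board), len(board[0])
    area = {-1: 0, 1: 0}
    seen = set()
    for y in range(height):
        for x in range(width):
            if board[y][x] != 0:
                area[board[y][x]] += 1  # a stone belongs to its owner
            elif (y, x) not in seen:
                # Flood-fill the empty region and note which colors it touches.
                size, borders, stack = 0, set(), [(y, x)]
                seen.add((y, x))
                while stack:
                    cy, cx = stack.pop()
                    size += 1
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if not (0 <= ny < height and 0 <= nx < width):
                            continue
                        if board[ny][nx] != 0:
                            borders.add(board[ny][nx])
                        elif (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                # Only regions reachable by a single color count as area.
                if len(borders) == 1:
                    area[borders.pop()] += size
    return area[-1], area[1]
```

An empty region that touches stones of both players counts for neither, which is why dead stones have to be captured before scoring.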
- calc_area_simple(board_indices=None)[source]#
Calculate the area of each player.
In order for a position to be considered to be owned by a player, it has to be either the player’s stone or fully surrounded by the player’s stone. With this definition of area, players have to play until the board is full except the eyes of only one position. This shouldn’t change how the game is played.
- Parameters
board_indices (Tensor) – int64 Tensor to indicate the boards
- Returns
area for player 0 and player 1
- Return type
tuple (Tensor, Tensor)
- classify_all_moves(player, board_indices=None)[source]#
Classify all the moves on the board.
This function will examine all possible moves except PASS and annotate them using 3 boolean attributes: occupied, suicidal, and repeated.
- Parameters
player (Tensor) – int8 Tensor to indicate which player to consider.
board_indices (Tensor) – int64 Tensor to indicate the boards
- Returns
each one is a bool Tensor of shape [B, height, width].
occupied: occupied[b, y, x] means whether a move at (y, x) overlaps with an existing stone on board[b].
suicidal: suicidal[b, y, x] means whether a move at (y, x) is a suicidal move for player[b] on board[b].
repeated: repeated[b, y, x] means whether a move at (y, x) by player[b] will result in a board same as one of the previous boards of board[b].
- Return type
tuple
- get_board(board_indices=None)[source]#
Get the current board.
- Parameters
board_indices (Tensor) – int64 Tensor to indicate the boards
- Returns
int8 Tensor of the shape [B, height, width].
- Return type
Tensor
- reset_board(board_indices=None)[source]#
Reset the board to initial condition.
- Parameters
board_indices (Tensor) – int64 Tensor to indicate the boards
- update(board_indices, y, x, player)[source]#
Update the board for given move at (y, x).
It assumes the move is at an empty location.
- Parameters
board_indices (Tensor) – int64 Tensor to indicate which boards to update.
y (Tensor) – int64 Tensor of the same shape as board_indices to indicate the y coordinate of the move
x (Tensor) – int64 Tensor of the same shape as board_indices to indicate the x coordinate of the move
player (Tensor) – int8 Tensor of the same shape as board_indices to indicate which player makes the move
- Returns
bool Tensor with the same size as board_indices. It indicates whether the move for each board is suicidal (i.e., making the qi of the player 0). Note that a suicidal move may change the board because all the stones of the player which are connected to the suicidal move will be removed.
- Return type
Tensor
- class GoEnvironment(batch_size, height=19, width=19, winning_thresh=7.5, allow_suicidal_move=False, reward_shaping=False, human_player=None)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Go environment.
The game plays until one of the following events happens:
1. Both players pass. In this case, the area of each player will be calculated and the reward is 1 if player 0 wins, -1 if player 1 wins. When calculating the area, in order for a position to be considered to be owned by a player, it has to be either the player’s stone or cannot be reached by the opponent’s stones. With this definition of area, players have to play until all the dead stones have been taken out. This shouldn’t change how the game is played. These are the so-called Tromp-Taylor rules.
2. An invalid move. The opponent will get the reward, which means that if player 0 makes an invalid move, the reward is -1. If player 1 makes an invalid move, the reward is 1. There are two types of invalid moves: a. a move to a position which is already occupied. b. a move which leads to a board exactly the same as the previous board.
3. The total number of moves exceeds max_num_moves. This is considered as both passing. max_num_moves is set to 2 * height * width.
The observation is an OrderedDict containing three fields:
board: a [batch_size, 1, height, width] int8 Tensor, with 0 indicating empty location, -1 indicating a stone of player 0 and 1 indicating a stone of player 1
to_play: a [batch_size] int8 Tensor indicating who is going to make the next move. Its value is either 0 or 1
prev_action: a [batch_size] int64 Tensor indicating the action taken by the previous player. This is pass action for the first step.
The action is an int64 scalar. If it is smaller than height * width, it means to play the stone at (action // width, action % width). If it is equal to height * width, it means to pass for this round.
- Parameters
batch_size (int) – the number of parallel boards
height (int) – height of each board
width (int) – width of each board
winning_thresh (float) – player 0 wins if area0 - area1 > winning_thresh, loses if area0 - area1 < winning_thresh, otherwise the game is a draw.
allow_suicidal_move (bool) – whether suicidal move is allowed.
reward_shaping (bool) – if True, instead of using +1,-1 as reward, use alf.math.softsign(area0 - area1 - winning_thresh) as reward to encourage capturing more area.
human_player (int|None) – 0, 1 or None
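The action encoding and the end-of-game reward described above can be sketched in plain Python. This is an illustrative sketch only, not the GoEnvironment implementation: the function names are hypothetical, and softsign is assumed to be the usual x / (1 + |x|), which may differ from alf.math.softsign.

```python
from typing import Optional, Tuple


def decode_go_action(action: int, height: int = 19,
                     width: int = 19) -> Optional[Tuple[int, int]]:
    """Map a scalar action to (row, col), or None for a pass (hypothetical helper)."""
    num_points = height * width
    if not 0 <= action <= num_points:
        raise ValueError(f"action must be in [0, {num_points}], got {action}")
    if action == num_points:
        return None  # action == height * width means pass
    return action // width, action % width


def softsign(x: float) -> float:
    """Assumed definition x / (1 + |x|); not taken from alf.math."""
    return x / (1.0 + abs(x))


def final_reward(area0: float, area1: float, winning_thresh: float = 7.5,
                 reward_shaping: bool = False) -> float:
    """Reward from player 0's perspective once both players have passed."""
    margin = area0 - area1 - winning_thresh
    if reward_shaping:
        return softsign(margin)  # graded reward instead of +1/-1
    if margin > 0:
        return 1.0
    if margin < 0:
        return -1.0
    return 0.0  # draw
```

With the default 19x19 board, actions 0 to 360 place a stone and action 361 is a pass.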
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- metadata = {'render.modes': ['human', 'rgb_array'], 'video.frames_per_second': 1}#
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode)[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
alf.environments.suite_gym#
- load(environment_name, env_id=None, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=(), image_channel_first=True)[source]#
Loads the selected environment and wraps it with the specified wrappers.
Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.
- Parameters
environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to the default step limit defined in the environment’s spec. No limit is applied if set to 0 or if there is no max_episode_steps set in the environment’s spec.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
image_channel_first (bool) – whether to transpose image channels to the first dimension.
- Returns
An AlfEnvironment instance.
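The max_episode_steps rule described above can be written out as a small function. This is a sketch of the documented behavior only, not the code in suite_gym; the function name and the spec_limit argument (standing in for the step limit in the environment’s spec) are hypothetical.

```python
from typing import Optional


def resolve_max_episode_steps(requested: Optional[int],
                              spec_limit: Optional[int]) -> int:
    """Return the step limit to enforce; 0 means no limit (hypothetical helper)."""
    if requested is None:
        # Fall back to the limit registered in the environment's spec, if any.
        return spec_limit if spec_limit is not None else 0
    # An explicit value is used as-is; 0 disables the limit.
    return requested
```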
- wrap_env(gym_env, env_id=None, discount=1.0, max_episode_steps=0, gym_env_wrappers=(), time_limit_wrapper=<class 'alf.environments.alf_wrappers.TimeLimit'>, normalize_action=True, clip_action=True, alf_env_wrappers=(), image_channel_first=True, auto_reset=True)[source]#
Wraps given gym environment with AlfGymWrapper.
Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.
Also note that all gym wrappers assume images are ‘channel_last’ by default, while PyTorch only supports ‘channel_first’ image inputs. To enable this transpose, ‘image_channel_first’ is set as True by default. gym_wrappers.ImageChannelFirst is applied after all gym_env_wrappers and before the AlfGymWrapper.
- Parameters
gym_env (gym.Env) – An instance of OpenAI gym environment.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – Used to create a TimeLimitWrapper. No limit is applied if set to 0. Usually set to gym_spec.max_episode_steps as done in load. Note that a TimeLimit wrapper will be applied as the last Gym wrapper, so if you also use the FrameSkip Gym wrapper, then the actual max length of an episode will be skip*max_episode_steps.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
time_limit_wrapper (AlfEnvironmentBaseWrapper) – Wrapper that accepts (env, max_episode_steps) params to enforce a TimeLimit. Usually this should be left as the default, alf_wrappers.TimeLimit.
normalize_action (bool) – if True, will scale continuous actions to [-1, 1] to be better used by algorithms that compute entropies.
clip_action (bool) – If True, will clip continuous action to its bound specified by action_spec. If normalize_action is also True, this clipping happens after the normalization (i.e., clips to [-1, 1]).
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
image_channel_first (bool) – whether to transpose image channels to the first dimension. PyTorch only supports channel_first image inputs.
auto_reset (bool) – If True (default), reset the environment automatically after a terminal state is reached.
- Returns
An AlfEnvironment instance.
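How normalize_action and clip_action interact can be illustrated for a single scalar action. This is a minimal sketch under stated assumptions, not the wrapper code ALF applies: the function name is hypothetical, and the affine map from [-1, 1] to [low, high] is an assumed form of the normalization.

```python
def to_env_action(action: float, low: float, high: float,
                  normalize_action: bool = True,
                  clip_action: bool = True) -> float:
    """Convert an agent action to the range the wrapped env expects (sketch)."""
    if normalize_action:
        if clip_action:
            # With normalization on, the agent acts in [-1, 1], so clip there.
            action = max(-1.0, min(1.0, action))
        # Assumed affine map: -1 -> low, +1 -> high.
        return low + (action + 1.0) * 0.5 * (high - low)
    if clip_action:
        # Without normalization, clip to the bounds from action_spec.
        action = max(low, min(high, action))
    return action
```

For example, with bounds [0, 10] an agent action of 1.5 is first clipped to 1.0 and then mapped to 10.0.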
alf.environments.suite_highway#
Suite for loading highway environments. Installation: pip install git+https://github.com/eleurent/highway-env
- class ActionScalarization(env)[source]#
Bases:
gym.core.Wrapper
Convert action to scalar if the current action space is MetaDiscreteAction and type of the input action is np.ndarray.
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- class FlattenObservation(env, fields=None)[source]#
Bases:
alf.environments.gym_wrappers.BaseObservationWrapper
Flatten the 2D observations into a 1D vector.
- Parameters
env (gym.Env) – the gym environment
fields (list[str]) – the fields to transform. A field str is a multi-level path denoted by “A.B.C”. If None, the non-nested observation is transformed.
- class RemoveActionEnvInfo(env)[source]#
Bases:
gym.core.Wrapper
Remove action from EnvInfo if it exists.
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent’s observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- load(environment_name, env_id=None, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=(), env_config=None)[source]#
Loads the selected environment and wraps it with the specified wrappers.
Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.
- Parameters
environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None or 0 the max_episode_steps will be set to the default step limit defined in the environment. Otherwise max_episode_steps will be set to the smaller value of the two.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
env_config (dict|None) – a dictionary for configuring some aspects of the environment. If None, the default configuration will be used. Please refer to the default_env_config below for an example config and the doc for more details: https://highway-env.readthedocs.io/en/latest/user_guide.html
- Returns
An AlfEnvironment instance.
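Note that the max_episode_steps rule here differs from suite_gym: a value of 0 falls back to the environment default rather than disabling the limit. A sketch of the documented rule, with a hypothetical function name and env_default standing in for the step limit defined in the environment:

```python
from typing import Optional


def resolve_highway_max_episode_steps(requested: Optional[int],
                                      env_default: int) -> int:
    """Pick the episode step limit as described for suite_highway.load (sketch)."""
    if not requested:  # None or 0: use the environment's own limit
        return env_default
    # Otherwise the smaller of the two limits applies.
    return min(requested, env_default)
```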
alf.environments.suite_mario#
- load(game, env_id=None, state=None, discount=1.0, wrap_with_process=False, frame_skip=4, record=False, crop=True, gym_env_wrappers=(), alf_env_wrappers=(), max_episode_steps=4500)[source]#
Loads the selected mario game and wraps it.
- Parameters
game (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
state (str) – game state (level)
wrap_with_process (bool) – Whether to wrap the env in a process.
discount (float) – Discount to use for the environment.
frame_skip (int) – the frequency at which the agent experiences the game
record – Record the gameplay, see retro.retro_env.RetroEnv.record. False for no recording; otherwise record to the current working directory or the specified directory.
crop (bool) – whether to crop frame to fixed size
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
max_episode_steps (int) – max episode step limit
- Returns
An AlfEnvironment instance.
alf.environments.suite_metadrive#
- class AlfMetaDriveWrapper(metadrive_env, env_id=0)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Wrapper over the MetaDrive autonomous driving environment. You will need to have metadrive installed as a dependency to use this.
Constructor of AlfMetaDriveWrapper.
- Parameters
metadrive_env (MetaDriveEnv) – the original MetaDrive environment being wrapped. The MetaDrive environment should be properly configured on its own before being wrapped.
env_id (int) – the ID of this environment when it appears as part of a batched environment.
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
env = Env(...)
# Use env.
env.close()
or via a context manager:
with Env(...) as env:
    # Use env.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode)[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
- seed(seed=None)[source]#
Reset the underlying MetaDrive environment with a specified seed.
MetaDrive uses a slightly different mechanism for seeds. Upon construction of a MetaDrive environment, the user needs to specify a seed range [start_seed, start_seed + scenario_num]. When being forced to reset with a specific seed, that seed must be within the predefined range.
- Parameters
seed (
Optional[int]) – the seed that the environment will be reset with. If it is specified as None, a random seed within the range will be selected by the underlying MetaDrive environment.
- load(env_name='Vectorized', env_id=0, traffic_density=0.1, start_seed=3844, scenario_num=5000, decision_repeat=5, map_spec=4, crash_penalty=5.0, speed_reward_weight=0.1, success_reward=10.0, time_limit=1200)[source]#
Loads the MetaDrive environment and wraps it with AlfMetaDriveWrapper.
- Parameters
env_name (str) – Used to specify whether the environment produces observations in vectorized form or raster (Bird Eye View) form. The user is only allowed to specify “Vectorized” or “BirdEye”.
env_id (int) – (optional) ID of the environment.
traffic_density (float) – number of traffic vehicles per 10 meter per lane.
start_seed (int) – random seed of the first map.
scenario_num (int) – specifies the range of the scenario seeds together with start_seed. When being reset, a seed will be picked randomly from [start_seed, start_seed + scenario_num]. Note that even with the same seed, the generated map can vary as there are other sources of randomness such as “random lane number”.
decision_repeat (int) – how many times the simulation engine repeats the applied action to the vehicles. The minimal simulation interval physics_world_step_size is 0.02 s. Therefore each RL step will last decision_repeat * 0.02 s in the simulation world.
map_spec (Union[int, str]) – User can set a string or int as the key to generate a map in an easy way. For example, config[“map”] = 3 means generating a map containing 3 blocks, while config[“map”] = “SCrRX” means the first block is Straight, and the following blocks are Circular, InRamp, OutRamp and Intersection. The characters are the unique IDs of the different block types, so a string determines the block type sequence. A detailed list of block types can be found at https://metadrive-simulator.readthedocs.io/en/latest/config_system.html
crash_penalty (float) – the immediate penalty when the car hits the road boundary, cars or other objects. It should be a positive number.
speed_reward_weight (float) – at each step, the incentive reward for being at a high speed is this weight * the speed in km/h.
success_reward (float) – the amount of reward that will be given (at most 1 time per episode) when the ego car reaches the destination.
time_limit (int) – the environment will terminate an episode if it goes beyond this number of steps.
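The timing and seed-range arithmetic implied by these parameters can be checked with a short sketch. The helper names below are hypothetical, and the seed range is treated as closed on both ends exactly as written above, which may not match how MetaDrive itself treats the upper bound.

```python
PHYSICS_WORLD_STEP_SIZE = 0.02  # seconds, the minimal simulation interval


def rl_step_seconds(decision_repeat: int = 5) -> float:
    """Simulated time covered by one RL step."""
    return decision_repeat * PHYSICS_WORLD_STEP_SIZE


def max_episode_seconds(time_limit: int = 1200, decision_repeat: int = 5) -> float:
    """Simulated time covered by an episode that runs to the step limit."""
    return time_limit * rl_step_seconds(decision_repeat)


def seed_in_range(seed: int, start_seed: int = 3844,
                  scenario_num: int = 5000) -> bool:
    """Whether a seed lies in [start_seed, start_seed + scenario_num] (as documented)."""
    return start_seed <= seed <= start_seed + scenario_num
```

With the defaults, one RL step spans 5 * 0.02 = 0.1 s of simulated time, so an episode that runs to the 1200-step time limit covers about 120 s.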
alf.environments.suite_procgen#
alf.environments.suite_robotics#
alf.environments.suite_safety_gym#
alf.environments.suite_simple#
Suite for simple environments defined by ALF
- load(game, env_id=None, env_args={}, discount=1.0, frame_skip=None, frame_stack=None, gym_env_wrappers=(), alf_env_wrappers=(), max_episode_steps=0)[source]#
Loads the specified simple game and wraps it.
- Parameters
game – name for the environment to load. The game should have been defined in the sub-directory ./simple/.
env_args (dict) – extra args for creating the game.
discount (float) – discount to use for the environment.
frame_skip (int) – the time interval at which the agent experiences the game.
frame_stack (int) – the number of most recent frames to stack as the observation input.
gym_env_wrappers (list) – list of gym env wrappers.
alf_env_wrappers (list) – list of ALF env wrappers.
max_episode_steps (int) – max number of steps for an episode.
- Returns
An AlfEnvironment instance.
alf.environments.suite_socialbot#
- load(environment_name, env_id=None, port=None, wrap_with_process=False, discount=1.0, max_episode_steps=None, gym_env_wrappers=(), alf_env_wrappers=())[source]#
Loads the selected environment and wraps it with the specified wrappers.
Note that by default a TimeLimit wrapper is used to limit episode lengths to the default benchmarks defined by the registered environments.
- Parameters
environment_name (str) – Name for the environment to load.
env_id (int) – (optional) ID of the environment.
port (int) – Port used for the environment
wrap_with_process (bool) – Whether to wrap the environment in a new process
discount (float) – Discount to use for the environment.
max_episode_steps (int) – If None the max_episode_steps will be set to the default step limit defined in the environment’s spec. No limit is applied if set to 0 or if there is no timestep_limit set in the environment’s spec.
gym_env_wrappers (Iterable) – Iterable with references to gym_wrappers, classes to use directly on the gym environment.
alf_env_wrappers (Iterable) – Iterable with references to alf_wrappers classes to use on the ALF environment.
- Returns
An AlfEnvironmentBase instance.
alf.environments.suite_tic_tac_toe#
- class TicTacToeEnvironment(batch_size)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
A simple 3x3 board game.
For two players, X and O, who take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a horizontal, vertical, or diagonal line is the winner.
The reward is +1 if player 0 wins, -1 if player 1 wins and 0 for a draw. An invalid move gives the reward to the opponent.
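The win condition and reward rule can be sketched on a flat 9-cell board. The board encoding here (a list with None for empty cells and 0 or 1 for each player’s marks) and the function names are assumptions made for illustration; they are not the tensors used by TicTacToeEnvironment.

```python
from typing import List, Optional

# The eight winning lines of a 3x3 board, as indices into a flat list.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board: List[Optional[int]]) -> Optional[int]:
    """Return 0 or 1 if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def reward(board: List[Optional[int]]) -> float:
    """+1 if player 0 has won, -1 if player 1 has won, 0 otherwise."""
    w = winner(board)
    if w is None:
        return 0.0
    return 1.0 if w == 0 else -1.0
```

This sketch returns 0 both for a drawn board and for a game still in progress, and it does not model invalid moves.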
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode)[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
alf.environments.suite_unittest#
Environments for unittest.
- class MixedPolicyUnittestEnv(batch_size, episode_length, obs_dim=1)[source]#
Bases:
alf.environments.suite_unittest.UnittestEnv
Environment for testing a mixed policy.
Given the agent’s (discrete, continuous) action pair (a_d, a_c), if a_d == (a_c > 0.5), the agent receives a reward of 1; otherwise it receives 0.
Initializes the environment.
- Parameters
batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
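The reward rule for the mixed policy test can be written for a single (a_d, a_c) pair as follows. This is a scalar sketch with a hypothetical function name; the environment itself operates on batched tensors.

```python
def mixed_policy_reward(a_d: int, a_c: float) -> float:
    """Reward 1 when the discrete action agrees with (a_c > 0.5), else 0."""
    return 1.0 if bool(a_d) == (a_c > 0.5) else 0.0
```

So the pair (1, 0.9) earns a reward of 1, while (1, 0.2) earns 0.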
- class PolicyUnittestEnv(batch_size, episode_length, obs_dim=1, action_type=<ActionType.Discrete: 1>, reward_dim=1)[source]#
Bases:
alf.environments.suite_unittest.UnittestEnv
Environment for testing policy.
The agent receives 1-diff(action, observation) as reward
Initializes the environment.
- Parameters
batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
- class RNNPolicyUnittestEnv(batch_size, episode_length, gap=3, action_type=<ActionType.Discrete: 1>, obs_dim=1)[source]#
Bases:
alf.environments.suite_unittest.UnittestEnv
Environment for testing RNN policy.
The agent receives reward 1 after the initial gap steps if its action matches the observation given at the first step.
Initializes the environment.
- Parameters
batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
- class UnittestEnv(batch_size, episode_length, obs_dim=1, action_type=<ActionType.Discrete: 1>, reward_dim=1)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Abstract base for unittest environment.
Every episode ends in episode_length steps (including LAST step). The observation is one dimensional. The action is binary {0, 1} when action_type is ActionType.Discrete and a float value in range (0.0, 1.0) when action_type is ActionType.Continuous.
Initializes the environment.
- Parameters
batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- class ValueUnittestEnv(batch_size, episode_length, obs_dim=1, action_type=<ActionType.Discrete: 1>, reward_dim=1)[source]#
Bases:
alf.environments.suite_unittest.UnittestEnv
Environment for testing value estimation.
Every episode ends in episode_length steps. It always gives reward 1 at each step.
Initializes the environment.
- Parameters
batch_size (int) – The batch size expected for the actions and observations.
episode_length (int) – length of each episode
action_type (nest) – ActionType
alf.environments.thread_environment#
Runs a single environment in a separate thread.
- class ThreadEnvironment(env_constructor)[source]#
Bases:
alf.environments.alf_environment.AlfEnvironment
Creates and steps a single env in a separate thread.
Create a ThreadEnvironment
- Parameters
env_constructor (Callable) – env_constructor for the OpenAI Gym environment
- action_spec()[source]#
Defines the actions that should be provided to step().
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- property batch_size#
The batch size of the environment.
- Returns
The batch size of the environment, or 1 if the environment is not batched.
- Raises
RuntimeError – If a subclass overrode batched to return True but did not override the batch_size property.
- property batched#
Whether the environment is batched or not.
If the environment supports batched observations and actions, then overwrite this property to True.
A batched environment takes in a batched set of actions and returns a batched set of observations. This means for all numpy arrays in the input and output nested structures, the first dimension is the batch size.
When batched, the left-most dimension is not part of the action_spec or the observation_spec and corresponds to the batch dimension.
- Returns
A boolean indicating whether the environment is batched or not.
- close()[source]#
Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly:
env = Env(...)
# Use env.
env.close()
or via a context manager:
with Env(...) as env:
    # Use env.
- observation_spec()[source]#
Defines the observations provided by the environment.
May use a subclass of TensorSpec that specifies additional properties such as min and max bounds on the values.
- Returns
nested TensorSpec
- render(mode='rgb_array')[source]#
Renders the environment.
- Parameters
mode – One of [‘rgb_array’, ‘human’]. Renders to a numpy array, or brings up a window where the environment can be visualized.
- Returns
An ndarray of shape [width, height, 3] denoting an RGB image if mode is rgb_array. Otherwise return nothing and render directly to a display window.
- Raises
NotImplementedError – If the environment does not support rendering.
alf.environments.utils#
- class UnwrappedEnvChecker[source]#
Bases:
object
A class for checking if there is already an unwrapped env in the current process. For some games, if the check is True, then we should stop creating more envs (multiple envs cannot coexist in a process).
See suite_socialbot.py for an example usage of this class.
- create_environment(env_name='CartPole-v0', env_load_fn=<function load>, eval_env_load_fn=None, for_evaluation=False, num_parallel_environments=30, batch_size_per_env=1, nonparallel=False, flatten=True, start_serially=True, num_spare_envs=0, parallel_environment_ctor=<class 'alf.environments.fast_parallel_environment.FastParallelEnvironment'>, seed=None, batched_wrappers=())[source]#
Create a batched environment.
- Parameters
env_name (str|list[str]) – env name. If it is a list, MultitaskWrapper will be used to create multi-task environments. Each one of them consists of the environments listed in env_name.
env_load_fn (Callable) – callable that creates an environment. If env_load_fn has attribute batched and it is True, env_load_fn(env_name, batch_size=num_parallel_environments) will be used to create the batched environment. Otherwise, a ParallelAlfEnvironment will be created.
eval_env_load_fn (Callable) – callable that creates an environment for evaluation. If None, use env_load_fn. This argument is useful for cases when the evaluation environment is different from the training environment.
for_evaluation (bool) – whether to create an environment for evaluation (if True) or for training (if False). If True, eval_env_load_fn will be used for creating the environment if provided. Otherwise, env_load_fn will be used.
num_parallel_environments (int) – number of parallel environments
batch_size_per_env (int) – if >1, will create num_parallel_environments/batch_size_per_env ProcessEnvironment instances, each holding batch_size_per_env environments. If each underlying environment of ProcessEnvironment is itself batched, batch_size_per_env will be used as the batch size for them. Otherwise BatchEnvironmentWrapper will be used to instruct each process to run the underlying environments sequentially on operations such as step(). The potential benefit of using batch_size_per_env>1 is to reduce the number of processes being used, or to take advantage of the batched nature of the underlying environment.
nonparallel (bool) – force to create a single env in the current process. Used for correctly exposing game gin confs to tensorboard.
start_serially (bool) – start environments serially or in parallel.
flatten (bool) – whether to use flattened action and time_steps during communication to reduce overhead.
num_spare_envs (int) – number of spare parallel environments to speed up reset. Useful when a reset is much slower than a regular step.
parallel_environment_ctor (Callable) – used to construct the parallel environment. Available constructors are: fast_parallel_environment.FastParallelEnvironment and parallel_environment.ParallelAlfEnvironment.
seed (None|int) – random number seed for environment. A random seed is used if None.
batched_wrappers (Iterable) – a list of wrappers which can wrap batched AlfEnvironment.
- Returns
- Return type
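The relation between num_parallel_environments and batch_size_per_env can be made concrete with a small calculation. The helper below is hypothetical, and its divisibility check is an assumption of this sketch rather than documented behavior of create_environment.

```python
def num_process_environments(num_parallel_environments: int = 30,
                             batch_size_per_env: int = 1) -> int:
    """Number of ProcessEnvironment instances for a given per-process batch size."""
    if batch_size_per_env < 1:
        raise ValueError("batch_size_per_env must be >= 1")
    if num_parallel_environments % batch_size_per_env != 0:
        # Assumption of this sketch: the total divides evenly across processes.
        raise ValueError("num_parallel_environments must be divisible by "
                         "batch_size_per_env")
    return num_parallel_environments // batch_size_per_env
```

With the default of 30 parallel environments, batch_size_per_env=5 yields 6 processes that each step 5 environments, while the overall batch size stays 30.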
- load_with_random_max_episode_steps(env_name, env_load_fn=<function load>, min_steps=200, max_steps=250)[source]#
Create environment with random max_episode_steps in range [min_steps, max_steps].
- Parameters
env_name (str) – env name
env_load_fn (Callable) – callable that create an environment
min_steps (int) – represent min value of the random range
max_steps (int) – represent max value of the random range
- Returns
- Return type
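Drawing the random step limit can be sketched with the standard library. This is an illustration only: the helper name is hypothetical, and the range is sampled as closed on both ends as the description above states, which may differ from the sampling ALF actually performs.

```python
import random
from typing import Optional


def sample_max_episode_steps(min_steps: int = 200, max_steps: int = 250,
                             rng: Optional[random.Random] = None) -> int:
    """Draw an episode step limit uniformly from [min_steps, max_steps]."""
    rng = rng or random.Random()
    return rng.randint(min_steps, max_steps)  # randint includes both endpoints
```

The sampled value would then be passed as max_episode_steps to the chosen env_load_fn.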