Environment Wrappers

Environment Wrappers in RLzoo

Usage of the most common wrappers can be checked in the entries below.



rlzoo.common.env_wrappers.build_env(env_id, env_type, vectorized=False, seed=0, reward_shaping=None, nenv=1, **kwargs)[source]

Build an environment based on the given options.

  • env_id – (str) environment id
  • env_type – (str) atari, classic_control, box2d
  • vectorized – (bool) whether to sample in parallel
  • seed – (int) random seed for env
  • reward_shaping – (callable) callable function for reward shaping
  • nenv – (int) how many processes will be used in sampling
  • kwargs – (dict)
  • max_episode_steps – (int) the maximum episode steps
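As a rough illustration of how the parameters above fit together, the following sketch collects them in a dictionary and forwards them to build_env. The environment id "CartPole-v0", the scaling factor, and the helper name make_env are assumptions chosen for the example; the import is deferred so the snippet can be read without RLzoo installed.

```python
# Options mirror the build_env parameters listed above.
options = dict(
    env_id="CartPole-v0",              # assumption: a Gym id matching env_type
    env_type="classic_control",        # one of: atari, classic_control, box2d
    vectorized=False,                  # single environment, no parallel sampling
    seed=0,                            # random seed for the environment
    reward_shaping=lambda r: r * 0.1,  # hypothetical reward scaling function
    nenv=1,                            # number of sampling processes
)


def make_env(opts):
    """Hypothetical helper: forwards the options to rlzoo's build_env."""
    # Imported lazily so this file loads even when RLzoo is not installed.
    from rlzoo.common.env_wrappers import build_env
    return build_env(**opts)
```

Calling make_env(options) requires RLzoo and the corresponding Gym environment to be available.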
class rlzoo.common.env_wrappers.TimeLimit(env, max_episode_steps=None)[source]

Bases: gym.core.Wrapper


Resets the state of the environment and returns an initial observation.

observation (object): the initial observation.

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
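The idea behind a step limit can be sketched without Gym. The classes CountingEnv and TimeLimitSketch below are hypothetical stand-ins written for this example, not RLzoo's implementation; they only assume the (observation, reward, done, info) step contract described above.

```python
class CountingEnv:
    """Hypothetical stand-in environment that never terminates on its own."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, False, {}


class TimeLimitSketch:
    """Marks the episode as done after max_episode_steps steps."""

    def __init__(self, env, max_episode_steps=None):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed_steps += 1
        # Force termination once the step budget is used up.
        if (self.max_episode_steps is not None
                and self.elapsed_steps >= self.max_episode_steps):
            done = True
        return obs, reward, done, info


def run_episode(env):
    """Step with a dummy action until done; return the episode length."""
    env.reset()
    steps, done = 0, False
    while not done:
        _, _, done, _ = env.step(0)
        steps += 1
    return steps
```

With max_episode_steps=None the sketch leaves termination entirely to the wrapped environment.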
class rlzoo.common.env_wrappers.NoopResetEnv(env, noop_max=30)[source]

Bases: gym.core.Wrapper


On reset, perform the no-op action for a number of steps in [1, noop_max].


Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
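A minimal sketch of the no-op reset idea follows. It assumes that action 0 is the no-op and that the number of no-ops is drawn uniformly from [1, noop_max]; StepCounterEnv, NoopResetSketch, and the rng argument are hypothetical names introduced for the example.

```python
import random


class StepCounterEnv:
    """Hypothetical env whose observation is the number of steps since reset."""

    def __init__(self):
        self.steps_since_reset = 0

    def reset(self):
        self.steps_since_reset = 0
        return 0

    def step(self, action):
        self.steps_since_reset += 1
        return self.steps_since_reset, 0.0, False, {}


class NoopResetSketch:
    """Takes between 1 and noop_max no-op steps every time reset() is called."""

    def __init__(self, env, noop_max=30, noop_action=0, rng=None):
        self.env = env
        self.noop_max = noop_max
        self.noop_action = noop_action  # assumption: action 0 does nothing
        self.rng = rng or random.Random()

    def reset(self):
        obs = self.env.reset()
        noops = self.rng.randint(1, self.noop_max)  # inclusive on both ends
        for _ in range(noops):
            obs, _, done, _ = self.env.step(self.noop_action)
            if done:
                # The episode ended during the no-ops: start over.
                obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)
```

Randomizing the number of initial no-ops gives the agent a different starting state on each episode.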
class rlzoo.common.env_wrappers.FireResetEnv(env)[source]

Bases: gym.core.Wrapper


Resets the state of the environment and returns an initial observation.

observation (object): the initial observation.

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.EpisodicLifeEnv(env)[source]

Bases: gym.core.Wrapper


Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind the scenes.


Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
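The episodic-life behaviour can be illustrated with a toy game. In the sketch below, LivesEnv and EpisodicLifeSketch are hypothetical classes; the sketch assumes the wrapped environment exposes a lives attribute and that action 0 is a no-op, neither of which is stated in this reference.

```python
class LivesEnv:
    """Hypothetical game: action 1 loses a life; the game ends at 0 lives."""

    def __init__(self, lives=3):
        self.start_lives = lives
        self.lives = lives
        self.full_resets = 0

    def reset(self):
        self.lives = self.start_lives
        self.full_resets += 1
        return self.lives

    def step(self, action):
        if action == 1:
            self.lives -= 1
        return self.lives, 0.0, self.lives == 0, {}


class EpisodicLifeSketch:
    """Reports done on every lost life, but fully resets only on game over."""

    def __init__(self, env):
        self.env = env
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.lives
        if 0 < lives < self.lives:
            done = True  # a life was lost: end the episode for the learner only
        self.lives = lives
        return obs, reward, done, info

    def reset(self):
        if self.was_real_done:
            obs = self.env.reset()           # game over: real reset
        else:
            obs, _, _, _ = self.env.step(0)  # assumption: action 0 is a no-op
        self.lives = self.env.lives
        return obs
```

The learner sees one episode per life, while the underlying game continues from where the life was lost.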
class rlzoo.common.env_wrappers.MaxAndSkipEnv(env, skip=4)[source]

Bases: gym.core.Wrapper


Resets the state of the environment and returns an initial observation.

observation (object): the initial observation.

Repeat action, sum reward, and max over last observations.
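The repeat, sum, and max steps can be sketched as follows. The sketch assumes the maximum is taken element-wise over the last two observations, which is the usual Atari convention but is not stated here; RampEnv and MaxAndSkipSketch are hypothetical names, and observations are plain lists to avoid extra dependencies.

```python
class RampEnv:
    """Hypothetical env: observation [t, -t], reward 1 per step, ends at horizon."""

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0, 0]

    def step(self, action):
        self.t += 1
        return [self.t, -self.t], 1.0, self.t >= self.horizon, {}


class MaxAndSkipSketch:
    """Repeats each action `skip` times and merges the results."""

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        recent = []  # keeps at most the last two observations
        done, info = False, {}
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            recent = (recent + [obs])[-2:]
            if done:
                break
        # Element-wise maximum over the retained observations.
        max_obs = [max(values) for values in zip(*recent)]
        return max_obs, total_reward, done, info
```

Taking the maximum over consecutive frames is commonly used to remove flicker in games that render some objects only on alternating frames.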

class rlzoo.common.env_wrappers.ClipRewardEnv(env)[source]

Bases: gym.core.RewardWrapper


Bin reward to {+1, 0, -1} by its sign.
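Binning by sign amounts to a one-line function. The helper name clip_reward below is hypothetical and only illustrates the mapping; it is not taken from RLzoo.

```python
def clip_reward(reward):
    """Map a reward to +1.0, 0.0, or -1.0 according to its sign."""
    # In Python, True and False behave as 1 and 0 in arithmetic.
    return float((reward > 0) - (reward < 0))


# A few raw rewards and their binned values.
raw_rewards = [7.5, -0.2, 0.0, 100, -3]
binned = [clip_reward(r) for r in raw_rewards]
```

Clipping keeps reward magnitudes comparable across games at the cost of discarding how large each reward was.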

class rlzoo.common.env_wrappers.WarpFrame(env, width=84, height=84, grayscale=True)[source]

Bases: gym.core.ObservationWrapper

class rlzoo.common.env_wrappers.FrameStack(env, k)[source]

Bases: gym.core.Wrapper


Resets the state of the environment and returns an initial observation.

observation (object): the initial observation.

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
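Stacking the last k observations can be sketched with a bounded deque. TickEnv and FrameStackSketch are hypothetical names; the sketch returns a plain list of frames and does not reproduce the memory-saving LazyFrames container.

```python
from collections import deque


class TickEnv:
    """Hypothetical env whose observation is the current step index."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, False, {}


class FrameStackSketch:
    """Returns the k most recent observations, oldest first."""

    def __init__(self, env, k):
        self.env = env
        self.k = k
        self.frames = deque(maxlen=k)  # old frames drop out automatically

    def reset(self):
        obs = self.env.reset()
        # Fill the whole stack with the initial observation.
        for _ in range(self.k):
            self.frames.append(obs)
        return list(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return list(self.frames), reward, done, info
```

Stacking frames lets a feed-forward policy infer motion, which a single frame cannot convey.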
class rlzoo.common.env_wrappers.LazyFrames(frames)[source]

Bases: object

class rlzoo.common.env_wrappers.RewardShaping(env, func)[source]

Bases: gym.core.RewardWrapper

Shapes the reward. For reward scaling, func can be lambda r: r * scale.
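A minimal sketch of applying such a function to every reward is shown below. ConstantRewardEnv and RewardShapingSketch are hypothetical stand-ins for illustration and do not depend on Gym or RLzoo.

```python
class ConstantRewardEnv:
    """Hypothetical env that returns a reward of 2.0 on every step."""

    def reset(self):
        return 0

    def step(self, action):
        return 0, 2.0, False, {}


class RewardShapingSketch:
    """Passes each reward through a user-supplied function."""

    def __init__(self, env, func):
        self.env = env
        self.func = func

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return obs, self.func(reward), done, info


scale = 0.5
env = RewardShapingSketch(ConstantRewardEnv(), lambda r: r * scale)
env.reset()
_, shaped_reward, _, _ = env.step(0)  # raw reward 2.0 scaled by 0.5
```

Any callable taking a reward and returning a reward can be used in place of the lambda, for example an offset or a clipping function.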

class rlzoo.common.env_wrappers.SubprocVecEnv(env_fns)[source]

Bases: object


Reset all the environments and return an array of observations, or a tuple of observation arrays. If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

class rlzoo.common.env_wrappers.VecFrameStack(env, k)[source]

Bases: object

class rlzoo.common.env_wrappers.Monitor(env, info_keywords=None)[source]

Bases: gym.core.Wrapper


Resets the state of the environment and returns an initial observation.

observation (object): the initial observation.

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.NormalizedActions(env)[source]

Bases: gym.core.ActionWrapper

class rlzoo.common.env_wrappers.DmObsTrans(env)[source]

Bases: gym.core.Wrapper

Observation process for DeepMind Control Suite environments


Resets the state of the environment and returns an initial observation.

observation (object): the initial observation.

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

action (object): an action provided by the agent
  • observation (object): agent’s observation of the current environment
  • reward (float): amount of reward returned after previous action
  • done (bool): whether the episode has ended, in which case further step() calls will return undefined results
  • info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)