Environment Wrappers

Environment Wrappers in RLzoo

Environment wrappers. The usage of the most common wrappers can be found at the following links:

https://pypi.org/project/gym-vec-env

https://github.com/openai/baselines/blob/master/baselines/common/wrappers.py

rlzoo.common.env_wrappers.build_env(env_id, env_type, vectorized=False, seed=0, reward_shaping=None, nenv=1, **kwargs)[source]

Build an environment based on the given options.

Parameters:
  • env_id – (str) environment id
  • env_type – (str) environment type, e.g. atari, classic_control, box2d
  • vectorized – (bool) whether to sample in parallel
  • seed – (int) random seed for the environment
  • reward_shaping – (callable) callable function for reward shaping
  • nenv – (int) number of processes used for sampling
  • kwargs – (dict) extra keyword arguments
  • max_episode_steps – (int) the maximum number of steps per episode
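
A minimal usage sketch based on the signature above; the environment id 'PongNoFrameskip-v4' is a standard Gym Atari id used here only for illustration.

    from rlzoo.common.env_wrappers import build_env

    # Single Atari environment with the default preprocessing wrappers.
    env = build_env('PongNoFrameskip-v4', 'atari', seed=0)
    obs = env.reset()

    # Four environments sampled in parallel worker processes.
    vec_env = build_env('PongNoFrameskip-v4', 'atari', vectorized=True, nenv=4)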
class rlzoo.common.env_wrappers.TimeLimit(env, max_episode_steps=None)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
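
A minimal sketch of capping episode length with TimeLimit; 'CartPole-v0' and the 100-step cap are arbitrary choices for illustration.

    import gym
    from rlzoo.common.env_wrappers import TimeLimit

    # Force done=True after at most 100 steps, regardless of the task's own termination.
    env = TimeLimit(gym.make('CartPole-v0'), max_episode_steps=100)
    obs, done, steps = env.reset(), False, 0
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        steps += 1
    print(steps)  # never exceeds 100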
class rlzoo.common.env_wrappers.NoopResetEnv(env, noop_max=30)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Do no-op action for a number of steps in [1, noop_max].

step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.FireResetEnv(env)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.EpisodicLifeEnv(env)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.MaxAndSkipEnv(env, skip=4)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(action)[source]

Repeat action, sum reward, and max over last observations.
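
The Atari wrappers above (NoopResetEnv, FireResetEnv, EpisodicLifeEnv, MaxAndSkipEnv) are typically chained in the order used by OpenAI Baselines. The sketch below shows that manual chaining; build_env presumably applies a similar chain when env_type is 'atari', so this is only illustrative.

    import gym
    from rlzoo.common.env_wrappers import (NoopResetEnv, MaxAndSkipEnv,
                                           EpisodicLifeEnv, FireResetEnv)

    env = gym.make('BreakoutNoFrameskip-v4')
    env = NoopResetEnv(env, noop_max=30)   # random number of no-ops on every reset
    env = MaxAndSkipEnv(env, skip=4)       # repeat each action for 4 frames, max-pool the last observations
    env = EpisodicLifeEnv(env)             # treat loss of a life as episode end for the learner
    if 'FIRE' in env.unwrapped.get_action_meanings():
        env = FireResetEnv(env)            # press FIRE on reset for games that require it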

class rlzoo.common.env_wrappers.ClipRewardEnv(env)[source]

Bases: gym.core.RewardWrapper

reward(reward)[source]

Bin reward to {+1, 0, -1} by its sign.

class rlzoo.common.env_wrappers.WarpFrame(env, width=84, height=84, grayscale=True)[source]

Bases: gym.core.ObservationWrapper

observation(frame)[source]
class rlzoo.common.env_wrappers.FrameStack(env, k)[source]

Bases: gym.core.Wrapper

reset()[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.LazyFrames(frames)[source]

Bases: object
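
A sketch combining WarpFrame and FrameStack. It assumes, following the Baselines convention, that FrameStack returns a LazyFrames object that converts to a NumPy array on demand and that grayscale frames are stacked along the last axis.

    import gym
    import numpy as np
    from rlzoo.common.env_wrappers import WarpFrame, FrameStack

    env = gym.make('BreakoutNoFrameskip-v4')
    env = WarpFrame(env, width=84, height=84, grayscale=True)  # resize to 84x84 grayscale
    env = FrameStack(env, 4)                                   # keep the last 4 frames so motion is observable

    obs = env.reset()
    obs = np.asarray(obs)   # materialize the LazyFrames into a regular array
    print(obs.shape)        # expected (84, 84, 4) under the assumptions above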

class rlzoo.common.env_wrappers.RewardShaping(env, func)[source]

Bases: gym.core.RewardWrapper

Shape the reward with a user-supplied function. For reward scaling, func can be lambda r: r * scale.

reward(reward)[source]
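
A minimal sketch of reward shaping; the scaling function follows the lambda r: r * scale pattern mentioned above, and ClipRewardEnv is shown for comparison.

    import gym
    from rlzoo.common.env_wrappers import RewardShaping, ClipRewardEnv

    # Scale every reward by 0.1 with a user-supplied shaping function.
    env = RewardShaping(gym.make('CartPole-v0'), lambda r: r * 0.1)

    # Alternatively, clip rewards to {+1, 0, -1} by sign (common for Atari).
    clipped = ClipRewardEnv(gym.make('PongNoFrameskip-v4'))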
class rlzoo.common.env_wrappers.SubprocVecEnv(env_fns)[source]

Bases: object

close()[source]
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays. If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step(actions)[source]
class rlzoo.common.env_wrappers.VecFrameStack(env, k)[source]

Bases: object

reset()[source]
step(action)[source]
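
A sketch of parallel sampling with SubprocVecEnv and VecFrameStack. Only the constructor signatures come from this page; the batched (observations, rewards, dones, infos) return of step() and the frame-stacking behaviour of VecFrameStack are assumptions based on the usual vectorized-environment convention.

    import gym
    from rlzoo.common.env_wrappers import SubprocVecEnv, VecFrameStack, WarpFrame

    def make_env(seed):
        # Each entry of env_fns must be a zero-argument callable that builds one environment.
        def _thunk():
            env = gym.make('PongNoFrameskip-v4')
            env.seed(seed)
            return WarpFrame(env)
        return _thunk

    if __name__ == '__main__':  # environments run in subprocesses, so guard the entry point
        venv = SubprocVecEnv([make_env(i) for i in range(4)])
        env = VecFrameStack(venv, 4)                          # stack the last 4 frames per environment
        obs = env.reset()                                     # one observation per environment
        obs, rewards, dones, infos = env.step([0, 0, 0, 0])   # one action per environment
        venv.close()                                          # shut down the worker processes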
class rlzoo.common.env_wrappers.Monitor(env, info_keywords=None)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.NormalizedActions(env)[source]

Bases: gym.core.ActionWrapper
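
NormalizedActions is an ActionWrapper with no docstring on this page; the common pattern, assumed here, is that the agent emits actions in [-1, 1] and the wrapper rescales them to the environment's true action bounds.

    import gym
    from rlzoo.common.env_wrappers import NormalizedActions

    env = NormalizedActions(gym.make('Pendulum-v0'))   # Pendulum's native action range is [-2, 2]
    obs = env.reset()
    obs, reward, done, info = env.step([0.5])          # 0.5 would map to 1.0 under the assumed rescaling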

class rlzoo.common.env_wrappers.DmObsTrans(env)[source]

Bases: gym.core.Wrapper

Observation processing for DeepMind Control Suite environments.

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)