Environment Wrappers

Environment Wrappers in RLzoo

Environment wrappers. The usage of the most common wrappers can be found at the following links:

https://pypi.org/project/gym-vec-env

https://github.com/openai/baselines/blob/master/baselines/common/wrappers.py

rlzoo.common.env_wrappers.build_env(env_id, env_type, vectorized=False, seed=0, reward_shaping=None, nenv=1, **kwargs)[source]

Build an environment based on the given options.

Parameters:
  • env_id – (str) environment id
  • env_type – (str) environment type, e.g. atari, classic_control, box2d
  • vectorized – (bool) whether to sample in parallel
  • seed – (int) random seed for the environment
  • reward_shaping – (callable) callable function for reward shaping
  • nenv – (int) number of processes used for sampling
  • kwargs – (dict) extra keyword arguments
  • max_episode_steps – (int) the maximum number of steps per episode
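
A minimal usage sketch based on the signature above; the environment id 'PongNoFrameskip-v4' is a standard Gym Atari id used here only for illustration.

    from rlzoo.common.env_wrappers import build_env

    # Single Atari environment with the default preprocessing wrappers.
    env = build_env('PongNoFrameskip-v4', 'atari', seed=0)
    obs = env.reset()

    # Four environments sampled in parallel worker processes.
    vec_env = build_env('PongNoFrameskip-v4', 'atari', vectorized=True, nenv=4)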
class rlzoo.common.env_wrappers.TimeLimit(env, max_episode_steps=None)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
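
A minimal sketch of capping episode length with TimeLimit; 'CartPole-v0' and the 100-step cap are arbitrary choices for illustration.

    import gym
    from rlzoo.common.env_wrappers import TimeLimit

    # Force done=True after at most 100 steps, regardless of the task's own termination.
    env = TimeLimit(gym.make('CartPole-v0'), max_episode_steps=100)
    obs, done, steps = env.reset(), False, 0
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())
        steps += 1
    print(steps)  # never exceeds 100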
class rlzoo.common.env_wrappers.NoopResetEnv(env, noop_max=30)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Do no-op action for a number of steps in [1, noop_max].

step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.FireResetEnv(env)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.EpisodicLifeEnv(env)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind-the-scenes.

step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.MaxAndSkipEnv(env, skip=4)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(action)[source]

Repeat action, sum reward, and max over last observations.
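
The Atari wrappers above (NoopResetEnv, FireResetEnv, EpisodicLifeEnv, MaxAndSkipEnv) are typically chained in the order used by OpenAI Baselines. The sketch below shows that manual chaining; build_env presumably applies a similar chain when env_type is 'atari', so this is only illustrative.

    import gym
    from rlzoo.common.env_wrappers import (NoopResetEnv, MaxAndSkipEnv,
                                           EpisodicLifeEnv, FireResetEnv)

    env = gym.make('BreakoutNoFrameskip-v4')
    env = NoopResetEnv(env, noop_max=30)   # random number of no-ops on every reset
    env = MaxAndSkipEnv(env, skip=4)       # repeat each action for 4 frames, max-pool the last observations
    env = EpisodicLifeEnv(env)             # treat loss of a life as episode end for the learner
    if 'FIRE' in env.unwrapped.get_action_meanings():
        env = FireResetEnv(env)            # press FIRE on reset for games that require it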

class rlzoo.common.env_wrappers.ClipRewardEnv(env)[source]

Bases: gym.core.RewardWrapper

reward(reward)[source]

Bin reward to {+1, 0, -1} by its sign.

class rlzoo.common.env_wrappers.WarpFrame(env, width=84, height=84, grayscale=True)[source]

Bases: gym.core.ObservationWrapper

observation(frame)[source]
class rlzoo.common.env_wrappers.FrameStack(env, k)[source]

Bases: gym.core.Wrapper

reset()[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.LazyFrames(frames)[source]

Bases: object
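
A sketch combining WarpFrame and FrameStack. It assumes, following the Baselines convention, that FrameStack returns a LazyFrames object that converts to a NumPy array on demand and that grayscale frames are stacked along the last axis.

    import gym
    import numpy as np
    from rlzoo.common.env_wrappers import WarpFrame, FrameStack

    env = gym.make('BreakoutNoFrameskip-v4')
    env = WarpFrame(env, width=84, height=84, grayscale=True)  # resize to 84x84 grayscale
    env = FrameStack(env, 4)                                   # keep the last 4 frames so motion is observable

    obs = env.reset()
    obs = np.asarray(obs)   # materialize the LazyFrames into a regular array
    print(obs.shape)        # expected (84, 84, 4) under the assumptions above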

class rlzoo.common.env_wrappers.RewardShaping(env, func)[source]

Bases: gym.core.RewardWrapper

Shape the reward with a user-supplied function. For reward scaling, func can be lambda r: r * scale.

reward(reward)[source]
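
A minimal sketch of reward shaping; the scaling function follows the lambda r: r * scale pattern mentioned above, and ClipRewardEnv is shown for comparison.

    import gym
    from rlzoo.common.env_wrappers import RewardShaping, ClipRewardEnv

    # Scale every reward by 0.1 with a user-supplied shaping function.
    env = RewardShaping(gym.make('CartPole-v0'), lambda r: r * 0.1)

    # Alternatively, clip rewards to {+1, 0, -1} by sign (common for Atari).
    clipped = ClipRewardEnv(gym.make('PongNoFrameskip-v4'))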
class rlzoo.common.env_wrappers.SubprocVecEnv(env_fns)[source]

Bases: object

close()[source]
reset()[source]

Reset all the environments and return an array of observations, or a tuple of observation arrays. If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

step(actions)[source]
class rlzoo.common.env_wrappers.VecFrameStack(env, k)[source]

Bases: object

reset()[source]
step(action)[source]
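
A sketch of parallel sampling with SubprocVecEnv and VecFrameStack. Only the constructor signatures come from this page; the batched (observations, rewards, dones, infos) return of step() and the frame-stacking behaviour of VecFrameStack are assumptions based on the usual vectorized-environment convention.

    import gym
    from rlzoo.common.env_wrappers import SubprocVecEnv, VecFrameStack, WarpFrame

    def make_env(seed):
        # Each entry of env_fns must be a zero-argument callable that builds one environment.
        def _thunk():
            env = gym.make('PongNoFrameskip-v4')
            env.seed(seed)
            return WarpFrame(env)
        return _thunk

    if __name__ == '__main__':  # environments run in subprocesses, so guard the entry point
        venv = SubprocVecEnv([make_env(i) for i in range(4)])
        env = VecFrameStack(venv, 4)                          # stack the last 4 frames per environment
        obs = env.reset()                                     # one observation per environment
        obs, rewards, dones, infos = env.step([0, 0, 0, 0])   # one action per environment
        venv.close()                                          # shut down the worker processes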
class rlzoo.common.env_wrappers.Monitor(env, info_keywords=None)[source]

Bases: gym.core.Wrapper

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
class rlzoo.common.env_wrappers.NormalizedActions(env)[source]

Bases: gym.core.ActionWrapper
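
NormalizedActions is an ActionWrapper with no docstring on this page; the common pattern, assumed here, is that the agent emits actions in [-1, 1] and the wrapper rescales them to the environment's true action bounds.

    import gym
    from rlzoo.common.env_wrappers import NormalizedActions

    env = NormalizedActions(gym.make('Pendulum-v0'))   # Pendulum's native action range is [-2, 2]
    obs = env.reset()
    obs, reward, done, info = env.step([0.5])          # 0.5 would map to 1.0 under the assumed rescaling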

class rlzoo.common.env_wrappers.DmObsTrans(env)[source]

Bases: gym.core.Wrapper

Observation processing for DeepMind Control Suite environments.

reset(**kwargs)[source]

Resets the state of the environment and returns an initial observation.

Returns:
observation (object): the initial observation.
step(ac)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:
action (object): an action provided by the agent
Returns:
observation (object): agent’s observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)