Environment Wrappers in RLzoo
The most common wrappers and their usage can be found at the following links:
https://pypi.org/project/gym-vec-env
https://github.com/openai/baselines/blob/master/baselines/common/wrappers.py
rlzoo.common.env_wrappers.build_env(env_id, env_type, vectorized=False, seed=0, reward_shaping=None, nenv=1, **kwargs)
Build an environment based on the given options.
Parameters:
- env_id (str) – environment id
- env_type (str) – one of atari, classic_control, box2d
- vectorized (bool) – whether to sample in parallel
- seed (int) – random seed for the environment
- reward_shaping (callable) – function used to shape the reward
- nenv (int) – number of processes used for sampling
- kwargs (dict) – additional options, including:
  - max_episode_steps (int) – the maximum number of steps per episode
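For illustration, a minimal usage sketch (the environment ids are examples only; any valid Gym id of the stated type should work):

```python
from rlzoo.common.env_wrappers import build_env

# Build a single (non-vectorized) classic-control environment with a step limit.
env = build_env('CartPole-v1', 'classic_control', seed=0, max_episode_steps=200)

# Build 4 parallel Atari environments with a simple reward-scaling function.
vec_env = build_env('PongNoFrameskip-v4', 'atari', vectorized=True, nenv=4,
                    reward_shaping=lambda r: r * 0.1)
```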
class rlzoo.common.env_wrappers.TimeLimit(env, max_episode_steps=None)
Bases: gym.core.Wrapper
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns:
- observation (object): the initial observation.
step(ac)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
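TimeLimit caps the episode length. A minimal sketch of what a wrapper with this signature typically does (an illustration, not the RLzoo source):

```python
import gym

class TimeLimitSketch(gym.Wrapper):
    """Terminate the episode after max_episode_steps steps."""

    def __init__(self, env, max_episode_steps=None):
        super().__init__(env)
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self, **kwargs):
        self._elapsed_steps = 0
        return self.env.reset(**kwargs)

    def step(self, ac):
        obs, reward, done, info = self.env.step(ac)
        self._elapsed_steps += 1
        if self._max_episode_steps is not None and self._elapsed_steps >= self._max_episode_steps:
            done = True  # cut the episode off at the step limit
        return obs, reward, done, info
```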
class rlzoo.common.env_wrappers.NoopResetEnv(env, noop_max=30)
Bases: gym.core.Wrapper
step(ac)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
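NoopResetEnv mirrors the standard Atari wrapper from the baselines file linked above: on reset it takes a random number of no-op actions so episodes start from varied initial states. A sketch of that logic, not the RLzoo source:

```python
import gym
import numpy as np

class NoopResetSketch(gym.Wrapper):
    """On reset, take between 1 and noop_max no-op actions (action 0 in ALE)."""

    def __init__(self, env, noop_max=30):
        super().__init__(env)
        self.noop_max = noop_max

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        noops = np.random.randint(1, self.noop_max + 1)
        for _ in range(noops):
            obs, _, done, _ = self.env.step(0)  # action 0 is NOOP in ALE
            if done:
                obs = self.env.reset(**kwargs)
        return obs
```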
class rlzoo.common.env_wrappers.FireResetEnv(env)
Bases: gym.core.Wrapper
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns:
- observation (object): the initial observation.
step(ac)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
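FireResetEnv comes from the same baselines wrapper set: some Atari games (e.g. Breakout) only start once the FIRE button is pressed, so the wrapper presses it automatically on reset. A sketch of the usual logic:

```python
import gym

class FireResetSketch(gym.Wrapper):
    """Press FIRE on reset, for games that need it to start an episode."""

    def reset(self, **kwargs):
        self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(1)  # action 1 is FIRE in ALE
        if done:
            self.env.reset(**kwargs)
        obs, _, done, _ = self.env.step(2)  # second press, as in baselines
        if done:
            obs = self.env.reset(**kwargs)
        return obs
```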
class rlzoo.common.env_wrappers.EpisodicLifeEnv(env)
Bases: gym.core.Wrapper
reset(**kwargs)
Reset only when lives are exhausted. This way all states are still reachable even though lives are episodic, and the learner need not know about any of this behind the scenes.
step(action)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
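The reset docstring above describes the standard episodic-life trick. A sketch of how such a wrapper is typically implemented (ALE-specific lives counter assumed; not the RLzoo source):

```python
import gym

class EpisodicLifeSketch(gym.Wrapper):
    """Treat each life as an episode for the learner, but only reset
    the underlying game when all lives are exhausted."""

    def __init__(self, env):
        super().__init__(env)
        self.lives = 0
        self.was_real_done = True

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.was_real_done = done
        lives = self.env.unwrapped.ale.lives()  # ALE-specific lives counter
        if 0 < lives < self.lives:
            done = True  # losing a life ends the learner's episode
        self.lives = lives
        return obs, reward, done, info

    def reset(self, **kwargs):
        if self.was_real_done:
            obs = self.env.reset(**kwargs)  # real reset: all lives exhausted
        else:
            obs, _, _, _ = self.env.step(0)  # no-op to continue from current state
        self.lives = self.env.unwrapped.ale.lives()
        return obs
```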
class rlzoo.common.env_wrappers.MaxAndSkipEnv(env, skip=4)
Bases: gym.core.Wrapper
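In the baselines wrapper this mirrors, each action is repeated for skip frames, rewards are summed, and the returned observation is a pixel-wise max over the last two frames (to suppress Atari sprite flicker). A sketch:

```python
import gym
import numpy as np

class MaxAndSkipSketch(gym.Wrapper):
    """Repeat each action skip times; return the max of the last two frames."""

    def __init__(self, env, skip=4):
        super().__init__(env)
        self._skip = skip
        # buffer for the two most recent raw frames
        self._obs_buffer = np.zeros((2,) + env.observation_space.shape,
                                    dtype=env.observation_space.dtype)

    def step(self, ac):
        total_reward, done, info = 0.0, False, {}
        for i in range(self._skip):
            obs, reward, done, info = self.env.step(ac)
            if i == self._skip - 2:
                self._obs_buffer[0] = obs
            if i == self._skip - 1:
                self._obs_buffer[1] = obs
            total_reward += reward
            if done:
                break
        # max-pool over the last two frames to remove flicker
        return self._obs_buffer.max(axis=0), total_reward, done, info
```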
class rlzoo.common.env_wrappers.WarpFrame(env, width=84, height=84, grayscale=True)
Bases: gym.core.ObservationWrapper
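WarpFrame applies the classic DQN preprocessing: resize each frame to width × height and optionally convert it to grayscale. A sketch of the observation transform, assuming OpenCV as in the baselines version:

```python
import cv2
import gym
import numpy as np

class WarpFrameSketch(gym.ObservationWrapper):
    """Resize frames to (height, width), optionally grayscale."""

    def __init__(self, env, width=84, height=84, grayscale=True):
        super().__init__(env)
        self._width, self._height, self._grayscale = width, height, grayscale
        channels = 1 if grayscale else 3
        self.observation_space = gym.spaces.Box(
            low=0, high=255, shape=(height, width, channels), dtype=np.uint8)

    def observation(self, frame):
        if self._grayscale:
            frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        frame = cv2.resize(frame, (self._width, self._height),
                           interpolation=cv2.INTER_AREA)
        return frame[:, :, None] if self._grayscale else frame
```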
class rlzoo.common.env_wrappers.FrameStack(env, k)
Bases: gym.core.Wrapper
reset()
Resets the state of the environment and returns an initial observation.
Returns:
- observation (object): the initial observation.
step(action)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
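FrameStack returns the k most recent observations stacked together, so a feedforward agent can infer motion from pixels. A minimal sketch (real implementations often use a lazy buffer to avoid duplicating frame memory):

```python
from collections import deque
import gym
import numpy as np

class FrameStackSketch(gym.Wrapper):
    """Return the k most recent observations, stacked along the last axis."""

    def __init__(self, env, k):
        super().__init__(env)
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.k):
            self.frames.append(obs)  # fill the buffer with the first frame
        return np.concatenate(self.frames, axis=-1)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return np.concatenate(self.frames, axis=-1), reward, done, info
```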
class rlzoo.common.env_wrappers.RewardShaping(env, func)
Bases: gym.core.RewardWrapper
Shapes the reward with func. For reward scaling, func can be lambda r: r * scale.
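A usage sketch built on the lambda pattern the docstring suggests:

```python
import gym
from rlzoo.common.env_wrappers import RewardShaping

# Scale every reward by 0.1, as the docstring suggests.
env = RewardShaping(gym.make('CartPole-v1'), lambda r: r * 0.1)

# Any callable of one reward works, e.g. clipping to [-1, 1]:
env = RewardShaping(env, lambda r: max(-1.0, min(1.0, r)))
```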
class rlzoo.common.env_wrappers.SubprocVecEnv(env_fns)
Bases: object
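SubprocVecEnv takes a list of zero-argument environment constructors and runs each environment in its own process (presumably what backs vectorized=True in build_env). A usage sketch:

```python
import gym
from rlzoo.common.env_wrappers import SubprocVecEnv

# Four constructors, one env per subprocess. Each element must be a
# zero-argument callable so the env is created inside its own process.
env_fns = [lambda: gym.make('CartPole-v1') for _ in range(4)]
vec_env = SubprocVecEnv(env_fns)
# Vectorized envs of this kind typically return batched observations,
# rewards, and dones from their reset()/step() calls.
```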
class rlzoo.common.env_wrappers.Monitor(env, info_keywords=None)
Bases: gym.core.Wrapper
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns:
- observation (object): the initial observation.
step(action)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
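A hedged usage sketch; the exact statistics Monitor records are not documented here, and the info keyword shown is hypothetical:

```python
import gym
from rlzoo.common.env_wrappers import Monitor

# Wrap an env so per-episode statistics can be tracked. `info_keywords`
# (a tuple here is an assumption; 'my_metric' is a hypothetical key)
# presumably names extra `info` entries to record alongside the defaults.
env = Monitor(gym.make('CartPole-v1'), info_keywords=('my_metric',))
```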
class rlzoo.common.env_wrappers.DmObsTrans(env)
Bases: gym.core.Wrapper
Observation processing for DeepMind Control Suite environments.
reset(**kwargs)
Resets the state of the environment and returns an initial observation.
Returns:
- observation (object): the initial observation.
step(ac)
Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset the environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
Args:
- action (object): an action provided by the agent
Returns:
- observation (object): agent’s observation of the current environment
- reward (float): amount of reward returned after the previous action
- done (bool): whether the episode has ended, in which case further step() calls will return undefined results
- info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
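dm_control environments return observations as an ordered dict of named arrays rather than a single Gym-style array; a wrapper like this typically flattens them into one vector. An illustrative helper showing that translation (not the RLzoo implementation):

```python
import numpy as np

def flatten_dm_observation(obs_dict):
    """Flatten a dm_control-style OrderedDict of arrays into one 1-D vector.
    Illustrative helper only; the RLzoo wrapper may process observations
    differently."""
    return np.concatenate([np.asarray(v).ravel() for v in obs_dict.values()])
```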