VPG

Example

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import PG

AlgName = 'PG'

# Pick one environment name/type pair; the alternatives are left commented out.
EnvName = 'PongNoFrameskip-v4'
EnvType = 'atari'

# EnvName = 'CartPole-v0'
# EnvType = 'classic_control'

# EnvName = 'BipedalWalker-v2'
# EnvType = 'box2d'

# EnvName = 'Ant-v2'
# EnvType = 'mujoco'

# EnvName = 'FetchPush-v1'
# EnvType = 'robotics'

# EnvName = 'FishSwim-v0'
# EnvType = 'dm_control'

# EnvName = 'ReachTarget'
# EnvType = 'rlbench'

# Build the environment and fetch the default hyper-parameters for its type.
env = build_env(EnvName, EnvType)
alg_params, learn_params = call_default_params(env, EnvType, AlgName)

# Instantiate the algorithm directly rather than through eval().
alg = PG(**alg_params)

# Run the training phase, then the test phase with rendering enabled.
alg.learn(env=env, mode='train', render=False, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)

Vanilla Policy Gradient

class rlzoo.algorithms.pg.pg.PG(net_list, optimizers_list)

PG class

get_action(s)

Choose an action by sampling from the policy's action probabilities.

Parameters: s – state
Returns: act

get_action_greedy(s)

Choose an action with the greedy policy.

Parameters: s – state
Returns: act
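
The difference between the two methods can be illustrated with a minimal sketch. It assumes a discrete action space and a policy that outputs one probability per action; the function names and the probability vector are hypothetical and are not taken from RLzoo.

```python
import random


def sample_action(probs, rng=random):
    """Draw an action index at random, weighted by the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_action(probs):
    """Return the index of the most probable action."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical policy output for a single state with three actions.
probs = [0.1, 0.7, 0.2]

# Sampling visits every action roughly in proportion to its probability.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(1000):
    counts[sample_action(probs, rng)] += 1

# The greedy choice is deterministic for a fixed probability vector.
best = greedy_action(probs)
```

Sampling keeps exploring less likely actions, which is usually what training needs, while the greedy choice is the natural fit for evaluating a trained policy.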
learn(env, train_episodes=200, test_episodes=100, max_steps=200, save_interval=100, mode='train', render=False, gamma=0.95, plot_func=None)
Parameters:
  • env – learning environment
  • train_episodes – total number of episodes for training
  • test_episodes – total number of episodes for testing
  • max_steps – maximum number of steps for one episode
  • save_interval – interval, in time steps, between saves
  • mode – 'train' or 'test'
  • render – whether to render each step
  • gamma – reward decay
  • plot_func – additional function for interactive module
Returns: None
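
Because the example at the top passes learn_params to learn as keyword arguments, individual settings can be changed by editing that dictionary before the call. The sketch below shows one way to do this. The dictionary is a hand-written stand-in whose values mirror the defaults in the learn signature, not the per-environment defaults RLzoo returns, and with_overrides is a hypothetical helper.

```python
# Stand-in for the learn_params dictionary. The values mirror the defaults
# in the learn() signature and are assumptions for illustration only.
learn_params = {
    "train_episodes": 200,
    "test_episodes": 100,
    "max_steps": 200,
    "save_interval": 100,
    "gamma": 0.95,
}


def with_overrides(params, **overrides):
    """Return a copy of params with the given keys replaced (hypothetical helper)."""
    merged = dict(params)
    merged.update(overrides)
    return merged


# Shorten training for a quick smoke run without touching the original dict.
quick_params = with_overrides(learn_params, train_episodes=10, max_steps=50)

# The result would then be passed exactly as in the example:
# alg.learn(env=env, mode='train', render=False, **quick_params)
```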

load_ckpt(env_name)

Load trained weights.

Returns: None

save_ckpt(env_name)

Save trained weights.

Returns: None

store_transition(s, a, r)

Store data in the memory buffer.

Parameters:
  • s – state
  • a – action
  • r – reward
Returns:

update(gamma)

Update policy parameters via stochastic gradient ascent.

Parameters: gamma – reward decay
Returns: None
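
To show how a reward decay such as gamma typically enters a vanilla policy gradient update, the sketch below computes discounted returns for one episode, normalizes them, and forms the usual surrogate loss. This is the textbook REINFORCE recipe, offered as an assumption about the general technique and not as a copy of RLzoo's implementation. All function names and numbers are hypothetical.

```python
import math


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for every step, working backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def normalize(values, eps=1e-8):
    """Shift to zero mean and scale to unit variance to reduce gradient variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (math.sqrt(var) + eps) for v in values]


def surrogate_loss(log_probs, weights):
    """Negative return-weighted log-likelihood of the actions taken.

    Minimizing this loss with a gradient-based optimizer is equivalent to
    gradient ascent on the expected return.
    """
    return -sum(lp * w for lp, w in zip(log_probs, weights))


# Hypothetical three-step episode.
rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)
weights = normalize(returns)

# Hypothetical log-probabilities of the actions taken at each step.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.8)]
loss = surrogate_loss(log_probs, weights)
```

A smaller gamma makes the update focus on immediate rewards, while a value close to 1 credits early actions for rewards that arrive much later.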

Default Hyper-parameters

rlzoo.algorithms.pg.default.atari(env, default_seed=True)
rlzoo.algorithms.pg.default.box2d(env, default_seed=True)
rlzoo.algorithms.pg.default.classic_control(env, default_seed=True)
rlzoo.algorithms.pg.default.dm_control(env, default_seed=True)
rlzoo.algorithms.pg.default.mujoco(env, default_seed=True)
rlzoo.algorithms.pg.default.rlbench(env, default_seed=True)
rlzoo.algorithms.pg.default.robotics(env, default_seed=True)
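
Each function above is named after an environment type, and the example at the top obtains alg_params and learn_params from call_default_params(env, EnvType, AlgName). One plausible way to picture the connection is a lookup from the EnvType string to the function of the same name. The sketch below is an assumption about that mapping, not RLzoo's code: the two functions are stand-ins with placeholder return values, and select_defaults is a hypothetical helper.

```python
# Stand-ins for two of the per-environment-type functions. The returned
# values are placeholders, not RLzoo's actual defaults.
def classic_control(env, default_seed=True):
    alg_params = {"net_list": [], "optimizers_list": []}
    learn_params = {"max_steps": 200}
    return alg_params, learn_params


def atari(env, default_seed=True):
    alg_params = {"net_list": [], "optimizers_list": []}
    learn_params = {"max_steps": 1000}
    return alg_params, learn_params


# Map each environment type string to the function of the same name.
DEFAULTS = {"classic_control": classic_control, "atari": atari}


def select_defaults(env, env_type, default_seed=True):
    """Look up and call the defaults function for env_type (hypothetical helper)."""
    try:
        defaults_fn = DEFAULTS[env_type]
    except KeyError:
        raise ValueError(f"unsupported environment type: {env_type!r}") from None
    return defaults_fn(env, default_seed=default_seed)


# env=None is enough here because the stand-ins never inspect the environment.
alg_params, learn_params = select_defaults(env=None, env_type="atari")
```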