A3C

Example
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import A3C

AlgName = 'A3C'
EnvName = 'PongNoFrameskip-v4'
EnvType = 'atari'

# EnvName = 'Pendulum-v0'  # continuous action space only
# EnvType = 'classic_control'

# EnvName = 'BipedalWalker-v2'
# EnvType = 'box2d'

# EnvName = 'Ant-v2'
# EnvType = 'mujoco'

# EnvName = 'FetchPush-v1'
# EnvType = 'robotics'

# EnvName = 'FishSwim-v0'
# EnvType = 'dm_control'

number_workers = 2  # number of parallel workers, one environment each
env = build_env(EnvName, EnvType, nenv=number_workers)
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg = A3C(**alg_params)  # construct the algorithm with the default parameters
alg.learn(env=env, mode='train', render=False, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)
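Because the example unpacks alg_params and learn_params with **, they behave as plain dictionaries, so individual defaults can be overridden before the algorithm is constructed. The following is a minimal sketch, assuming the dictionary keys match the keyword names in the A3C constructor and learn() signature documented below; the values are illustrative, not tuned settings.

from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import A3C

env = build_env('PongNoFrameskip-v4', 'atari', nenv=2)
alg_params, learn_params = call_default_params(env, 'atari', 'A3C')

# Override documented hyperparameters before constructing the algorithm.
# These keys are assumed to match the keyword names documented below;
# the values are illustrative, not recommended settings.
alg_params['entropy_beta'] = 0.01
learn_params['train_episodes'] = 2000

alg = A3C(**alg_params)
alg.learn(env=env, mode='train', render=False, **learn_params)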
Asynchronous Advantage Actor-Critic
class rlzoo.algorithms.a3c.a3c.A3C(net_list, optimizers_list, entropy_beta=0.005)

learn(env, train_episodes=1000, test_episodes=10, max_steps=150, render=False, n_workers=1, update_itr=10, gamma=0.99, save_interval=500, mode='train', plot_func=None)

Parameters:
- env – a list of identical learning environments, one per worker
- train_episodes – total number of episodes for training
- test_episodes – total number of episodes for testing
- max_steps – maximum number of steps per episode
- render – whether to render the environment
- n_workers – number of parallel workers, set manually
- update_itr – number of episodes between updates of the global policy
- gamma – reward discount factor
- save_interval – interval in timesteps between saving the weights and plotting the results
- mode – 'train' or 'test'
- plot_func – additional plotting function for the interactive module
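As a usage illustration, the two learn() calls from the example can also be written with the documented parameters spelled out rather than unpacked from learn_params. The values below simply mirror the documented defaults, and env, alg, and number_workers are the names defined in the example above.

# Train: run number_workers parallel workers and update the
# global policy every 10 episodes (the documented default).
alg.learn(env=env, mode='train', render=False,
          train_episodes=1000, max_steps=150,
          n_workers=number_workers, update_itr=10,
          gamma=0.99, save_interval=500)

# Test: render the trained policy for 10 episodes.
alg.learn(env=env, mode='test', render=True, test_episodes=10)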