TD3
Example
from rlzoo.common.env_wrappers import build_env
from rlzoo.common.utils import call_default_params
from rlzoo.algorithms import TD3
AlgName = 'TD3'
EnvName = 'Pendulum-v0' # only continuous action
EnvType = 'classic_control'
# EnvName = 'BipedalWalker-v2'
# EnvType = 'box2d'
# EnvName = 'Ant-v2'
# EnvType = 'mujoco'
# EnvName = 'FetchPush-v1'
# EnvType = 'robotics'
# EnvName = 'FishSwim-v0'
# EnvType = 'dm_control'
# EnvName = 'ReachTarget'
# EnvType = 'rlbench'
env = build_env(EnvName, EnvType)
alg_params, learn_params = call_default_params(env, EnvType, AlgName)
alg = eval(AlgName+'(**alg_params)')
alg.learn(env=env, mode='train', render=False, **learn_params)
alg.learn(env=env, mode='test', render=True, **learn_params)
Twin Delayed DDPG
class rlzoo.algorithms.td3.td3.TD3(net_list, optimizers_list, replay_buffer_capacity=500000.0, policy_target_update_interval=5)
Twin Delayed DDPG (TD3).
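The "twin" and "delayed" parts of the algorithm can be illustrated with a minimal sketch in plain Python. It is not RLzoo's implementation: the helper names `td3_target` and `should_update_policy` and the `gamma` discount argument are hypothetical, and the assumption is that `policy_target_update_interval` counts critic updates between policy (and target network) updates.

```python
def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target-critic estimates to reduce overestimation (hypothetical helper)."""
    return reward + (1.0 - done) * gamma * min(q1_next, q2_next)


def should_update_policy(update_count, policy_target_update_interval):
    """Delayed update: refresh the policy only every N-th critic update
    (assumed meaning of policy_target_update_interval)."""
    return update_count % policy_target_update_interval == 0


# With an interval of 5, the policy is updated on critic updates 0 and 5 out of 10.
schedule = [should_update_policy(i, 5) for i in range(10)]
```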
evaluate(state, eval_noise_scale, target=False)
Generate an action from the given state, for use in calculating gradients.
Parameters: eval_noise_scale – scale of the noise added to the generated action, as used by the target policy smoothing trick.
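Target policy smoothing can be sketched as follows. This is a standalone illustration using only the standard library, not the library's code: the function name `smooth_target_action`, the `action_range` bound, and the choice to clip the noise at twice `eval_noise_scale` are all assumptions made for the example.

```python
import random


def smooth_target_action(target_action, eval_noise_scale, action_range=1.0, rng=None):
    """Add clipped Gaussian noise to each action dimension (hypothetical helper)."""
    rng = rng or random.Random()
    noise_clip = 2.0 * eval_noise_scale  # assumed clipping bound for the noise
    smoothed = []
    for a in target_action:
        noise = rng.gauss(0.0, eval_noise_scale)
        noise = max(-noise_clip, min(noise_clip, noise))  # clip the noise itself
        # keep the noisy action inside the assumed valid action range
        smoothed.append(max(-action_range, min(action_range, a + noise)))
    return smoothed


smoothed = smooth_target_action([0.9, -0.2], eval_noise_scale=0.5, rng=random.Random(0))
```

Evaluating the critics on a slightly perturbed target action makes the value estimate less sensitive to sharp peaks in the learned Q-function.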
get_action(state, explore_noise_scale)
Generate an action from the given state for interaction with the environment.
learn(env, train_episodes=1000, test_episodes=1000, max_steps=150, batch_size=64, explore_steps=500, update_itr=3, reward_scale=1.0, save_interval=10, explore_noise_scale=1.0, eval_noise_scale=0.5, mode='train', render=False, plot_func=None)
Parameters:
- env – learning environment
- train_episodes – total number of episodes for training
- test_episodes – total number of episodes for testing
- max_steps – maximum number of steps for one episode
- batch_size – update batch size
- explore_steps – number of steps of random action sampling at the beginning of training
- update_itr – number of repeated updates per single step
- reward_scale – value range of reward
- save_interval – timesteps for saving the weights and plotting the results
- explore_noise_scale – range of action noise for exploration
- eval_noise_scale – range of action noise for evaluation of action value
- mode – 'train' or 'test'
- render – if true, visualize the environment
- plot_func – additional function for interactive module
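How `explore_steps`, `batch_size`, and `update_itr` might interact can be sketched with a small counting simulation. This is an illustration under stated assumptions, not the library's training loop: the helper `plan_training` is hypothetical, and it assumes that actions are sampled randomly for the first `explore_steps` steps and that `update_itr` updates run on every step once the replay buffer holds more than `batch_size` transitions.

```python
def plan_training(total_steps, explore_steps, batch_size, update_itr):
    """Count random steps, policy steps, and gradient updates (hypothetical helper)."""
    random_steps = policy_steps = updates = 0
    buffer_size = 0
    for frame_idx in range(1, total_steps + 1):
        if frame_idx > explore_steps:
            policy_steps += 1   # action would come from the policy plus exploration noise
        else:
            random_steps += 1   # action would be sampled at random
        buffer_size += 1        # one transition stored per step
        if buffer_size > batch_size:
            updates += update_itr  # repeated updates for this single step
    return random_steps, policy_steps, updates


# Using the default values from the signature above over 600 environment steps.
print(plan_training(total_steps=600, explore_steps=500, batch_size=64, update_itr=3))
```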