Quick Start

Simple Usage

Open ./run_rlzoo.py:

 from rlzoo.common.env_wrappers import build_env
 from rlzoo.common.utils import call_default_params
 from rlzoo.algorithms import TD3
 # choose an algorithm
 AlgName = 'TD3'
 # select a corresponding environment type
 EnvType = 'classic_control'
 # choose an environment
 EnvName = 'Pendulum-v0'
 # build an environment with wrappers
 env = build_env(EnvName, EnvType)
 # call default parameters for the algorithm and learning process
 alg_params, learn_params = call_default_params(env, EnvType, AlgName)
 # instantiate the algorithm
 alg = eval(AlgName+'(**alg_params)')
 # start the training
 alg.learn(env=env, mode='train', render=False, **learn_params)
 # test after training
 alg.learn(env=env, mode='test', render=True, **learn_params)

Run the example:

python run_rlzoo.py

Choices for AlgName: 'DQN', 'AC', 'A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'

Choices for EnvType: 'atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'

For choices of EnvName, refer to the List of Supported Environments in RLzoo.
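
The three choices can be combined freely, as long as the algorithm supports the environment's action space. As a minimal sketch, training DQN on an Atari game could look like the following (the environment name 'PongNoFrameskip-v4' is an assumption here; check the supported list for valid names):

 from rlzoo.common.env_wrappers import build_env
 from rlzoo.common.utils import call_default_params
 from rlzoo.algorithms import DQN

 EnvName = 'PongNoFrameskip-v4'  # assumed Atari environment name; see the supported list
 EnvType = 'atari'
 env = build_env(EnvName, EnvType)
 alg_params, learn_params = call_default_params(env, EnvType, 'DQN')
 # instantiate the class directly instead of using eval(), since DQN is imported
 alg = DQN(**alg_params)
 alg.learn(env=env, mode='train', render=False, **learn_params)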

Another Usage

To provide more flexibility, we give another usage example of RLzoo with more explicit configurations as follows, where users can pass in customized networks, optimizers, etc.

 import gym
 import tensorflow as tf
 from rlzoo.common.utils import make_env, set_seed
 from rlzoo.algorithms import AC
 from rlzoo.common.value_networks import ValueNetwork
 from rlzoo.common.policy_networks import StochasticPolicyNetwork

 ''' load environment '''
 env = gym.make('CartPole-v0').unwrapped
 obs_space = env.observation_space
 act_space = env.action_space
 # reproducible
 seed = 2
 set_seed(seed, env)

 ''' build networks for the algorithm '''
 num_hidden_layer = 4  # number of hidden layers for the networks
 hidden_dim = 64  # dimension of hidden layers for the networks
 with tf.name_scope('AC'):
     with tf.name_scope('Critic'):
         # choose the critic network, can be replaced with a customized network
         critic = ValueNetwork(obs_space, hidden_dim_list=num_hidden_layer * [hidden_dim])
     with tf.name_scope('Actor'):
         # choose the actor network, can be replaced with a customized network
         actor = StochasticPolicyNetwork(obs_space, act_space, hidden_dim_list=num_hidden_layer * [hidden_dim], output_activation=tf.nn.tanh)
 net_list = [actor, critic]  # list of the networks

 ''' choose optimizers '''
 a_lr, c_lr = 1e-4, 1e-2  # a_lr: learning rate of the actor; c_lr: learning rate of the critic
 a_optimizer = tf.optimizers.Adam(a_lr)
 c_optimizer = tf.optimizers.Adam(c_lr)
 optimizers_list = [a_optimizer, c_optimizer]  # list of optimizers

 # initialize the algorithm model, with algorithm parameters passed in
 model = AC(net_list, optimizers_list)
 '''
 full list of arguments for the algorithm
 ----------------------------------------
 net_list: a list of networks (value and policy) used in the algorithm, from common functions or customization
 optimizers_list: a list of optimizers for all networks and differentiable variables
 gamma: discount factor of reward
 action_range: scale of action values
 '''

 # start the training process, with learning parameters passed in
 model.learn(env, train_episodes=500, max_steps=200,
             save_interval=50, mode='train', render=False)
 '''
 full list of parameters for training
 -------------------------------------
 env: learning environment
 train_episodes:  total number of episodes for training
 test_episodes:  total number of episodes for testing
 max_steps:  maximum number of steps for one episode
 save_interval: time steps for saving the weights and plotting the results
 mode: 'train' or 'test'
 render:  if true, visualize the environment
 '''

 # test after training
 model.learn(env, test_episodes=100, max_steps=200, mode='test', render=True)
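
Because the networks and optimizers are passed in as lists, each can be swapped independently without touching the algorithm itself. As a minimal sketch (assuming the same AC constructor as above), the two Adam optimizers could be replaced with other standard TensorFlow optimizers:

 # a sketch: swap in different TensorFlow optimizers before building the algorithm
 a_optimizer = tf.optimizers.RMSprop(a_lr)  # actor optimizer: RMSprop instead of Adam
 c_optimizer = tf.optimizers.SGD(c_lr)      # critic optimizer: plain SGD
 model = AC(net_list, [a_optimizer, c_optimizer])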

Interactive Configurations

We also provide an interactive learning configuration with Jupyter Notebook and ipywidgets, where you can select the algorithm, environment, and general learning settings by simply clicking on dropdown lists and sliders! A video demonstrating the usage is shown below. To use the interactive mode, open rlzoo/interactive/main.ipynb by running $ jupyter notebook.

[GIF demo of the interactive configuration interface: ../_images/interactive.gif]
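
For a sense of how such an interface is wired up, here is a minimal, hypothetical ipywidgets sketch (not the actual contents of rlzoo/interactive/main.ipynb) that exposes the same choices as the Simple Usage example:

 import ipywidgets as widgets
 from IPython.display import display

 # hypothetical sketch, not the actual rlzoo/interactive/main.ipynb
 alg_choice = widgets.Dropdown(
     options=['DQN', 'AC', 'A3C', 'DDPG', 'TD3', 'SAC', 'PG', 'TRPO', 'PPO', 'DPPO'],
     description='Algorithm')
 env_type = widgets.Dropdown(
     options=['atari', 'box2d', 'classic_control', 'mujoco', 'robotics', 'dm_control', 'rlbench'],
     description='Env type')
 episodes = widgets.IntSlider(value=500, min=100, max=5000, step=100, description='Episodes')
 display(alg_choice, env_type, episodes)
 # the selected values (alg_choice.value, env_type.value, episodes.value)
 # can then be fed into build_env / call_default_params as in Simple Usage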