Policy Networks

Policy Networks in RLzoo
class rlzoo.common.policy_networks.StochasticContinuousPolicyNetwork(state_shape, action_shape, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=None, log_std_min=-20, log_std_max=2, trainable=True)

Bases: tensorlayer.models.core.Model

__init__(state_shape, action_shape, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=None, log_std_min=-20, log_std_max=2, trainable=True)

Stochastic continuous policy network with multiple fully-connected layers or convolutional layers (chosen according to the state shape).

Parameters:
- state_shape – (tuple[int]) shape of the state, e.g. (state_dim,) for a single-dimensional state
- action_shape – (tuple[int]) shape of the action, e.g. (action_dim,) for a single-dimensional action
- hidden_dim_list – (list[int]) dimensions of the hidden layers
- w_init – (callable) weight initializer
- activation – (callable) activation function of the hidden layers
- output_activation – (callable or None) activation function of the output layer
- log_std_min – (float) lower bound on the log standard deviation of the action distribution
- log_std_max – (float) upper bound on the log standard deviation of the action distribution
- trainable – (bool) whether the network is in training mode (True) or evaluation mode (False)

__module__ = 'rlzoo.common.policy_networks'
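The role of log_std_min and log_std_max can be illustrated with a minimal NumPy sketch of a diagonal-Gaussian policy head. This is not RLzoo's actual implementation (which builds its layers with TensorLayer); gaussian_policy_step and the weight matrices below are hypothetical stand-ins for the network's final fully-connected layers.

```python
import numpy as np

def gaussian_policy_step(state, w_mean, w_log_std,
                         log_std_min=-20.0, log_std_max=2.0, rng=None):
    """Sample an action from a diagonal-Gaussian policy head (sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    mean = state @ w_mean
    # Clipping the log std keeps the std inside
    # [exp(log_std_min), exp(log_std_max)] -- exactly what the
    # log_std_min / log_std_max arguments bound.
    log_std = np.clip(state @ w_log_std, log_std_min, log_std_max)
    return mean + np.exp(log_std) * rng.standard_normal(mean.shape)

state = np.ones(4)
w_mean = np.full((4, 2), 0.1)
w_log_std = np.full((4, 2), 10.0)  # raw log std is 40.0; clipped to 2.0
action = gaussian_policy_step(state, w_mean, w_log_std)
print(action.shape)  # (2,)
```

Without the clip, the standard deviation here would be exp(40), producing numerically useless actions; the bounds keep exploration noise in a sane range.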
class rlzoo.common.policy_networks.DeterministicContinuousPolicyNetwork(state_shape, action_shape, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=tf.nn.tanh, trainable=True)

Bases: tensorlayer.models.core.Model

__init__(state_shape, action_shape, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=tf.nn.tanh, trainable=True)

Deterministic continuous policy network with multiple fully-connected layers or convolutional layers (chosen according to the state shape).

Parameters:
- state_shape – (tuple[int]) shape of the state, e.g. (state_dim,) for a single-dimensional state
- action_shape – (tuple[int]) shape of the action, e.g. (action_dim,) for a single-dimensional action
- hidden_dim_list – (list[int]) dimensions of the hidden layers
- w_init – (callable) weight initializer
- activation – (callable) activation function of the hidden layers
- output_activation – (callable or None) activation function of the output layer
- trainable – (bool) whether the network is in training mode (True) or evaluation mode (False)

__module__ = 'rlzoo.common.policy_networks'
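A deterministic policy maps each state to a single action rather than a distribution. As a rough sketch of the default activation / output_activation choices above (ReLU hidden layers, tanh output), again with hypothetical plain-NumPy weights rather than RLzoo's TensorLayer layers:

```python
import numpy as np

def deterministic_policy(state, weights):
    """Sketch of a deterministic policy forward pass: ReLU hidden
    layers followed by a tanh output layer."""
    h = state
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)   # ReLU hidden layer
    return np.tanh(h @ weights[-1])  # tanh bounds each action in (-1, 1)

state = np.ones(3)
weights = [np.full((3, 8), 0.5), np.full((8, 2), 0.5)]
action = deterministic_policy(state, weights)
print(action.shape)  # (2,)
```

The tanh output keeps every action component in (-1, 1), which can then be rescaled to the environment's actual action bounds.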
class rlzoo.common.policy_networks.DeterministicPolicyNetwork(state_space, action_space, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=tf.nn.tanh, trainable=True, name=None)

Bases: tensorlayer.models.core.Model

__call__(states, *args, **kwargs)

Forward input tensors through the network by calling it.

Parameters:
- states – (Tensor, list of Tensors, numpy.ndarray, or list of numpy.ndarray) inputs for the network forwarding
- is_train – (bool) mode for this forward pass: True sets the network to training mode, False to evaluation mode
- kwargs – other keyword-only arguments

__init__(state_space, action_space, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=tf.nn.tanh, trainable=True, name=None)

Deterministic continuous/discrete policy network with multiple fully-connected layers.

Parameters:
- state_space – (gym.spaces) space of the state from the gym environment
- action_space – (gym.spaces) space of the action from the gym environment
- hidden_dim_list – (list[int]) dimensions of the hidden layers
- w_init – (callable) weight initializer
- activation – (callable) activation function of the hidden layers
- output_activation – (callable or None) activation function of the output layer
- trainable – (bool) whether the network is in training mode (True) or evaluation mode (False)
- name – (str or None) name of the network

__module__ = 'rlzoo.common.policy_networks'

Properties:
- action_shape
- action_space
- state_shape
- state_space
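Unlike the classes above, which take raw shape tuples, this class is constructed from gym spaces and exposes the derived shapes as properties. A sketch of that shape bookkeeping, using simplified stand-ins for gym.spaces.Box and gym.spaces.Discrete (the real class reads gym's own space objects, and its exact logic may differ):

```python
from dataclasses import dataclass

@dataclass
class Box:        # stand-in for gym.spaces.Box (continuous space)
    shape: tuple

@dataclass
class Discrete:   # stand-in for gym.spaces.Discrete (discrete space)
    n: int

def shapes_from_spaces(state_space, action_space):
    """Derive (state_shape, action_shape) from gym-style spaces."""
    if isinstance(action_space, Discrete):
        action_shape = (action_space.n,)   # one output per discrete action
    else:
        action_shape = action_space.shape  # one output per action dimension
    return state_space.shape, action_shape

print(shapes_from_spaces(Box((3,)), Discrete(4)))  # ((3,), (4,))
```

This is why a single class can serve both continuous and discrete problems: the space type decides whether the output layer produces action values or per-action preferences.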
class rlzoo.common.policy_networks.StochasticPolicyNetwork(state_space, action_space, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=tf.nn.tanh, log_std_min=-20, log_std_max=2, trainable=True, name=None, state_conditioned=False)

Bases: tensorlayer.models.core.Model

__call__(states, *args, greedy=False, **kwargs)

Forward input tensors through the network by calling it.

Parameters:
- states – (Tensor, list of Tensors, numpy.ndarray, or list of numpy.ndarray) inputs for the network forwarding
- is_train – (bool) mode for this forward pass: True sets the network to training mode, False to evaluation mode
- kwargs – other keyword-only arguments

__init__(state_space, action_space, hidden_dim_list, w_init=tf.initializers.GlorotNormal(), activation=tf.nn.relu, output_activation=tf.nn.tanh, log_std_min=-20, log_std_max=2, trainable=True, name=None, state_conditioned=False)

Stochastic continuous/discrete policy network with multiple fully-connected layers.

Parameters:
- state_space – (gym.spaces) space of the state from the gym environment
- action_space – (gym.spaces) space of the action from the gym environment
- hidden_dim_list – (list[int]) dimensions of the hidden layers
- w_init – (callable) weight initializer
- activation – (callable) activation function of the hidden layers
- output_activation – (callable or None) activation function of the output layer
- log_std_min – (float) lower bound on the log standard deviation of the action distribution
- log_std_max – (float) upper bound on the log standard deviation of the action distribution
- trainable – (bool) whether the network is in training mode (True) or evaluation mode (False)

Tips: we recommend using tf.nn.tanh for output_activation, especially for a continuous action space, so that after action normalization the final action range exactly matches the range declared by the action space.

__module__ = 'rlzoo.common.policy_networks'

Properties:
- action_shape
- action_space
- state_shape
- state_space
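The tip about tf.nn.tanh refers to action normalization: a tanh output lands in (-1, 1), which an affine map then stretches onto the environment's declared [low, high] range. A small sketch of that map (normalize_action is a hypothetical helper; RLzoo's own normalization code may differ in detail):

```python
import numpy as np

def normalize_action(tanh_out, low, high):
    """Affinely map a tanh-squashed output in [-1, 1] onto the
    [low, high] range declared by the environment's action space."""
    return low + 0.5 * (tanh_out + 1.0) * (high - low)

low, high = np.array([-2.0, 0.0]), np.array([2.0, 1.0])
scaled = normalize_action(np.tanh(np.array([0.0, 100.0])), low, high)
print(scaled)  # [0. 1.]  (midpoint of dim 0, upper bound of dim 1)
```

Because tanh saturates exactly at the interval endpoints, the mapped action can reach but never exceed the declared bounds, which is why an unbounded output activation is a poor fit here.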