Replay Buffer¶
Replay Buffer in RLzoo¶
Utility classes for experience replay: replay buffers and segment trees.
# Requirements tensorflow==2.0.0a0 tensorlayer==2.0.1
-
class
rlzoo.common.buffer.
HindsightReplayBuffer
(capacity, hindsight_freq, goal_type, reward_func, done_func)[source]¶ Bases:
rlzoo.common.buffer.ReplayBuffer
Hindsight Experience Replay. In this buffer, a state is a tuple consisting of (observation, goal).
-
GOAL_EPISODE
= 'episode'¶
-
GOAL_FUTURE
= 'future'¶
-
GOAL_RANDOM
= 'random'¶
-
__init__
(capacity, hindsight_freq, goal_type, reward_func, done_func)[source]¶ Parameters: - hindsight_freq – (int) How many hindsight transitions are generated for each real transition.
- goal_type – (str) The generation method of hindsight goals. Should be one of the GOAL_* constants (GOAL_EPISODE, GOAL_FUTURE, GOAL_RANDOM).
- reward_func – (callable) goal (np.array) X next_state (np.array) -> reward (float)
- done_func – (callable) goal (np.array) X next_state (np.array) -> done_flag (bool)
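The two callables can be sketched as follows. This is a minimal illustration of the documented shapes (goal X next_state -> reward, goal X next_state -> done_flag), not code taken from RLzoo. The tolerance constant, the `_distance` helper, and the sparse 0 / -1 reward are assumptions chosen for the example. The functions only iterate over their arguments, so they accept np.array inputs as well as plain lists.

```python
import math

# Hypothetical tolerance: the goal counts as reached within this distance.
GOAL_TOLERANCE = 0.05


def _distance(goal, next_state):
    """Euclidean distance between two equally sized 1-D sequences."""
    return math.sqrt(sum((float(s) - float(g)) ** 2 for g, s in zip(goal, next_state)))


def reward_func(goal, next_state):
    """goal X next_state -> reward (float). Sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if _distance(goal, next_state) <= GOAL_TOLERANCE else -1.0


def done_func(goal, next_state):
    """goal X next_state -> done_flag (bool). Done once the goal is reached."""
    return _distance(goal, next_state) <= GOAL_TOLERANCE


# Relabeling idea: a transition that missed its original goal becomes a
# success when the goal is replaced by the state that was actually reached.
original_goal = [1.0, 0.0]
reached_state = [0.3, 0.4]
missed = reward_func(original_goal, reached_state)      # original goal not reached
relabeled = reward_func(reached_state, reached_state)   # hindsight goal reached
```

Because both callables take the goal as an argument, the same transition can be scored again under any substituted goal, which is what hindsight relabeling requires.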
-
__module__
= 'rlzoo.common.buffer'¶
-
-
class
rlzoo.common.buffer.
MinSegmentTree
(capacity)[source]¶ Bases:
rlzoo.common.buffer.SegmentTree
-
__init__
(capacity)[source]¶ Build a Segment Tree data structure.
https://en.wikipedia.org/wiki/Segment_tree
Can be used as a regular array, but with two important differences:
- Setting an item's value is slightly slower: O(log capacity) instead of O(1).
- The user has access to an efficient (O(log segment size)) reduce operation, which applies the operation over a contiguous subsequence of items in the array.
Parameters: - capacity – (int) Total size of the array; must be a power of two.
- operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative, with neutral_element as its identity over the set of possible array values.
- neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.
-
__module__
= 'rlzoo.common.buffer'¶
-
-
class
rlzoo.common.buffer.
PrioritizedReplayBuffer
(capacity, alpha, beta)[source]¶ Bases:
rlzoo.common.buffer.ReplayBuffer
-
__init__
(capacity, alpha, beta)[source]¶ Create Prioritized Replay buffer.
Parameters: - capacity – (int) Maximum number of transitions to store in the buffer. When the buffer overflows, the oldest memories are dropped.
- alpha – (float) How much prioritization is used (0: no prioritization, 1: full prioritization).
- See Also:
- ReplayBuffer.__init__
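The effect of alpha can be illustrated with the usual proportional prioritization rule, P(i) = p_i^alpha / sum_k p_k^alpha. This is a sketch under the assumption that the buffer follows that standard rule; the function name `sampling_probabilities` is hypothetical and is not part of RLzoo.

```python
def sampling_probabilities(priorities, alpha):
    """Turn raw priorities into sampling probabilities.

    Assumes the proportional rule P(i) = p_i**alpha / sum_k p_k**alpha.
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


priorities = [1.0, 2.0, 4.0]

# alpha = 0: every transition is equally likely (no prioritization).
uniform = sampling_probabilities(priorities, alpha=0.0)

# alpha = 1: probabilities are directly proportional to the priorities.
proportional = sampling_probabilities(priorities, alpha=1.0)
```

Values of alpha between 0 and 1 interpolate between these two extremes.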
-
__module__
= 'rlzoo.common.buffer'¶
-
-
class
rlzoo.common.buffer.
ReplayBuffer
(capacity)[source]¶ Bases:
object
A standard ring buffer for storing transitions and sampling for training
-
__dict__
= mappingproxy({'__module__': 'rlzoo.common.buffer', '__doc__': 'A standard ring buffer for storing transitions and sampling for training', '__init__': <function ReplayBuffer.__init__>, 'push': <function ReplayBuffer.push>, 'sample': <function ReplayBuffer.sample>, '_encode_sample': <function ReplayBuffer._encode_sample>, '__len__': <function ReplayBuffer.__len__>, '__dict__': <attribute '__dict__' of 'ReplayBuffer' objects>, '__weakref__': <attribute '__weakref__' of 'ReplayBuffer' objects>})¶
-
__module__
= 'rlzoo.common.buffer'¶
-
__weakref__
¶ list of weak references to the object (if defined)
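A ring buffer of this kind can be sketched in a few lines. The class below is a hypothetical stand-in, not the RLzoo implementation: the method names push, sample and __len__ mirror the listing above, but their argument lists are assumptions, since the signatures are not documented here.

```python
import random


class RingBuffer:
    """Minimal ring buffer sketch (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.position = 0  # index of the next slot to write

    def push(self, transition):
        """Store a transition, overwriting the oldest one once full."""
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        """Draw batch_size distinct stored transitions uniformly at random."""
        return random.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)


buffer = RingBuffer(capacity=3)
for step in range(5):
    buffer.push(step)  # integers stand in for real transitions

batch = buffer.sample(2)
```

After five pushes into a buffer of capacity 3, only the three most recent items remain, because the write position wraps around and overwrites the oldest slots.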
-
-
class
rlzoo.common.buffer.
SegmentTree
(capacity, operation, neutral_element)[source]¶ Bases:
object
-
__dict__
= mappingproxy({'__module__': 'rlzoo.common.buffer', '__init__': <function SegmentTree.__init__>, '_reduce_helper': <function SegmentTree._reduce_helper>, 'reduce': <function SegmentTree.reduce>, '__setitem__': <function SegmentTree.__setitem__>, '__getitem__': <function SegmentTree.__getitem__>, '__dict__': <attribute '__dict__' of 'SegmentTree' objects>, '__weakref__': <attribute '__weakref__' of 'SegmentTree' objects>, '__doc__': None})¶
-
__init__
(capacity, operation, neutral_element)[source]¶ Build a Segment Tree data structure.
https://en.wikipedia.org/wiki/Segment_tree
Can be used as a regular array, but with two important differences:
- Setting an item's value is slightly slower: O(log capacity) instead of O(1).
- The user has access to an efficient (O(log segment size)) reduce operation, which applies the operation over a contiguous subsequence of items in the array.
Parameters: - capacity – (int) Total size of the array; must be a power of two.
- operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative, with neutral_element as its identity over the set of possible array values.
- neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.
-
__module__
= 'rlzoo.common.buffer'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
reduce
(start=0, end=None)[source]¶ Returns the result of applying self.operation to a contiguous subsequence of the array.
Parameters: - start – (int) beginning of the subsequence
- end – (int) end of the subsequence
- Returns:
- reduced: (obj) result of reducing self.operation over the specified range of the array.
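The array-like behaviour and the logarithmic reduce can be sketched with a flat-list segment tree. This is an illustrative re-implementation, not the RLzoo source: the class name `TinySegmentTree` is hypothetical, and treating `end` as exclusive is an assumption of this sketch, since the convention is not stated above.

```python
class TinySegmentTree:
    """Flat-array segment tree sketch (illustrative only)."""

    def __init__(self, capacity, operation, neutral_element):
        assert capacity > 0 and capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self.capacity = capacity
        self.operation = operation
        self.neutral_element = neutral_element
        # Node 1 is the root; leaves live at indices capacity .. 2 * capacity - 1.
        self.values = [neutral_element] * (2 * capacity)

    def __setitem__(self, idx, val):
        node = idx + self.capacity
        self.values[node] = val
        node //= 2
        while node >= 1:  # refresh every ancestor: O(log capacity)
            self.values[node] = self.operation(self.values[2 * node], self.values[2 * node + 1])
            node //= 2

    def __getitem__(self, idx):
        return self.values[idx + self.capacity]

    def reduce(self, start=0, end=None):
        """Combine items start .. end - 1 (end is exclusive in this sketch)."""
        if end is None:
            end = self.capacity
        left_result = self.neutral_element
        right_result = self.neutral_element
        lo, hi = start + self.capacity, end + self.capacity
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step right
                left_result = self.operation(left_result, self.values[lo])
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it
                hi -= 1
                right_result = self.operation(self.values[hi], right_result)
            lo //= 2
            hi //= 2
        return self.operation(left_result, right_result)


sum_tree = TinySegmentTree(4, lambda a, b: a + b, 0.0)
min_tree = TinySegmentTree(4, min, float("inf"))
for i, v in enumerate([3.0, 1.0, 4.0, 1.0]):
    sum_tree[i] = v
    min_tree[i] = v

total = sum_tree.reduce()        # 3 + 1 + 4 + 1
middle = sum_tree.reduce(1, 3)   # 1 + 4
smallest = min_tree.reduce()     # min of all four items
```

Keeping separate left and right accumulators preserves the order of operands, so the sketch also works for associative operations that are not commutative.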
-
-
class
rlzoo.common.buffer.
SumSegmentTree
(capacity)[source]¶ Bases:
rlzoo.common.buffer.SegmentTree
-
__init__
(capacity)[source]¶ Build a Segment Tree data structure.
https://en.wikipedia.org/wiki/Segment_tree
Can be used as a regular array, but with two important differences:
- Setting an item's value is slightly slower: O(log capacity) instead of O(1).
- The user has access to an efficient (O(log segment size)) reduce operation, which applies the operation over a contiguous subsequence of items in the array.
Parameters: - capacity – (int) Total size of the array; must be a power of two.
- operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative, with neutral_element as its identity over the set of possible array values.
- neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.
-
__module__
= 'rlzoo.common.buffer'¶
-
find_prefixsum_idx
(prefixsum)[source]¶ Find the highest index i in the array such that
arr[0] + arr[1] + … + arr[i - 1] <= prefixsum
If the array values are probabilities, this function allows sampling indexes efficiently according to the discrete probability distribution.
Parameters: prefixsum – (float) upper bound on the sum of the array prefix
- Returns:
- idx: (int) highest index satisfying the prefixsum constraint
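The prefix-sum search can be sketched as a single descent from the root of a sum tree. The code below is an illustration under stated assumptions, not the RLzoo implementation: it uses a plain list as the tree, `build_sum_tree` is a hypothetical helper, and `find_prefixsum_idx` here is a free function that takes the tree as its first argument.

```python
import random


def build_sum_tree(values):
    """Build a flat sum tree; len(values) is assumed to be a power of two."""
    capacity = len(values)
    tree = [0.0] * (2 * capacity)
    tree[capacity:] = values  # leaves
    for node in range(capacity - 1, 0, -1):
        tree[node] = tree[2 * node] + tree[2 * node + 1]
    return tree


def find_prefixsum_idx(tree, prefixsum):
    """Highest i with arr[0] + ... + arr[i - 1] <= prefixsum, in O(log capacity)."""
    capacity = len(tree) // 2
    node = 1
    while node < capacity:  # stop once a leaf is reached
        left = 2 * node
        if tree[left] > prefixsum:
            node = left  # the answer lies in the left subtree
        else:
            prefixsum -= tree[left]  # skip the whole left subtree
            node = left + 1
    return node - capacity


tree = build_sum_tree([1.0, 2.0, 3.0, 4.0])

# Sampling proportional to the stored values: draw a mass in [0, total)
# and locate the item whose interval contains it.
random.seed(0)
total = tree[1]
samples = [find_prefixsum_idx(tree, random.random() * total) for _ in range(1000)]
```

With values 1, 2, 3, 4, index 3 covers the mass interval [6, 10) and should therefore be drawn about four times as often as index 0, which covers [0, 1).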
-