Replay Buffer

Replay Buffer in RLzoo

Utility classes for storing and sampling training transitions.

Requirements: tensorflow==2.0.0a0, tensorlayer==2.0.1

class rlzoo.common.buffer.HindsightReplayBuffer(capacity, hindsight_freq, goal_type, reward_func, done_func)[source]

Bases: rlzoo.common.buffer.ReplayBuffer

Hindsight Experience Replay. In this buffer, each state is a tuple of (observation, goal).

GOAL_EPISODE = 'episode'
GOAL_FUTURE = 'future'
GOAL_RANDOM = 'random'
__init__(capacity, hindsight_freq, goal_type, reward_func, done_func)[source]
Parameters:
  • hindsight_freq – (int) How many hindsight transitions are generated for each real transition
  • goal_type – (str) The generation method for hindsight goals. Should be one of the GOAL_* constants above
  • reward_func – (callable) goal (np.array) X next_state (np.array) -> reward (float)
  • done_func – (callable) goal (np.array) X next_state (np.array) -> done_flag (bool)
__module__ = 'rlzoo.common.buffer'
push(*args, **kwargs)[source]
push_episode(states, actions, rewards, next_states, dones)[source]
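
The callables and the 'future' goal type described above can be illustrated with a small standalone sketch. It does not import RLzoo and is not its implementation: the helper names (reached, relabel_future), the 0.05 tolerance, and the use of plain tuples instead of np.array are assumptions made for illustration only.

```python
import random

def reached(goal, next_state, tol=0.05):
    # Hypothetical success test: every coordinate is within tol of the goal.
    return max(abs(g - s) for g, s in zip(goal, next_state)) < tol

def reward_func(goal, next_state):
    # goal X next_state -> reward (float); sparse reward, assumed for this sketch.
    return 0.0 if reached(goal, next_state) else -1.0

def done_func(goal, next_state):
    # goal X next_state -> done_flag (bool)
    return reached(goal, next_state)

def relabel_future(observations, next_observations, actions, hindsight_freq, rng):
    """For each real step t, build hindsight_freq extra transitions whose goal
    is an observation reached at step t or later in the same episode."""
    transitions = []
    num_steps = len(actions)
    for t in range(num_steps):
        for _ in range(hindsight_freq):
            goal = next_observations[rng.randrange(t, num_steps)]
            nxt = next_observations[t]
            transitions.append((
                (observations[t], goal),   # state is (observation, goal)
                actions[t],
                reward_func(goal, nxt),
                (nxt, goal),               # next state keeps the same goal
                done_func(goal, nxt),
            ))
    return transitions
```

With hindsight_freq=2 and a three-step episode, this sketch yields six relabeled transitions in addition to the real ones.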
class rlzoo.common.buffer.MinSegmentTree(capacity)[source]

Bases: rlzoo.common.buffer.SegmentTree

__init__(capacity)[source]

Build a Segment Tree data structure.

https://en.wikipedia.org/wiki/Segment_tree

Can be used as a regular array, but with two important differences:

  1. Setting an item's value is slightly slower: O(log capacity) instead of O(1).
  2. The user has access to an efficient (O(log segment size)) reduce operation, which applies the operation over a contiguous subsequence of items in the array.
Parameters:
  • capacity – (int) Total size of the array. Must be a power of two.
  • operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative over the set of possible array values, with neutral_element as its identity.
  • neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.
__module__ = 'rlzoo.common.buffer'
min(start=0, end=None)[source]

Returns min(arr[start], …, arr[end])

class rlzoo.common.buffer.PrioritizedReplayBuffer(capacity, alpha, beta)[source]

Bases: rlzoo.common.buffer.ReplayBuffer

__init__(capacity, alpha, beta)[source]

Create Prioritized Replay buffer.

Parameters:
  • capacity – (int) Max number of transitions to store in the buffer. When the buffer overflows, the old memories are dropped.
  • alpha – (float) how much prioritization is used (0 - no prioritization, 1 - full prioritization)
See Also:
ReplayBuffer.__init__
__module__ = 'rlzoo.common.buffer'
push(*args)[source]

See ReplayBuffer.push

sample(batch_size)[source]

Sample a batch of experiences

update_priorities(idxes, priorities)[source]

Update priorities of sampled transitions
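
The effect of alpha can be sketched as follows. This assumes the usual proportional scheme, in which transition i is drawn with probability p_i^alpha / sum_k p_k^alpha; the function name is hypothetical and the code is not taken from RLzoo.

```python
def sampling_probabilities(priorities, alpha):
    """Turn positive priorities into sampling probabilities.

    alpha = 0 gives uniform sampling (no prioritization);
    alpha = 1 samples in direct proportion to priority (full prioritization).
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]
```

For priorities [1, 2, 3, 4], alpha=0 gives each transition probability 0.25, while alpha=1 gives probabilities proportional to the priorities themselves.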

class rlzoo.common.buffer.ReplayBuffer(capacity)[source]

Bases: object

A standard ring buffer for storing transitions and sampling for training

__init__(capacity)[source]


__len__()[source]
__module__ = 'rlzoo.common.buffer'

push(state, action, reward, next_state, done)[source]
sample(batch_size)[source]
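
A ring buffer with the same push and sample signatures might look like the sketch below. TinyRingBuffer is a hypothetical stand-in written for illustration, not the RLzoo class, and the tuple-of-columns return format of sample is an assumption.

```python
import random

class TinyRingBuffer:
    """Fixed-capacity buffer that overwrites its oldest entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.position = 0  # index of the next slot to write

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.position] = transition  # overwrite the oldest entry
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling without replacement, regrouped column-wise.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Pushing five transitions into a buffer of capacity 3 leaves only the three most recent ones available for sampling.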
class rlzoo.common.buffer.SegmentTree(capacity, operation, neutral_element)[source]

Bases: object

__getitem__(idx)[source]
__init__(capacity, operation, neutral_element)[source]

Build a Segment Tree data structure.

https://en.wikipedia.org/wiki/Segment_tree

Can be used as a regular array, but with two important differences:

  1. Setting an item's value is slightly slower: O(log capacity) instead of O(1).
  2. The user has access to an efficient (O(log segment size)) reduce operation, which applies the operation over a contiguous subsequence of items in the array.
Parameters:
  • capacity – (int) Total size of the array. Must be a power of two.
  • operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative over the set of possible array values, with neutral_element as its identity.
  • neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.
__module__ = 'rlzoo.common.buffer'
__setitem__(idx, val)[source]

reduce(start=0, end=None)[source]

Returns result of applying self.operation to a contiguous subsequence of the array.

Parameters:
  • start – (int) beginning of the subsequence
  • end – (int) end of the subsequence
Returns:
reduced: (obj) result of reducing self.operation over the specified range of array.
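
The structure described above can be sketched in a few lines. TinySegmentTree is a hypothetical illustration rather than the RLzoo class, and it assumes that end is exclusive, as in Python slicing; the library's own convention may differ.

```python
import operator

class TinySegmentTree:
    """Array with O(log capacity) item updates and range reductions."""

    def __init__(self, capacity, operation, neutral_element):
        assert capacity > 0 and capacity & (capacity - 1) == 0, \
            "capacity must be a power of two"
        self.capacity = capacity
        self.operation = operation
        self.neutral = neutral_element
        # Heap layout: node i has children 2i and 2i + 1; leaves start at index capacity.
        self.values = [neutral_element] * (2 * capacity)

    def __setitem__(self, idx, val):
        node = idx + self.capacity
        self.values[node] = val
        node //= 2
        while node >= 1:  # recompute every ancestor of the changed leaf
            self.values[node] = self.operation(
                self.values[2 * node], self.values[2 * node + 1])
            node //= 2

    def __getitem__(self, idx):
        return self.values[idx + self.capacity]

    def reduce(self, start=0, end=None):
        # Reduces arr[start:end]; end is exclusive in this sketch (assumption).
        if end is None:
            end = self.capacity
        left, right = self.neutral, self.neutral
        lo, hi = start + self.capacity, end + self.capacity
        while lo < hi:
            if lo & 1:
                left = self.operation(left, self.values[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.operation(self.values[hi], right)
            lo //= 2
            hi //= 2
        return self.operation(left, right)
```

A sum tree is then TinySegmentTree(capacity, operator.add, 0) and a min tree is TinySegmentTree(capacity, min, float('inf')).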
class rlzoo.common.buffer.SumSegmentTree(capacity)[source]

Bases: rlzoo.common.buffer.SegmentTree

__init__(capacity)[source]

Build a Segment Tree data structure.

https://en.wikipedia.org/wiki/Segment_tree

Can be used as a regular array, but with two important differences:

  1. Setting an item's value is slightly slower: O(log capacity) instead of O(1).
  2. The user has access to an efficient (O(log segment size)) reduce operation, which applies the operation over a contiguous subsequence of items in the array.
Parameters:
  • capacity – (int) Total size of the array. Must be a power of two.
  • operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative over the set of possible array values, with neutral_element as its identity.
  • neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.
__module__ = 'rlzoo.common.buffer'
find_prefixsum_idx(prefixsum)[source]
Find the highest index i in the array such that
arr[0] + arr[1] + … + arr[i - 1] <= prefixsum

If the array values are probabilities, this function allows efficient sampling of indexes according to the discrete probability distribution.

Parameters:
  • prefixsum – (float) upper bound on the sum of the array prefix
Returns:
idx: (int) highest index satisfying the prefixsum constraint
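
The prefix-sum lookup can be sketched as a single descent from the root of a sum tree. The code below is a standalone illustration under assumptions: it works on a flat list built by the hypothetical helper build_sum_tree, takes the tree as an explicit argument, and expects a number of values that is a power of two with 0 <= prefixsum < total sum.

```python
def build_sum_tree(values):
    """Build a flat sum tree; len(values) must be a power of two."""
    capacity = len(values)
    tree = [0.0] * (2 * capacity)
    tree[capacity:] = values  # leaves occupy the second half
    for node in range(capacity - 1, 0, -1):
        tree[node] = tree[2 * node] + tree[2 * node + 1]
    return tree

def find_prefixsum_idx(tree, prefixsum):
    """Return the highest i with arr[0] + ... + arr[i - 1] <= prefixsum."""
    capacity = len(tree) // 2
    node = 1
    while node < capacity:  # stop once a leaf is reached
        left = 2 * node
        if tree[left] > prefixsum:
            node = left               # target lies in the left subtree
        else:
            prefixsum -= tree[left]   # skip the whole left subtree
            node = left + 1
    return node - capacity
```

To sample an index in proportion to the stored values, one would draw a number uniformly from [0, total) and pass it as prefixsum, where total is the root value tree[1].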
sum(start=0, end=None)[source]

Returns arr[start] + … + arr[end]