Replay Buffer¶

Replay Buffer in RLzoo¶

Utility classes for experience replay.

Requirements: tensorflow==2.0.0a0, tensorlayer==2.0.1

class rlzoo.common.buffer.HindsightReplayBuffer(capacity, hindsight_freq, goal_type, reward_func, done_func)[source]¶

Bases: rlzoo.common.buffer.ReplayBuffer

Hindsight Experience Replay. In this buffer, each state is a tuple of (observation, goal).

GOAL_EPISODE = 'episode'¶

GOAL_FUTURE = 'future'¶

GOAL_RANDOM = 'random'¶

__init__(capacity, hindsight_freq, goal_type, reward_func, done_func)[source]¶

Parameters:

hindsight_freq – (int) How many hindsight transitions are generated for each real transition.

goal_type – (str) The generation method for hindsight goals. Should be HER_GOAL_* (presumably one of the GOAL_EPISODE, GOAL_FUTURE, or GOAL_RANDOM constants listed above).

reward_func – (callable) goal (np.array) X next_state (np.array) -> reward (float)

done_func – (callable) goal (np.array) X next_state (np.array) -> done_flag (bool)
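As a rough illustration of how reward_func and done_func might be used when goals are relabeled, the sketch below generates hindsight transitions with a 'future'-style strategy. It is not RLzoo's implementation: the helper relabel_future, the transition layout, and the choice to treat the observation part of a later next_state as the achieved goal are all assumptions made for this example.

```python
import random


def relabel_future(states, actions, next_states, reward_func, done_func,
                   hindsight_freq, rng=random):
    """Hypothetical helper: build extra transitions with substituted goals.

    Each state and next_state is assumed to be an (observation, goal) tuple.
    """
    extra = []
    episode_len = len(states)
    for t in range(episode_len):
        obs, _ = states[t]
        next_obs, _ = next_states[t]
        for _ in range(hindsight_freq):
            # Pick a step at or after t; the observation reached there
            # is treated as if it had been the goal all along (assumption).
            k = rng.randrange(t, episode_len)
            new_goal = next_states[k][0]
            reward = reward_func(new_goal, next_obs)
            done = done_func(new_goal, next_obs)
            extra.append(((obs, new_goal), actions[t], reward,
                          (next_obs, new_goal), done))
    return extra


# Toy 1-D episode: the agent walks from 0 to 3 while the real goal is 5.
states = [(0, 5), (1, 5), (2, 5)]
actions = [1, 1, 1]
next_states = [(1, 5), (2, 5), (3, 5)]
hindsight = relabel_future(
    states, actions, next_states,
    reward_func=lambda goal, s: 0.0 if goal == s else -1.0,
    done_func=lambda goal, s: goal == s,
    hindsight_freq=2,
    rng=random.Random(0),
)
```

Every real transition yields hindsight_freq relabeled copies, so a failed episode still produces transitions in which some goal was reached.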

__module__ = 'rlzoo.common.buffer'¶

push(*args, **kwargs)[source]¶

push_episode(states, actions, rewards, next_states, dones)[source]¶

class rlzoo.common.buffer.MinSegmentTree(capacity)[source]¶

Bases: rlzoo.common.buffer.SegmentTree

__init__(capacity)[source]¶

Build a Segment Tree that uses min as its operation. The description of SegmentTree.__init__ applies, except that only capacity is passed.

Parameters:	capacity – (int) Total size of the array; must be a power of two.

__module__ = 'rlzoo.common.buffer'¶

min(start=0, end=None)[source]¶: Returns min(arr[start], …, arr[end])

class rlzoo.common.buffer.PrioritizedReplayBuffer(capacity, alpha, beta)[source]¶

Bases: rlzoo.common.buffer.ReplayBuffer

__init__(capacity, alpha, beta)[source]¶

Create Prioritized Replay buffer.

Parameters:

capacity – (int) Max number of transitions to store in the buffer. When the buffer overflows, the oldest memories are dropped.

alpha – (float) How much prioritization is used (0 = no prioritization, 1 = full prioritization).

See also: ReplayBuffer.__init__
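To make the effect of alpha concrete, the sketch below assumes the usual proportional prioritization rule, P(i) = p_i^alpha / sum_j p_j^alpha. The helper sampling_probabilities is hypothetical, and whether RLzoo applies alpha in exactly this way is not stated on this page.

```python
def sampling_probabilities(priorities, alpha):
    """Hypothetical helper: turn positive priorities into sampling probabilities."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


# alpha = 0 ignores the priorities; alpha = 1 samples in proportion to them.
uniform = sampling_probabilities([1.0, 3.0], alpha=0.0)       # [0.5, 0.5]
proportional = sampling_probabilities([1.0, 3.0], alpha=1.0)  # [0.25, 0.75]
```

Intermediate values of alpha interpolate between these two extremes.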

__module__ = 'rlzoo.common.buffer'¶

push(*args)[source]¶: See ReplayBuffer.push

sample(batch_size)[source]¶: Sample a batch of experiences

update_priorities(idxes, priorities)[source]¶: Update priorities of sampled transitions

class rlzoo.common.buffer.ReplayBuffer(capacity)[source]¶

Bases: object

A standard ring buffer for storing transitions and sampling for training

__init__(capacity)[source]¶

__len__()[source]¶

__module__ = 'rlzoo.common.buffer'¶

push(state, action, reward, next_state, done)[source]¶

sample(batch_size)[source]¶
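The ring-buffer behaviour described for ReplayBuffer can be sketched as follows. TinyReplayBuffer is a hypothetical stand-in written for this page, not the RLzoo class; it only mirrors the push, sample, and __len__ signatures listed above and assumes that sampling is uniform without replacement.

```python
import random


class TinyReplayBuffer:
    """Hypothetical minimal ring buffer for (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []
        self.position = 0  # index of the next slot to write

    def push(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # Buffer is full: overwrite the oldest stored transition.
            self.buffer[self.position] = transition
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size, rng=random):
        batch = rng.sample(self.buffer, batch_size)
        # Regroup the list of transitions into five parallel tuples.
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


buf = TinyReplayBuffer(capacity=3)
for i in range(5):
    buf.push(state=i, action=0, reward=0.0, next_state=i + 1, done=False)
```

After five pushes into a buffer of capacity 3, only the three most recent transitions remain.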

class rlzoo.common.buffer.SegmentTree(capacity, operation, neutral_element)[source]¶

Bases: object

__getitem__(idx)[source]¶

__init__(capacity, operation, neutral_element)[source]¶

Build a Segment Tree data structure.

https://en.wikipedia.org/wiki/Segment_tree

Can be used as a regular array, but with two important differences:

Setting an item’s value is slightly slower: it is O(log capacity) instead of O(1).

The user has access to an efficient ( O(log segment size) ) reduce operation, which applies operation over a contiguous subsequence of items in the array.

Parameters:

capacity – (int) Total size of the array; must be a power of two.

operation – (lambda obj, obj -> obj) An operation for combining elements (e.g. sum, max). It must be associative over the set of possible array values and have a neutral element (i.e. form a monoid; inverses are not required).

neutral_element – (obj) Neutral element for the operation above, e.g. float('-inf') for max and 0 for sum.

__module__ = 'rlzoo.common.buffer'¶

__setitem__(idx, val)[source]¶

reduce(start=0, end=None)[source]¶

Returns result of applying self.operation to a contiguous subsequence of the array.

Parameters:

start – (int) beginning of the subsequence

end – (int) end of the subsequence

Returns:

reduced: (obj) result of reducing self.operation over the specified range of the array.
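The array-like interface and the logarithmic reduce described above can be illustrated with a compact, array-backed tree. TinySegmentTree is a hypothetical sketch, not the RLzoo class, and it treats the range as half-open, [start, end), which is an assumption of this example rather than something this page specifies.

```python
import operator


class TinySegmentTree:
    """Hypothetical array-backed segment tree over `capacity` leaves."""

    def __init__(self, capacity, operation, neutral_element):
        assert capacity > 0 and capacity & (capacity - 1) == 0, \
            "capacity must be a power of two"
        self.capacity = capacity
        self.operation = operation
        self.neutral = neutral_element
        # Node i has children 2i and 2i + 1; leaves occupy [capacity, 2 * capacity).
        self.tree = [neutral_element] * (2 * capacity)

    def __setitem__(self, idx, val):
        i = idx + self.capacity
        self.tree[i] = val
        i //= 2
        while i >= 1:  # recompute every ancestor: O(log capacity)
            self.tree[i] = self.operation(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def __getitem__(self, idx):
        return self.tree[idx + self.capacity]

    def reduce(self, start=0, end=None):
        """Combine items in the half-open range [start, end)."""
        if end is None:
            end = self.capacity
        left, right = self.neutral, self.neutral
        lo, hi = start + self.capacity, end + self.capacity
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step right
                left = self.operation(left, self.tree[lo])
                lo += 1
            if hi & 1:  # hi is exclusive: step left and take that node
                hi -= 1
                right = self.operation(self.tree[hi], right)
            lo //= 2
            hi //= 2
        return self.operation(left, right)


sum_tree = TinySegmentTree(4, operator.add, 0)
min_tree = TinySegmentTree(4, min, float('inf'))
for i, v in enumerate([1, 2, 3, 4]):
    sum_tree[i] = v
    min_tree[i] = v
```

Using operator.add with 0 or min with float('inf') as the neutral element gives sum and min trees over the same structure.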

class rlzoo.common.buffer.SumSegmentTree(capacity)[source]¶

Bases: rlzoo.common.buffer.SegmentTree

__init__(capacity)[source]¶

Build a Segment Tree that uses sum as its operation. The description of SegmentTree.__init__ applies, except that only capacity is passed.

Parameters:	capacity – (int) Total size of the array; must be a power of two.

__module__ = 'rlzoo.common.buffer'¶

find_prefixsum_idx(prefixsum)[source]¶

Find the highest index i in the array such that: arr[0] + arr[1] + … + arr[i - 1] <= prefixsum

If the array values are probabilities, this function allows efficient sampling of indexes according to the discrete probability distribution.

Parameters:	prefixsum – (float) upper bound on the sum of the array prefix

Returns:

idx: (int): highest index satisfying the prefixsum constraint
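The prefix-sum lookup can be checked against a simple linear-time reference. The sketch below is not the tree-based method documented here: find_prefixsum_idx_linear and sample_index are hypothetical helpers that rebuild the prefix sums on every call, whereas a sum tree answers the same query in O(log capacity). Non-negative array values are assumed.

```python
import bisect
import itertools
import random


def find_prefixsum_idx_linear(arr, prefixsum):
    """Highest i such that arr[0] + ... + arr[i - 1] <= prefixsum."""
    # prefix[i] = arr[0] + ... + arr[i - 1], so prefix[0] is the empty sum 0.
    prefix = [0] + list(itertools.accumulate(arr))[:-1]
    return bisect.bisect_right(prefix, prefixsum) - 1


def sample_index(arr, rng=random):
    """Draw index i with probability arr[i] / sum(arr)."""
    mass = rng.random() * sum(arr)
    return find_prefixsum_idx_linear(arr, mass)


weights = [1, 2, 3, 4]
idx = find_prefixsum_idx_linear(weights, 3)  # 1 + 2 <= 3, so idx is 2
```

Index i is returned whenever the drawn mass falls between the prefix sums before and after arr[i], an interval whose width is arr[i]; this is why larger entries are sampled more often.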

sum(start=0, end=None)[source]¶: Returns arr[start] + … + arr[end]