double_q_learning

Module double_q_learning. Implements the tabular double Q-learning algorithm as presented in the paper:

https://www.researchgate.net/publication/221619239_Double_Q-learning

class double_q_learning.DoubleQLearning(algo_config: TDAlgoConfig)

The class DoubleQLearning implements the tabular double Q-learning algorithm.

__init__(algo_config: TDAlgoConfig) → None

Constructor. Initialize the algorithm with the given configuration

Parameters: algo_config – The algorithm configuration

_update_q_table(env: Env, state: int, action: int, reward: float, next_state: Optional[int] = None) → None

Update the Q-value for the given state when taking the given action. The implementation uses a coin flip to choose which of the two tables to update.

Parameters

env – The environment to train on
state – The current state
action – The action taken at the state
reward – The reward received
next_state – The state reached after taking the given action

Return type

None
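
The coin-flip update described above can be illustrated with a minimal, standalone sketch. This is not the module's actual code: the function name double_q_update, the list-of-lists table layout, the alpha and gamma parameters, and the treatment of next_state=None as a terminal transition are all assumptions made for illustration.

```python
import random
from typing import List, Optional

QTable = List[List[float]]  # assumed layout: q[state][action]


def double_q_update(q1: QTable, q2: QTable, state: int, action: int,
                    reward: float, next_state: Optional[int] = None,
                    alpha: float = 0.1, gamma: float = 0.99,
                    rng: Optional[random.Random] = None) -> None:
    """Apply one double Q-learning update in place (illustrative sketch)."""
    rng = rng or random.Random()

    # Coin flip: one table is updated, the other evaluates the target.
    if rng.random() < 0.5:
        update, evaluate = q1, q2
    else:
        update, evaluate = q2, q1

    if next_state is None:
        # Assumption: a missing next state means the episode terminated,
        # so there is no bootstrapped term.
        target = reward
    else:
        # Select the greedy action with the table being updated ...
        row = update[next_state]
        best_action = max(range(len(row)), key=row.__getitem__)
        # ... but take its value from the other table.
        target = reward + gamma * evaluate[next_state][best_action]

    update[state][action] += alpha * (target - update[state][action])
```

Selecting the action with one table and evaluating it with the other is what reduces the overestimation bias of standard Q-learning.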

actions_after_episode_ends(env: Env, episode_idx: int, **options)

Execute any actions the algorithm needs after the episode ends

Parameters

env – The environment to train on
episode_idx – The episode index
options – Any options passed by the client code

Return type

None

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before training begins

Parameters

env – The environment to train on
options – Any options passed by the client code

Return type

None

do_on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the agent on the environment at the given episode.

Parameters

env – The environment to train on
episode_idx – The episode index
options – Any options passed by the client code

Return type

An instance of the EpisodeInfo class
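
As a rough picture of what one training episode might involve, the sketch below runs an epsilon-greedy episode with two Q-tables on a toy environment. Everything in it is hypothetical: ChainEnv, run_episode, the reset/step interface, and the (total_reward, steps) return value are illustrative stand-ins and do not reflect the library's Env or EpisodeInfo classes.

```python
import random
from typing import List, Tuple

QTable = List[List[float]]  # assumed layout: q[state][action]


class ChainEnv:
    """Hypothetical 4-state corridor: action 1 moves right, action 0 left."""
    n_states = 4
    n_actions = 2

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1  # rightmost state is the goal
        return self.state, (1.0 if done else 0.0), done


def run_episode(env: ChainEnv, q1: QTable, q2: QTable, rng: random.Random,
                eps: float = 0.1, alpha: float = 0.5, gamma: float = 0.9,
                max_steps: int = 100) -> Tuple[float, int]:
    """Run one episode and return (total_reward, steps)."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        # Epsilon-greedy action selection on the sum of both tables.
        if rng.random() < eps:
            action = rng.randrange(env.n_actions)
        else:
            combined = [a + b for a, b in zip(q1[state], q2[state])]
            action = max(range(env.n_actions), key=combined.__getitem__)

        next_state, reward, done = env.step(action)

        # Coin flip: update one table, evaluate with the other.
        upd, other = (q1, q2) if rng.random() < 0.5 else (q2, q1)
        if done:
            target = reward
        else:
            row = upd[next_state]
            best = max(range(len(row)), key=row.__getitem__)
            target = reward + gamma * other[next_state][best]
        upd[state][action] += alpha * (target - upd[state][action])

        total_reward += reward
        steps += 1
        state = next_state
        if done:
            break
    return total_reward, steps
```

A caller would typically create the two tables once, filled with zeros, and invoke run_episode repeatedly with increasing episode indices, collecting the returned statistics.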