double_q_learning

Module double_q_learning. Implements the tabular double Q-learning algorithm presented in the paper

https://www.researchgate.net/publication/221619239_Double_Q-learning
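The core of the method: two tables, Q^A and Q^B, are learned in parallel. When the coin flip selects Q^A for an update, the greedy action is chosen with Q^A but evaluated with Q^B (and symmetrically for Q^B):

```latex
a^{*} = \arg\max_{a} Q^{A}(s', a)
\qquad
Q^{A}(s, a) \leftarrow Q^{A}(s, a)
  + \alpha \left[ r + \gamma \, Q^{B}(s', a^{*}) - Q^{A}(s, a) \right]
```

Decoupling action selection from action evaluation is what removes the maximisation (overestimation) bias of standard Q-learning.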

class double_q_learning.DoubleQLearning(algo_config: TDAlgoConfig)

The class DoubleQLearning implements the tabular double Q-learning algorithm.

__init__(algo_config: TDAlgoConfig) → None

Constructor. Initializes the algorithm with the given configuration.

Parameters

algo_config – The algorithm configuration.

_update_q_table(env: Env, state: int, action: int, reward: float, next_state: Optional[int] = None) → None

Update the Q-value function for the given state when taking the given action. A fair coin flip decides which of the two tables is updated.

Parameters
  • env – The environment to train on.

  • state – The current state.

  • action – The action taken at the state.

  • reward – The reward received.

  • next_state – The state reached by taking the given action, or None for a terminal transition.

Return type

None
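The coin-flip update described above can be sketched as follows. This is a minimal, self-contained illustration of the technique, not the module's actual implementation: the table representation (dict of dicts) and the names q_a, q_b, alpha, and gamma are assumptions.

```python
import random

def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.1, gamma=0.99):
    """Update one of the two Q-tables, chosen by a fair coin flip."""
    if random.random() < 0.5:
        primary, other = q_a, q_b
    else:
        primary, other = q_b, q_a
    if next_state is None:
        # Terminal transition: no bootstrap term.
        target = reward
    else:
        # Select the greedy action with the table being updated ...
        best_action = max(primary[next_state], key=primary[next_state].get)
        # ... but evaluate it with the other table; this decoupling
        # removes the maximisation bias of standard Q-learning.
        target = reward + gamma * other[next_state][best_action]
    primary[state][action] += alpha * (target - primary[state][action])
```

Note that only one of the two tables changes per transition, so each table is trained on roughly half the experience.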

actions_after_episode_ends(env: Env, episode_idx: int, **options)

Execute any actions the algorithm needs after the episode ends.

Parameters
  • env – The environment to train on.

  • episode_idx – The episode index.

  • options – Any options passed by the client code.

Return type

None
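A typical post-episode action for a TD algorithm is decaying the exploration rate. Whether this module does so here is an assumption; the hook simply gives the algorithm a place for such per-episode bookkeeping. A minimal sketch:

```python
def decay_epsilon(epsilon, min_epsilon=0.01, decay=0.995):
    """Multiplicatively decay the exploration rate, clipped at a floor.
    The function name and parameters are illustrative, not the module's API."""
    return max(min_epsilon, epsilon * decay)
```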

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before training begins.

Parameters
  • env – The environment to train on.

  • options – Any options passed by the client code.

Return type

None

do_on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the agent on the environment at the given episode.

Parameters
  • env – The environment to train on.

  • episode_idx – The episode index.

  • options – Any options passed by the client code.

Return type

An instance of the EpisodeInfo class
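During a training episode, double Q-learning typically selects actions epsilon-greedily with respect to the sum (or average) of the two tables, since both hold partial value estimates. Whether this module combines the tables this way is an assumption; the sketch below only illustrates the common scheme, and all names are hypothetical.

```python
import random

def epsilon_greedy_action(q_a, q_b, state, n_actions, epsilon=0.1):
    """With probability epsilon pick a random action; otherwise pick the
    action that is greedy w.r.t. the sum of the two Q-tables."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    combined = [q_a[state][a] + q_b[state][a] for a in range(n_actions)]
    return max(range(n_actions), key=combined.__getitem__)
```

Acting on the combined tables means the behaviour policy improves as either table improves, while each table is still updated against the other's estimates.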