double_q_learning
Module double_q_learning. Implements the tabular double Q-learning algorithm as presented in the paper
https://www.researchgate.net/publication/221619239_Double_Q-learning
- class double_q_learning.DoubleQLearning(algo_config: TDAlgoConfig)
The class DoubleQLearning implements the tabular double Q-learning algorithm.
- __init__(algo_config: TDAlgoConfig) -> None
Constructor. Initialize the algorithm with the given configuration
- Parameters
algo_config (The algorithm configuration) –
- _update_q_table(env: Env, state: int, action: int, reward: float, next_state: Optional[int] = None) -> None
Update the Q-value estimate for the given state when taking the given action. A coin flip decides which of the two Q-tables is updated.
- Parameters
env (The environment to train on) –
state (The current state) –
action (The action taken at the state) –
reward (The reward received) –
next_state (The state reached after taking the given action) –
- Return type
None
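The coin-flip update described for _update_q_table can be illustrated with a minimal, standalone sketch. It is not the module's implementation. The function name double_q_update, the alpha and gamma parameters, the list-of-lists table layout, and the treatment of next_state=None as a terminal transition are all assumptions made for illustration.

```python
import random
from typing import List, Optional

QTable = List[List[float]]  # assumed layout: q[state][action]


def double_q_update(
    q_a: QTable,
    q_b: QTable,
    state: int,
    action: int,
    reward: float,
    next_state: Optional[int] = None,
    alpha: float = 0.1,   # assumed learning rate
    gamma: float = 0.99,  # assumed discount factor
    rng: Optional[random.Random] = None,
) -> None:
    """Hypothetical double Q-learning update on two tabular estimates."""
    rng = rng or random.Random()

    # Coin flip: pick the table to update; the other one evaluates the target.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a

    if next_state is None:
        # Assumption: no next state means the episode ended, so no bootstrap.
        target = reward
    else:
        # The learner selects the greedy action, the evaluator scores it.
        row = learner[next_state]
        best_action = max(range(len(row)), key=row.__getitem__)
        target = reward + gamma * evaluator[next_state][best_action]

    learner[state][action] += alpha * (target - learner[state][action])
```

Selecting the greedy action with one table and scoring it with the other is what distinguishes this update from plain Q-learning, which uses a single table for both steps.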
- actions_after_episode_ends(env: Env, episode_idx: int, **options)
Execute any actions the algorithm needs after ending the episode
- Parameters
env (The environment to train on) –
episode_idx (The episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_training_begins(env: Env, **options) -> None
Execute any actions the algorithm needs before training begins
- Parameters
env (The environment to train on) –
options (Any options passed by the client code) –
- Return type
None
- do_on_training_episode(env: Env, episode_idx: int, **options) -> EpisodeInfo
Train the agent on the environment at the given episode.
- Parameters
env (The environment to train on) –
episode_idx (The episode index) –
options (Any options passed by the client code) –
- Return type
An instance of the EpisodeInfo class
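The three hooks documented above suggest a calling order, which can be sketched with a hypothetical driver. The reference does not describe the actual trainer, so run_training and RecordingAlgorithm are illustrative stand-ins, and a plain dict takes the place of EpisodeInfo.

```python
from typing import Any, Dict, List


class RecordingAlgorithm:
    """Hypothetical stand-in that only records the order of hook calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def actions_before_training_begins(self, env: Any, **options: Any) -> None:
        self.calls.append("before_training")

    def do_on_training_episode(
        self, env: Any, episode_idx: int, **options: Any
    ) -> Dict[str, Any]:
        self.calls.append(f"episode_{episode_idx}")
        # The documented method returns an EpisodeInfo; a dict stands in here.
        return {"episode_idx": episode_idx}

    def actions_after_episode_ends(
        self, env: Any, episode_idx: int, **options: Any
    ) -> None:
        self.calls.append(f"after_episode_{episode_idx}")


def run_training(
    algorithm: Any, env: Any, n_episodes: int, **options: Any
) -> List[Any]:
    """Assumed driver loop: set up once, then train and clean up per episode."""
    algorithm.actions_before_training_begins(env, **options)
    results = []
    for episode_idx in range(n_episodes):
        results.append(
            algorithm.do_on_training_episode(env, episode_idx, **options)
        )
        algorithm.actions_after_episode_ends(env, episode_idx, **options)
    return results
```

Under this assumed ordering, per-episode bookkeeping such as exploration decay would naturally live in actions_after_episode_ends, and one-off setup such as table initialization in actions_before_training_begins.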