q_learning

The q_learning module implements a tabular Q-learning algorithm.

class q_learning.QLearning(algo_config: TDAlgoConfig)

Q-learning algorithm

__init__(algo_config: TDAlgoConfig) → None

Constructor. Initializes the algorithm from the given configuration options.

Parameters

algo_config (TDAlgoConfig) – The configuration options.

_update_q_table(env: Env, state: int, action: int, reward: float, next_state: Optional[int] = None) → None

Update the underlying Q-table.

Parameters
  • env (Env) – The training environment.

  • state (int) – The current state the environment is in.

  • action (int) – The index of the action selected by the policy.

  • reward (float) – The reward returned by the environment.

  • next_state (Optional[int]) – The next state observed after taking the action.

Return type

None
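
For orientation, the textbook tabular Q-learning rule is `Q(s, a) ← Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))`. The sketch below is a standalone illustration of that rule, not the code of this class. The helper name `update_q_table`, the list-of-lists table, and the `alpha` and `gamma` parameters are all assumptions (in the real class they presumably come from TDAlgoConfig), and treating `next_state=None` as a terminal transition is only a guess at what the optional argument means.

```python
from typing import List, Optional


def update_q_table(q_table: List[List[float]], state: int, action: int,
                   reward: float, next_state: Optional[int] = None,
                   alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Apply one tabular Q-learning update in place (hypothetical helper)."""
    # Assumption: next_state=None marks a terminal transition, so there is
    # no future value to bootstrap from.
    best_next = 0.0 if next_state is None else max(q_table[next_state])
    # Temporal-difference target and in-place update of Q(state, action).
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])


# Usage: 3 states, 2 actions, all values initialised to zero.
q = [[0.0, 0.0] for _ in range(3)]
update_q_table(q, state=0, action=1, reward=1.0, next_state=2)
```

Only the entry for the visited state-action pair changes; every other entry in the table is left untouched.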

actions_after_episode_ends(env: Env, episode_idx: int, **options) → None

Execute any actions the algorithm needs after the episode ends.

Parameters
  • env (Env) – The environment to train on.

  • episode_idx (int) – The episode index.

  • options – Any options passed by the client code.

Return type

None

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before training starts.

Parameters
  • env (Env) – The environment to train on.

  • options – Any options passed by the client code.

Return type

None

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm for one episode.

Parameters
  • env (Env) – The environment in which to run the training episode.

  • episode_idx (int) – The episode index.

  • options – Any options that client code may pass.

Return type

An instance of EpisodeInfo
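
To make the idea of a training episode concrete, here is a minimal sketch of one epsilon-greedy Q-learning episode on a toy environment. Everything in it is hypothetical: `ChainEnv`, its `reset()`/`step()` interface, the `run_episode` function, and the `eps`, `alpha`, `gamma`, and `max_steps` parameters are illustration-only assumptions and not the Env or EpisodeInfo types documented here. Since the fields of EpisodeInfo are not listed above, the sketch simply returns the total reward and step count as a tuple.

```python
import random
from typing import List, Tuple


class ChainEnv:
    """Hypothetical 4-state corridor: action 1 moves right, action 0 moves left."""
    n_states, n_actions = 4, 2

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1  # rightmost state is the goal
        return self.state, (1.0 if done else 0.0), done


def run_episode(env: ChainEnv, q: List[List[float]], rng: random.Random,
                eps: float = 0.1, alpha: float = 0.1, gamma: float = 0.99,
                max_steps: int = 100) -> Tuple[float, int]:
    """Run one epsilon-greedy episode, updating q in place."""
    state, total_reward, steps = env.reset(), 0.0, 0
    for _ in range(max_steps):
        if rng.random() < eps:
            # Explore: pick a uniformly random action.
            action = rng.randrange(env.n_actions)
        else:
            # Exploit: pick a greedy action, breaking ties at random.
            best = max(q[state])
            action = rng.choice([a for a in range(env.n_actions)
                                 if q[state][a] == best])
        next_state, reward, done = env.step(action)
        # Q-learning update; no bootstrapping from a terminal state.
        best_next = 0.0 if done else max(q[next_state])
        q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
        total_reward += reward
        steps += 1
        state = next_state
        if done:
            break
    return total_reward, steps


env, rng = ChainEnv(), random.Random(0)
q = [[0.0] * env.n_actions for _ in range(env.n_states)]
results = [run_episode(env, q, rng) for _ in range(200)]
```

Random tie-breaking matters in this toy setup: with an all-zero table, always taking the first maximal action would keep the agent at the left wall until exploration happened to move it right several times in a row.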