q_learning
The q_learning module implements tabular Q-learning.
- class q_learning.QLearning(algo_config: TDAlgoConfig)
Q-learning algorithm
- __init__(algo_config: TDAlgoConfig) None
Constructor. Initializes the algorithm from the given configuration options
- Parameters
algo_config (The configuration options) –
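The fields of TDAlgoConfig are not shown here, so the sketch below is an illustrative assumption of what a temporal-difference configuration object typically carries (episode count, learning rate, discount factor, exploration rate); the real class may define different fields:

```python
from dataclasses import dataclass

@dataclass
class TDAlgoConfig:
    """Illustrative sketch only; the actual TDAlgoConfig may differ."""
    n_episodes: int = 500   # number of training episodes
    alpha: float = 0.1      # learning rate
    gamma: float = 0.99     # discount factor
    epsilon: float = 1.0    # initial exploration rate

# Hypothetical usage: the algorithm would be constructed from such a config
config = TDAlgoConfig(n_episodes=1000, alpha=0.05)
```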
- _update_q_table(env: Env, state: int, action: int, reward: float, next_state: Optional[int] = None) None
Update the underlying q table
- Parameters
env (The training environment) –
state (The current state the environment is in) –
action (The action index selected by the policy) –
reward (The reward returned by the environment) –
next_state (The next state observed after taking the action) –
- Return type
None
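The tabular update this method performs can be sketched as follows. The step size `alpha` and discount `gamma` are assumed to come from the algorithm's configuration, and a `None` next state is assumed to mark a terminal transition, where the target is the bare reward:

```python
import numpy as np

def update_q_table(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Sketch of the standard tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    # On a terminal transition there is no next state to bootstrap from
    if next_state is None:
        target = reward
    else:
        target = reward + gamma * np.max(q[next_state])
    q[state, action] += alpha * (target - q[state, action])
```

Here `q` is a 2-D array indexed by (state, action); the update moves the stored value a fraction `alpha` of the way toward the bootstrapped target.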
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None
Execute any actions the algorithm needs after the episode ends
- Parameters
env (The environment to train on) –
episode_idx (The episode index) –
options (Any options passed by the client code) –
- Return type
None
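A typical post-episode action in Q-learning is decaying the exploration rate. The schedule below is a minimal sketch; the decay constants and names are assumptions, not the class's actual attributes:

```python
def decay_epsilon(epsilon, decay_rate=0.995, epsilon_min=0.01):
    # Multiplicative decay applied once per episode, floored at epsilon_min
    return max(epsilon_min, epsilon * decay_rate)
```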
- actions_before_training_begins(env: Env, **options) None
Execute any actions the algorithm needs before training starts
- Parameters
env (The environment to train on) –
options (Any options passed by the client code) –
- Return type
None
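Before training starts, a tabular algorithm typically allocates its Q-table from the environment's dimensions. A sketch of that setup step, assuming the state and action counts are available (the helper name is hypothetical):

```python
import numpy as np

def allocate_q_table(n_states, n_actions, init_value=0.0):
    """Allocate a (n_states, n_actions) Q-table before the first episode.
    A positive init_value gives optimistic initialization, which
    encourages early exploration of untried actions."""
    return np.full((n_states, n_actions), init_value, dtype=float)
```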
- on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo
Train the algorithm on the episode
- Parameters
env (The environment to run the training episode on) –
episode_idx (The episode index) –
options (Options that client code may pass) –
- Return type
An instance of EpisodeInfo
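A self-contained sketch of what one training episode might look like. The environment interface (`env_reset`/`env_step` callables) and the epsilon-greedy policy are illustrative assumptions; the real method returns an EpisodeInfo instance, approximated here by a (total_reward, steps) tuple:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_training_episode(q, env_step, env_reset, n_actions, eps=0.1,
                         alpha=0.1, gamma=0.99, max_steps=100):
    """Run one tabular Q-learning episode; return (total_reward, steps)."""
    state = env_reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        # Epsilon-greedy action selection
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q[state]))
        next_state, reward, done = env_step(state, action)
        # Q-learning update toward the bootstrapped target
        target = reward if done else reward + gamma * np.max(q[next_state])
        q[state, action] += alpha * (target - q[state, action])
        total_reward += reward
        steps += 1
        state = next_state
        if done:
            break
    return total_reward, steps
```

Running this repeatedly on a small deterministic environment fills in the Q-table; the caller would collect the per-episode statistics that EpisodeInfo is presumably meant to carry.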