sarsa
- class sarsa.Sarsa(algo_config: TDAlgoConfig)
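SARSA (State-Action-Reward-State-Action), an on-policy temporal-difference control algorithm that learns a Q-table.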
- __init__(algo_config: TDAlgoConfig)
Constructor
- Parameters
algo_config (TDAlgoConfig) – Algorithm configuration
- actions_after_episode_ends(env: Env, episode_idx, **options) None
Execute any actions the algorithm needs after a training episode ends.
- Parameters
env (Env) – The training environment
episode_idx – Training episode index
options – Any options passed by the client
- Return type
None
- actions_before_training_begins(env: Env, **options) None
Execute any actions the algorithm needs before training begins.
- Parameters
env (Env) – The training environment
options – Any options passed by the client
- Return type
None
- do_on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo
Train the agent on the environment at the given episode.
- Parameters
env (Env) – The environment to train on
episode_idx (int) – The episode index
options – Any options passed by the client code
- Return type
An instance of the EpisodeInfo class
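The kind of loop a method such as do_on_training_episode typically runs can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: ChainEnv, epsilon_greedy, run_episode, and the returned dictionary keys are hypothetical names, the environment's reset/step interface is assumed, and a plain dict stands in for EpisodeInfo, whose fields are not documented here.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical 1-D chain: states 0..n-1; action 1 moves right, 0 moves left."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        delta = 1 if action == 1 else -1
        # Clamp the position so the agent stays inside the chain.
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, eps, rng):
    """Pick a random action with probability eps, else a greedy one (ties broken randomly)."""
    if rng.random() < eps:
        return rng.randrange(n_actions)
    values = [q[(state, a)] for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def run_episode(env, q, n_actions=2, alpha=0.1, gamma=0.99, eps=0.1,
                max_steps=1000, rng=None):
    """Run one SARSA episode and return a dict standing in for EpisodeInfo."""
    if rng is None:
        rng = random.Random(0)
    state = env.reset()
    action = epsilon_greedy(q, state, n_actions, eps, rng)
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        next_state, reward, done = env.step(action)
        # On-policy: the next action comes from the same policy being improved.
        next_action = epsilon_greedy(q, next_state, n_actions, eps, rng)
        # Terminal states contribute no bootstrapped value.
        target = reward + (0.0 if done else gamma * q[(next_state, next_action)])
        q[(state, action)] += alpha * (target - q[(state, action)])
        total_reward += reward
        steps += 1
        state, action = next_state, next_action
        if done:
            break
    return {"episode_reward": total_reward, "episode_iterations": steps}


q_table = defaultdict(float)
info = run_episode(ChainEnv(), q_table)
```

The next action is chosen before the update so that the same action is both used in the target and executed on the following step, which is what makes the method on-policy.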
- update_q_table(reward: float, current_action: int, next_state: int, next_action: int) None
Update the underlying Q-table with the SARSA temporal-difference update.
- Parameters
reward (float) – The reward returned by the environment
current_action (int) – The index of the action selected by the policy
next_state (int) – The next state observed after taking the action
next_action (int) – The next action to take
- Return type
None
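For reference, the standard SARSA update is Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]. The sketch below shows that rule as a standalone function; it is an illustration under stated assumptions, not the body of update_q_table. The function name sarsa_update, the dictionary-backed table, and the alpha and gamma arguments are hypothetical, and the current state is passed explicitly here, whereas the documented method does not take it as a parameter and presumably tracks it elsewhere.

```python
from collections import defaultdict


def sarsa_update(q_table, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    """Apply one SARSA update in place and return the TD error (hypothetical helper)."""
    # Bootstrapped target uses the action actually chosen for the next state.
    td_target = reward + gamma * q_table[(next_state, next_action)]
    td_error = td_target - q_table[(state, action)]
    q_table[(state, action)] += alpha * td_error
    return td_error


# Usage: unseen (state, action) pairs default to a value of 0.0.
q = defaultdict(float)
q[(1, 0)] = 2.0
err = sarsa_update(q, state=0, action=1, reward=1.0,
                   next_state=1, next_action=0, alpha=0.5, gamma=0.5)
```

With these values the target is 1.0 + 0.5 × 2.0 = 2.0, so the entry for state 0, action 1 moves halfway from 0.0 toward it.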