sarsa

class sarsa.Sarsa(algo_config: TDAlgoConfig)

SARSA algorithm

__init__(algo_config: TDAlgoConfig)

Constructor. Initializes the algorithm from the given configuration.

Parameters

algo_config (Algorithm configuration) –

actions_after_episode_ends(env: Env, episode_idx, **options) None

Execute any actions the algorithm needs after the training episode ends

Parameters
  • env (The training environment) –

  • episode_idx (Training episode index) –

  • options (Any options passed by the client) –

Return type

None

actions_before_training_begins(env: Env, **options) None

Execute any actions the algorithm needs before training begins

Parameters
  • env (The training environment) –

  • options (Any options passed by the client) –

Return type

None
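
The method names suggest a hook-style life cycle: one call before training, then a per-episode call followed by a post-episode call. The sketch below illustrates that call order only. It is an assumption about how a driver might invoke these methods, not the library's actual trainer; `RecordingAlgo` and `train` are hypothetical names, and a plain dict stands in for `EpisodeInfo`.

```python
class RecordingAlgo:
    """Hypothetical stand-in exposing the same three hooks as Sarsa."""

    def __init__(self):
        self.calls = []

    def actions_before_training_begins(self, env, **options):
        self.calls.append("before_training")

    def do_on_training_episode(self, env, episode_idx, **options):
        self.calls.append(f"episode_{episode_idx}")
        # A plain dict stands in for the EpisodeInfo instance.
        return {"episode_idx": episode_idx}

    def actions_after_episode_ends(self, env, episode_idx, **options):
        self.calls.append(f"after_episode_{episode_idx}")


def train(algo, env, n_episodes, **options):
    """Hypothetical driver: assumed call order of the hooks."""
    algo.actions_before_training_begins(env, **options)
    infos = []
    for episode_idx in range(n_episodes):
        infos.append(algo.do_on_training_episode(env, episode_idx, **options))
        algo.actions_after_episode_ends(env, episode_idx, **options)
    return infos


algo = RecordingAlgo()
infos = train(algo, env=None, n_episodes=2)
```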

do_on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo

Train the agent on the environment at the given episode.

Parameters
  • env (The environment to train on) –

  • episode_idx (The episode index) –

  • options (Any options passed by the client code) –

Return type

An instance of the EpisodeInfo class
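
As a rough illustration of what a single SARSA training episode involves, here is a minimal, self-contained sketch. It does not reproduce this class's implementation: `ChainEnv`, `epsilon_greedy`, `run_sarsa_episode`, the hyperparameter names (`alpha`, `gamma`, `eps`, `max_steps`), and the returned dict (standing in for `EpisodeInfo`) are all assumptions made for the example.

```python
import random


class ChainEnv:
    """Hypothetical toy environment: reach the right end of a short chain."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.n_actions = 2  # 0 = left, 1 = right
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, eps, rng):
    """Pick a random action with probability eps, else a greedy one."""
    if rng.random() < eps:
        return rng.randrange(n_actions)
    best = max(q[state])
    # Break ties at random so untrained states are still explored.
    return rng.choice([a for a in range(n_actions) if q[state][a] == best])


def run_sarsa_episode(env, q, rng, alpha=0.1, gamma=0.99, eps=0.1, max_steps=100):
    state = env.reset()
    action = epsilon_greedy(q, state, env.n_actions, eps, rng)
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        next_state, reward, done = env.step(action)
        # On-policy: the next action is chosen by the same policy
        # and is the one used in the update target.
        next_action = epsilon_greedy(q, next_state, env.n_actions, eps, rng)
        target = reward if done else reward + gamma * q[next_state][next_action]
        q[state][action] += alpha * (target - q[state][action])
        total_reward += reward
        steps += 1
        state, action = next_state, next_action
        if done:
            break
    return {"episode_reward": total_reward, "episode_iterations": steps}


env = ChainEnv()
q = [[0.0, 0.0] for _ in range(env.n_states)]
rng = random.Random(42)
infos = [run_sarsa_episode(env, q, rng) for _ in range(200)]
```

The detail that distinguishes SARSA from Q-learning is visible in the loop: the action used in the update target is the action actually taken on the next step.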

update_q_table(reward: float, current_action: int, next_state: int, next_action: int) None

Update the underlying Q-table

Parameters
  • current_action (The action index selected by the policy) –

  • reward (The reward returned by the environment) –

  • next_state (The next state observed after taking the action) –

  • next_action (The next action to take) –

Return type

None
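
For reference, the standard SARSA update is Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)]. The sketch below shows that rule on a NumPy array. It is not this class's code: the documented signature takes no current state, so the class presumably tracks it internally, whereas the hypothetical `sarsa_update` function here receives `state` explicitly. The parameter names `alpha` and `gamma` are likewise assumptions.

```python
import numpy as np


def sarsa_update(q_table, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.99):
    """Apply one SARSA temporal-difference update in place (hypothetical helper)."""
    td_target = reward + gamma * q_table[next_state, next_action]
    td_error = td_target - q_table[state, action]
    q_table[state, action] += alpha * td_error
    return q_table


# Tiny worked example: 3 states, 2 actions.
q = np.zeros((3, 2))
q[1, 0] = 1.0  # pretend the successor pair (s'=1, a'=0) already has value 1
sarsa_update(q, state=0, action=1, reward=1.0,
             next_state=1, next_action=0, alpha=0.5, gamma=0.9)
# q[0, 1] is now 0 + 0.5 * (1.0 + 0.9 * 1.0 - 0) = 0.95
```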