episodic_sarsa_semi_gradient

Module episodic_sarsa_semi_gradient. Implements semi-gradient SARSA for episodic environments

class episodic_sarsa_semi_gradient.SemiGradSARSAConfig(n_episodes: int = 0, tolerance: float = 1e-08, render_env: bool = False, render_env_freq: int = -1, gamma: float = 1.0, alpha: float = 0.1, policy: Policy = None, n_itrs_per_episode: int = 100, dt_update_frequency: int = 100, dt_update_factor: float = 1.0)
class episodic_sarsa_semi_gradient.EpisodicSarsaSemiGrad(algo_config: SemiGradSARSAConfig)

Episodic semi-gradient SARSA algorithm implementation

__init__(algo_config: SemiGradSARSAConfig) None

Constructor. Initializes the agent with the given configuration

Parameters

algo_config (Configuration for the algorithm) –

actions_after_episode_ends(env: Env, episode_idx: int, **options) None

Execute any actions the algorithm needs after ending the episode

Parameters
  • env (The environment to train on) –

  • episode_idx (The episode index the algorithm trains on) –

  • options (Any options passed by the client code) –

Return type

None

actions_after_training_ends(env: Env, **options) None

Execute any actions the algorithm needs after the iterations are finished

Parameters
  • env (The environment to train on) –

  • options (Any options passed by the client code) –

Return type

None

actions_before_episode_begins(env: Env, episode_idx: int, **options) None

Execute any actions the algorithm needs before starting the episode

Parameters
  • env (The environment to train on) –

  • episode_idx (The episode index the algorithm trains on) –

  • options (Any options passed by the client code) –

Return type

None

actions_before_training_begins(env: Env, **options) None

Execute any actions the algorithm needs before training begins

Parameters
  • env (The environment to train on) –

  • options (Any options passed by the client code) –

Return type

None

on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo

Train the algorithm on the episode

Parameters
  • env (The environment to run the training episode) –

  • episode_idx (The episode index) –

  • options (Options that client code may pass) –

Return type

An instance of EpisodeInfo
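The hook methods above suggest a fixed call order around on_training_episode. The driver below is a hypothetical sketch of that lifecycle, not part of the documented API: the train function, and the agent and env arguments, are assumptions.

```python
# Hypothetical training driver illustrating the call order implied by
# the hook methods of EpisodicSarsaSemiGrad. The function name `train`
# and its arguments are assumptions, not part of the documented API.
def train(agent, env, n_episodes, **options):
    # One-time setup before any episode runs
    agent.actions_before_training_begins(env, **options)
    for episode_idx in range(n_episodes):
        # Per-episode setup (e.g. resetting counters or the environment)
        agent.actions_before_episode_begins(env, episode_idx, **options)
        # Run one training episode; returns an EpisodeInfo instance
        episode_info = agent.on_training_episode(env, episode_idx, **options)
        # Per-episode teardown (e.g. logging, schedule updates)
        agent.actions_after_episode_ends(env, episode_idx, **options)
    # One-time teardown after all episodes finish
    agent.actions_after_training_ends(env, **options)
```

In this reading, client code only calls train-style drivers; the hooks exist so subclasses can customize setup and teardown without overriding the episode loop itself.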

update_weights(total_reward: float, state_action: Action, state_action_: Action, t: float) None

Update the weights

Parameters
  • total_reward (The reward observed) –

  • state_action (The action that led to the reward) –

  • state_action_ (The next state-action) –

  • t (The decay factor for alpha) –

Return type

None
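For reference, the semi-gradient SARSA update maintains a weight vector w and moves it along the gradient of the approximate action value, scaled by the TD error. The snippet below is a minimal self-contained sketch for a linear approximator q(s, a; w) = w[a] @ x(s); the function name sarsa_update and its argument list are assumptions for illustration, not the signature of update_weights.

```python
import numpy as np

# Minimal sketch of the semi-gradient SARSA update for a linear
# action-value approximation q(s, a; w) = w[a] @ x(s).
# w        : weight matrix, one row per action
# x, a     : feature vector and action of the current step
# x_next, a_next : feature vector and action of the next step
# done     : True on a terminal transition (next value is 0)
def sarsa_update(w, x, a, reward, x_next, a_next, alpha, gamma, done):
    q = w[a] @ x
    q_next = 0.0 if done else w[a_next] @ x_next
    # TD error: R + gamma * q(S', A'; w) - q(S, A; w)
    td_error = reward + gamma * q_next - q
    # For a linear approximator, grad_w q(s, a; w) is x on row a.
    w[a] += alpha * td_error * x
    return td_error
```

The "semi-gradient" qualifier refers to treating the bootstrapped target R + gamma * q(S', A'; w) as a constant when differentiating, so only q(S, A; w) contributes a gradient term.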