episodic_sarsa_semi_gradient
Module episodic_sarsa_semi_gradient. Implements semi-gradient SARSA for episodic environments
- class episodic_sarsa_semi_gradient.SemiGradSARSAConfig(n_episodes: int = 0, tolerance: float = 1e-08, render_env: bool = False, render_env_freq: int = -1, gamma: float = 1.0, alpha: float = 0.1, policy: Policy = None, n_itrs_per_episode: int = 100, dt_update_frequency: int = 100, dt_update_factor: float = 1.0)
- class episodic_sarsa_semi_gradient.EpisodicSarsaSemiGrad(algo_config: SemiGradSARSAConfig)
Episodic semi-gradient SARSA algorithm implementation
- __init__(algo_config: SemiGradSARSAConfig) None
Constructor. Initialize the agent with the given configuration
- Parameters
algo_config – Configuration for the algorithm
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None
Execute any actions the algorithm needs after ending the episode
- Parameters
env – The environment to train on
episode_idx – The episode index the algorithm trains on
options – Any options passed by the client code
- Return type
None
- actions_after_training_ends(env: Env, **options) None
Execute any actions the algorithm needs after the iterations are finished
- Parameters
env – The environment to train on
options – Any options passed by the client code
- Return type
None
- actions_before_episode_begins(env: Env, episode_idx: int, **options) None
Execute any actions the algorithm needs before starting the episode
- Parameters
env – The environment to train on
episode_idx – The episode index the algorithm trains on
options – Any options passed by the client code
- Return type
None
- actions_before_training_begins(env: Env, **options) None
Execute any actions the algorithm needs before training begins
- Parameters
env – The environment to train on
options – Any options passed by the client code
- Return type
None
- on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo
Train the algorithm on the episode
- Parameters
env – The environment to run the training episode
episode_idx – The episode index
options – Options that client code may pass
- Return type
An instance of EpisodeInfo
- update_weights(total_reward: float, state_action: Action, state_action_: Action, t: float) None
Update the weights
- Parameters
total_reward – The reward observed
state_action – The action that led to the reward
state_action_
t – The decay factor for alpha
- Return type
None
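For orientation, a semi-gradient SARSA weight update of the kind update_weights is documented to perform can be sketched as follows. This is a minimal illustration and not the module's implementation. It assumes a linear action-value function over feature vectors, assumes that state_action_ denotes the next state-action pair, and assumes that t divides alpha to decay the step size. The names q_value, semi_gradient_sarsa_update, features, next_features and terminal are hypothetical and do not appear in the module.

```python
from typing import List, Sequence


def q_value(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear action-value estimate: q(s, a, w) = w . x(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def semi_gradient_sarsa_update(
    weights: List[float],
    reward: float,
    features: Sequence[float],       # x(s, a) for the current state-action pair
    next_features: Sequence[float],  # x(s', a') for the next state-action pair
    alpha: float = 0.1,
    gamma: float = 1.0,
    t: float = 1.0,
    terminal: bool = False,
) -> None:
    """Update `weights` in place with one semi-gradient SARSA step."""
    # The bootstrap term is dropped when the next state is terminal.
    if terminal:
        target = reward
    else:
        target = reward + gamma * q_value(weights, next_features)
    td_error = target - q_value(weights, features)
    # Assumption: `t` divides alpha to decay the step size.
    step = alpha / t
    # For a linear q, the gradient with respect to w is the feature vector.
    for i, x in enumerate(features):
        weights[i] += step * td_error * x


# One update with one-hot features and the default alpha and gamma
# from SemiGradSARSAConfig (alpha=0.1, gamma=1.0).
weights = [0.0, 0.0]
semi_gradient_sarsa_update(
    weights, reward=1.0, features=[1.0, 0.0], next_features=[0.0, 1.0]
)
print(weights)  # [0.1, 0.0]
```

The update is called "semi-gradient" because the target is treated as a constant: only the estimate for the current state-action pair is differentiated with respect to the weights, even though the target also depends on them.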