dp_algorithm_base

The dp_algorithm_base module specifies the base class for dynamic programming (DP) algorithms.

class dp_algorithm_base.DPAlgoConfig(n_episodes: int = 0, tolerance: float = 1e-08, render_env: bool = False, render_env_freq: int = -1, gamma: float = 0.1, policy: Optional[Policy] = None)

Data class that wraps the configuration parameters for dynamic programming algorithms.
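As a rough illustration, the configuration can be read as a plain data class holding the fields listed in the signature above. The sketch below is a self-contained stand-in that mirrors those fields and defaults. It is not the library's implementation, and the Policy class shown here is a hypothetical placeholder for the library's own type.

```python
from dataclasses import dataclass
from typing import Optional


class Policy:
    """Hypothetical placeholder for the library's Policy type."""


@dataclass
class DPAlgoConfig:
    """Stand-in mirroring the documented fields and default values."""

    n_episodes: int = 0
    tolerance: float = 1e-08
    render_env: bool = False
    render_env_freq: int = -1
    gamma: float = 0.1
    policy: Optional[Policy] = None


# Override only the fields that differ from the defaults.
config = DPAlgoConfig(n_episodes=100, gamma=0.9)
```

Fields that are not passed explicitly keep the defaults shown in the signature, so a configuration usually only needs to name the values it changes.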

class dp_algorithm_base.DPAlgoBase(algo_config: DPAlgoConfig)

Base class for DP-based algorithms

__init__(algo_config: DPAlgoConfig) → None

Constructor. Initializes the algorithm from the given configuration instance.

actions_after_episode_ends(env: Env, episode_idx: int, **options) → None

Execute any actions the algorithm needs after ending the episode

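Parameters

env (Env): the environment the algorithm runs on.
episode_idx (int): the index of the episode that has just ended.
options: any additional keyword options.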

Return type

None

actions_after_training_ends(env: Env, **options) → None

Execute any actions the algorithm needs after the iterations are finished

Parameters

env (Env): the environment the algorithm runs on.
options: any additional keyword options.

Return type

None

actions_before_episode_begins(env: Env, episode_idx: int, **options) → None

Execute any actions the algorithm needs before starting the episode

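Parameters

env (Env): the environment the algorithm runs on.
episode_idx (int): the index of the episode that is about to begin.
options: any additional keyword options.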

Return type

None

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before starting the training episodes

Parameters

env (Env): the environment the algorithm runs on.
options: any additional keyword options.

Return type

None

property gamma: float

Returns gamma, i.e. the discount factor.
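For context, a discount constant of this kind conventionally weights a reward received k steps in the future by gamma raised to the power k. The helper below is a hypothetical illustration of that arithmetic. The name discounted_return is an assumption and is not part of this module.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum the rewards, weighting the k-th reward by gamma**k.

    Hypothetical helper for illustration only.
    """
    total = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three unit rewards with gamma = 0.5: 1 + 0.5 + 0.25
value = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With a small gamma, such as the documented default of 0.1, later rewards contribute very little to the total.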

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the episode

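Parameters

env (Env): the environment to train on.
episode_idx (int): the index of the episode being trained.
options: any additional keyword options.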

Return type

An instance of EpisodeInfo
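The hook names above suggest a training lifecycle along the lines of the sketch below. Everything here is a self-contained stand-in: LoggingAlgo only borrows the method names of DPAlgoBase, EpisodeInfo is a hypothetical placeholder with made-up fields, and the train function, including the order in which it calls the hooks, is an assumption inferred from the method names rather than the library's actual trainer.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class EpisodeInfo:
    """Hypothetical stand-in for the library's EpisodeInfo."""

    episode_index: int
    episode_reward: float = 0.0


class LoggingAlgo:
    """Stand-in exposing the same hook names as DPAlgoBase; it only logs calls."""

    def __init__(self, gamma: float = 0.1) -> None:
        self._gamma = gamma
        self.calls: List[str] = []

    @property
    def gamma(self) -> float:
        return self._gamma

    def actions_before_training_begins(self, env: Any, **options) -> None:
        self.calls.append("before_training")

    def actions_before_episode_begins(self, env: Any, episode_idx: int, **options) -> None:
        self.calls.append(f"before_episode_{episode_idx}")

    def on_training_episode(self, env: Any, episode_idx: int, **options) -> EpisodeInfo:
        self.calls.append(f"episode_{episode_idx}")
        return EpisodeInfo(episode_index=episode_idx)

    def actions_after_episode_ends(self, env: Any, episode_idx: int, **options) -> None:
        self.calls.append(f"after_episode_{episode_idx}")

    def actions_after_training_ends(self, env: Any, **options) -> None:
        self.calls.append("after_training")


def train(algo: LoggingAlgo, env: Any, n_episodes: int) -> List[EpisodeInfo]:
    """Hypothetical driver loop; the call order is an assumption."""
    algo.actions_before_training_begins(env)
    infos = []
    for idx in range(n_episodes):
        algo.actions_before_episode_begins(env, idx)
        infos.append(algo.on_training_episode(env, idx))
        algo.actions_after_episode_ends(env, idx)
    algo.actions_after_training_ends(env)
    return infos


algo = LoggingAlgo()
infos = train(algo, env=None, n_episodes=2)
```

Under these assumptions, a concrete algorithm would override on_training_episode with its update rule and use the before and after hooks for setup, bookkeeping, and cleanup.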