dp_algorithm_base
The dp_algorithm_base module specifies the base class for dynamic programming (DP) algorithms.
- class dp_algorithm_base.DPAlgoConfig(n_episodes: int = 0, tolerance: float = 1e-08, render_env: bool = False, render_env_freq: int = -1, gamma: float = 0.1, policy: Optional[Policy] = None)
Data class that wraps the configuration parameters for dynamic programming algorithms.
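As a rough illustration of how such a configuration might be built, the sketch below defines a stand-in data class with the same fields and defaults as the signature above. The name DPAlgoConfigSketch is hypothetical, and the policy field is typed as Any because the Policy class is not described on this page; in real code you would import DPAlgoConfig from dp_algorithm_base instead.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DPAlgoConfigSketch:
    """Hypothetical stand-in mirroring the documented DPAlgoConfig fields."""

    n_episodes: int = 0
    tolerance: float = 1e-08
    render_env: bool = False
    render_env_freq: int = -1
    gamma: float = 0.1
    # Assumption: the real field is Optional[Policy]; Any is used here
    # only to keep the sketch self-contained.
    policy: Optional[Any] = None


# Override only the fields that differ from the defaults.
config = DPAlgoConfigSketch(n_episodes=100, gamma=0.9)
```

Fields that are not passed keep the defaults listed in the signature, so config.tolerance stays at 1e-08 and config.render_env_freq at -1.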
- class dp_algorithm_base.DPAlgoBase(algo_config: DPAlgoConfig)
Base class for DP-based algorithms
- __init__(algo_config: DPAlgoConfig) → None
Constructor. Initializes the algorithm from the given configuration instance.
- Parameters
algo_config (DPAlgoConfig) – The algorithm configuration
- actions_after_episode_ends(env: Env, episode_idx: int, **options) → None
Execute any actions the algorithm needs after the episode ends.
- Parameters
env (Env) – The environment to train on
episode_idx (int) – The episode index
options – Any named options passed by the client code
- Return type
None
- actions_after_training_ends(env: Env, **options) → None
Execute any actions the algorithm needs after the training iterations are finished.
- Parameters
env (Env) – The environment to train on
options – Any named options passed by the client code
- Return type
None
- actions_before_episode_begins(env: Env, episode_idx: int, **options) → None
Execute any actions the algorithm needs before the episode starts.
- Parameters
env (Env) – The environment to train on
episode_idx (int) – The episode index
options – Any named options passed by the client code
- Return type
None
- actions_before_training_begins(env: Env, **options) → None
Execute any actions the algorithm needs before the training episodes start.
- Parameters
env (Env) – The environment to train on
options – Any named options passed by the client code
- Return type
None
- property gamma: float
Returns gamma, i.e. the discount factor.
- Return type
float
- on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo
Train the algorithm on the episode.
- Parameters
env (Env) – The environment to run the training episode on
episode_idx (int) – The episode index
options – Options that client code may pass
- Return type
EpisodeInfo
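The hook names above suggest a lifecycle in which a driver calls the before/after methods around each training episode. This page does not show the driver itself, so the following is only a sketch under that assumption: HookOrderSketch and train are hypothetical names, the call order is inferred from the method names, and a plain dict stands in for EpisodeInfo.

```python
from typing import Any, Dict, List


class HookOrderSketch:
    """Hypothetical stand-in exposing the same hook names as DPAlgoBase."""

    def __init__(self, n_episodes: int, gamma: float) -> None:
        self.n_episodes = n_episodes
        self._gamma = gamma
        self.calls: List[str] = []  # records the order in which hooks run

    @property
    def gamma(self) -> float:
        return self._gamma

    def actions_before_training_begins(self, env: Any, **options) -> None:
        self.calls.append("before_training")

    def actions_before_episode_begins(self, env: Any, episode_idx: int, **options) -> None:
        self.calls.append(f"before_episode_{episode_idx}")

    def on_training_episode(self, env: Any, episode_idx: int, **options) -> Dict[str, int]:
        self.calls.append(f"episode_{episode_idx}")
        # Assumption: the real method returns an EpisodeInfo instance.
        return {"episode_idx": episode_idx}

    def actions_after_episode_ends(self, env: Any, episode_idx: int, **options) -> None:
        self.calls.append(f"after_episode_{episode_idx}")

    def actions_after_training_ends(self, env: Any, **options) -> None:
        self.calls.append("after_training")


def train(algo: HookOrderSketch, env: Any, **options) -> List[Dict[str, int]]:
    """Hypothetical driver: the call order is inferred, not documented."""
    algo.actions_before_training_begins(env, **options)
    infos = []
    for episode_idx in range(algo.n_episodes):
        algo.actions_before_episode_begins(env, episode_idx, **options)
        infos.append(algo.on_training_episode(env, episode_idx, **options))
        algo.actions_after_episode_ends(env, episode_idx, **options)
    algo.actions_after_training_ends(env, **options)
    return infos


algo = HookOrderSketch(n_episodes=2, gamma=0.9)
infos = train(algo, env=None)
```

Recording the calls in a list makes the assumed ordering easy to inspect; a concrete subclass of DPAlgoBase would put its value-update logic in on_training_episode and use the other hooks for setup, rendering, or convergence checks.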