value_iteration
The value_iteration module. Provides a simple implementation of the value iteration algorithm.
- class value_iteration.ValueIteration(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)
The class ValueIteration implements the value iteration algorithm
- __init__(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase) → None
Constructor
- Parameters
algo_config – Algorithm configuration
policy_adaptor – How the underlying policy is adapted
- actions_before_training_begins(env: Env, **options) → None
Execute any actions the algorithm needs before starting the iterations
- Parameters
env – The environment to train on
options – Any options passed by the client code
- Return type
None
- on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo
Train the algorithm on the episode
- Parameters
env – The environment to run the training episode
episode_idx – The episode index
options – Options that client code may pass
- Return type
An instance of EpisodeInfo
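
For orientation, the following is a minimal, standalone sketch of tabular value iteration written with NumPy. It does not use or reproduce the ValueIteration class documented above: the function name value_iteration, the array layout of P and R, and the parameters gamma, tol and max_iters are all assumptions made for illustration, and the module's actual internals may differ.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative sketch, not the module's code).

    P: transition probabilities, shape (n_states, n_actions, n_states)
    R: expected immediate rewards, shape (n_states, n_actions)
    Returns the state values and a greedy policy (one action index per state).
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman optimality backup:
        # Q[s, a] = R[s, a] + gamma * sum over s' of P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        delta = np.max(np.abs(V_new - V))
        V = V_new
        if delta < tol:
            break
    # Recompute Q from the final values so the policy is greedy with respect to V
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)


# Hypothetical two-state MDP: in state 0, action 0 stays put (reward 0) and
# action 1 moves to state 1 (reward 1); state 1 is absorbing with reward 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
])
R = np.array([
    [0.0, 1.0],
    [0.0, 0.0],
])
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the greedy policy picks action 1 in state 0, since collecting the reward of 1 immediately beats staying in place. How such a loop maps onto actions_before_training_begins and on_training_episode is not stated in this reference; one plausible reading is that the value table is initialised in the former and one or more sweeps run in the latter.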