value_iteration

The value_iteration module provides a simple implementation of the value iteration algorithm.

class value_iteration.ValueIteration(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)

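The ValueIteration class implements the value iteration algorithm, a dynamic-programming method that repeatedly applies the Bellman optimality update to a state-value estimate.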
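For orientation, the general technique can be sketched in a few lines of plain Python. This is a minimal illustration of textbook value iteration on a tabular MDP, not the code inside ValueIteration; the function name value_iteration_sketch, the nested-list layout of transitions and rewards, and the parameters gamma, tol, and max_iters are all assumptions made for this example.

```python
def value_iteration_sketch(transitions, rewards, gamma=0.9, tol=1e-8, max_iters=1000):
    """Textbook value iteration on a small tabular MDP (illustrative only).

    transitions[s][a] is a list of next-state probabilities (hypothetical layout).
    rewards[s][a] is the expected immediate reward for action a in state s.
    """
    n_states = len(transitions)
    n_actions = len(transitions[0])

    def q_value(s, a, values):
        # One-step lookahead: immediate reward plus discounted expected value.
        expected_next = sum(p * values[s2] for s2, p in enumerate(transitions[s][a]))
        return rewards[s][a] + gamma * expected_next

    values = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality update applied to every state.
        new_values = [
            max(q_value(s, a, values) for a in range(n_actions))
            for s in range(n_states)
        ]
        delta = max(abs(new - old) for new, old in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [
        max(range(n_actions), key=lambda a: q_value(s, a, values))
        for s in range(n_states)
    ]
    return values, policy


# Two states, two actions (0 = stay, 1 = move to the other state).
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1
]
rewards = [[0.0, 0.0], [1.0, 0.0]]  # only "stay in state 1" pays a reward
values, policy = value_iteration_sketch(transitions, rewards, gamma=0.5)
```

In this toy problem the values converge to roughly 1.0 for state 0 and 2.0 for state 1, and the greedy policy is to move out of state 0 and stay in state 1.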

__init__(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase) → None

Constructor. Creates the algorithm from its configuration and a policy adaptor.

Parameters
  • algo_config – Algorithm configuration

  • policy_adaptor – How the underlying policy is adapted

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before the training iterations start.

Parameters
  • env – The environment to train on

  • options – Any options passed by the client code

Return type

None

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the given episode.

Parameters
  • env – The environment to run the training episode on

  • episode_idx – The episode index

  • options – Options that client code may pass

Return type

An instance of EpisodeInfo
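The two hooks documented above suggest a call order: the preparation hook runs once, then the episode hook runs once per episode. The sketch below illustrates that order with stand-ins only. CountingAlgo and run_training are hypothetical names invented for this example, the returned dict is a placeholder for EpisodeInfo, and the library's own training driver may behave differently.

```python
class CountingAlgo:
    """Hypothetical stand-in exposing the same two hook names as ValueIteration."""

    def __init__(self):
        self.prepared = False
        self.episodes_seen = []

    def actions_before_training_begins(self, env, **options):
        # One-off setup before any episode runs.
        self.prepared = True

    def on_training_episode(self, env, episode_idx, **options):
        # Record the episode; a real algorithm would update its value estimate here.
        self.episodes_seen.append(episode_idx)
        return {"episode_index": episode_idx}  # placeholder, not a real EpisodeInfo


def run_training(algo, env, n_episodes, **options):
    """Hypothetical driver: prepare once, then train episode by episode."""
    algo.actions_before_training_begins(env, **options)
    return [
        algo.on_training_episode(env, episode_idx, **options)
        for episode_idx in range(n_episodes)
    ]


algo = CountingAlgo()
infos = run_training(algo, env=None, n_episodes=3)
```

Keeping setup and per-episode work in separate hooks lets a driver loop stay the same across different algorithms that share this interface.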