policy_improvement

Module policy_improvement. Implements the policy improvement algorithm as described in Sutton and Barto, Reinforcement Learning: An Introduction:

http://incompleteideas.net/book/RLbook2020.pdf
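For orientation, the policy improvement step from the book can be sketched as a greedy update with respect to a given value function. This is a minimal illustration, not this module's implementation: the function name `improve_policy`, the dense transition tensor `P`, the expected-reward matrix `R`, and the discount `gamma` are all assumptions introduced for the example.

```python
import numpy as np


def improve_policy(P, R, v, gamma=0.9):
    """Return a greedy deterministic policy with respect to v (hypothetical helper).

    P: shape (n_states, n_actions, n_states), transition probabilities p(s' | s, a)
    R: shape (n_states, n_actions), expected immediate reward for (s, a)
    v: shape (n_states,), current state-value estimates
    """
    # q[s, a] = R[s, a] + gamma * sum over s' of P[s, a, s'] * v[s']
    q = R + gamma * (P @ v)
    # The improved policy picks, in every state, an action with the largest q-value
    policy = np.argmax(q, axis=1)
    return policy, q


# Toy MDP (assumed for illustration): action 0 stays put, action 1 switches state
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0 -> state 0
P[0, 1, 1] = 1.0  # state 0, action 1 -> state 1
P[1, 0, 1] = 1.0  # state 1, action 0 -> state 1
P[1, 1, 0] = 1.0  # state 1, action 1 -> state 0

# Reward of 1 whenever the next state is state 1
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])

v = np.array([0.0, 10.0])
policy, q = improve_policy(P, R, v, gamma=0.9)
```

In this toy setting the greedy policy moves to state 1 from state 0 and stays in state 1 once there. The expected-reward form used here is equivalent to the book's sum over next states and rewards when `R` holds the expectation of the immediate reward.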

class policy_improvement.PolicyImprovement(algo_config: DPAlgoConfig, v: array, policy_adaptor: PolicyAdaptor)

Implementation of policy improvement

__init__(algo_config: DPAlgoConfig, v: array, policy_adaptor: PolicyAdaptor) → None

Constructor. Initializes an algorithm instance from the configuration instance, the value function, and the object that adapts the policy.

Parameters
  • algo_config (DPAlgoConfig) – The algorithm configuration.

  • v (array) – The value function to use.

  • policy_adaptor (PolicyAdaptor) – The object responsible for adapting the policy.

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm for a single episode.

Parameters
  • env (Env) – The environment in which to run the training episode.

  • episode_idx (int) – The episode index.

  • options – Options that client code may pass.

Return type

An instance of EpisodeInfo