policy_improvement

Module policy_improvement. Implements the policy improvement algorithm as described in the book.

class policy_improvement.PolicyImprovement(algo_config: DPAlgoConfig, v: array, policy_adaptor: PolicyAdaptor)

Implementation of policy improvement
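For orientation, the textbook policy improvement step makes the policy greedy with respect to a given value function via a one-step lookahead. The sketch below is a minimal NumPy illustration of that step, not this class's actual implementation. The function `improve_policy`, the arrays `P` and `R`, and the two-state example are assumptions introduced only for illustration.

```python
import numpy as np


def improve_policy(P, R, v, gamma=0.9):
    """Return a greedy deterministic policy with respect to v (illustrative only).

    P: transition probabilities, shape (n_states, n_actions, n_states)
    R: expected immediate rewards, shape (n_states, n_actions)
    v: state-value estimates, shape (n_states,)
    """
    # One-step lookahead: q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * v[s']
    q = R + gamma * (P @ v)
    # Greedy improvement: pick the highest-valued action in every state
    return np.argmax(q, axis=1), q


# Hypothetical two-state, two-action MDP:
# action 0 always leads to state 0, action 1 always leads to state 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
R = np.array([[0.0, 1.0], [0.0, 2.0]])
v = np.array([0.0, 10.0])

policy, q = improve_policy(P, R, v, gamma=0.9)
print(policy)
print(q)
```

In this example state 1 has the higher value, so the greedy policy selects action 1 in both states.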

__init__(algo_config: DPAlgoConfig, v: array, policy_adaptor: PolicyAdaptor) → None

Constructor. Initializes an algorithm instance from the configuration instance, the value function, and the object that adapts the policy.

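Parameters

algo_config (DPAlgoConfig): the algorithm configuration instance

v (array): the value function

policy_adaptor (PolicyAdaptor): the object that adapts the policy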

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the episode

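Parameters

env (Env): the environment to train on

episode_idx (int): the index of the current episode

options: additional keyword arguments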

Return type

An instance of EpisodeInfo
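As a rough illustration of the per-episode pattern, the sketch below runs one improvement sweep, updates a policy array in place, and returns a small summary record. `EpisodeSummary`, `run_episode`, and their fields are hypothetical stand-ins. The actual fields of `EpisodeInfo` and the interface of `Env` are not shown in this reference, so explicit transition and reward arrays are assumed here instead of an environment object.

```python
import time
from dataclasses import dataclass

import numpy as np


@dataclass
class EpisodeSummary:
    """Hypothetical stand-in; not the library's EpisodeInfo."""
    episode_idx: int
    n_changed: int   # number of states whose greedy action changed
    elapsed: float   # wall-clock seconds spent in the sweep


def run_episode(P, R, v, policy, episode_idx, gamma=0.9):
    """Run one greedy improvement sweep and update `policy` in place."""
    start = time.perf_counter()
    q = R + gamma * (P @ v)              # one-step lookahead values
    new_policy = np.argmax(q, axis=1)    # greedy action per state
    n_changed = int(np.sum(new_policy != policy))
    policy[:] = new_policy
    return EpisodeSummary(episode_idx, n_changed, time.perf_counter() - start)


# Hypothetical two-state, two-action MDP (action a always leads to state a).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
R = np.array([[0.0, 1.0], [0.0, 2.0]])
v = np.array([0.0, 10.0])
policy = np.zeros(2, dtype=int)

first = run_episode(P, R, v, policy, episode_idx=0)
second = run_episode(P, R, v, policy, episode_idx=1)
print(first.n_changed, second.n_changed)
```

A sweep that changes no actions indicates the policy is already greedy with respect to `v`, which is the usual stopping signal in policy iteration.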