policy_improvement
Module policy_improvement. Implements the policy improvement algorithm as described in the book Reinforcement Learning: An Introduction (Sutton and Barto):
http://incompleteideas.net/book/RLbook2020.pdf
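As an illustration of the underlying technique, the following is a minimal sketch of the greedy policy improvement step from the book: for each state, compute q(s, a) = sum over (s', r) of p(s', r | s, a) * (r + gamma * v(s')) and pick the action with the largest value. It does not use this module's classes. The names `action_values` and `improve_policy` and the dictionary layout of `model` are assumptions made for the example, not part of the documented API.

```python
from typing import Dict, List, Tuple

# Hypothetical model layout (an assumption, not this module's API):
# model[state][action] -> list of (probability, next_state, reward)
Transition = Tuple[float, int, float]
Model = Dict[int, Dict[int, List[Transition]]]


def action_values(model: Model, v: List[float], state: int, gamma: float) -> Dict[int, float]:
    """Return q(s, a) for every action available in `state`."""
    q: Dict[int, float] = {}
    for action, transitions in model[state].items():
        # Expected one-step return: reward plus discounted value of the next state
        q[action] = sum(
            prob * (reward + gamma * v[next_state])
            for prob, next_state, reward in transitions
        )
    return q


def improve_policy(model: Model, v: List[float], gamma: float = 1.0) -> Dict[int, int]:
    """Return the deterministic policy that is greedy with respect to `v`."""
    policy: Dict[int, int] = {}
    for state in model:
        q = action_values(model, v, state, gamma)
        # Greedy choice: the action with the highest action value
        policy[state] = max(q, key=q.get)
    return policy


# Tiny two-state, two-action example with deterministic transitions
model: Model = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
v = [0.0, 10.0]
policy = improve_policy(model, v, gamma=0.9)
```

In this toy example, action 1 leads to the higher-valued state 1 from both states, so the greedy policy selects it everywhere. Ties are broken in favor of the first action encountered, which is a choice of this sketch rather than something the source specifies.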
- class policy_improvement.PolicyImprovement(algo_config: DPAlgoConfig, v: array, policy_adaptor: PolicyAdaptor)
Implementation of policy improvement
- __init__(algo_config: DPAlgoConfig, v: array, policy_adaptor: PolicyAdaptor) None
Constructor. Initializes an algorithm instance from the configuration instance, the value function, and the object that adapts the policy.
- Parameters
algo_config (DPAlgoConfig) – The algorithm configuration
v (array) – The value function to use
policy_adaptor (PolicyAdaptor) – The object responsible for adapting the policy
- on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo
Train the algorithm on the given episode
- Parameters
env (Env) – The environment in which to run the training episode
episode_idx (int) – The episode index
options – Options that client code may pass
- Return type
An instance of EpisodeInfo