policy_iteration

Module policy_iteration. Implementation of the policy iteration algorithm. Each step of policy iteration performs one policy evaluation followed by one policy improvement.
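The evaluate-then-improve cycle can be illustrated with a minimal tabular sketch. This is not the module's implementation: the MDP representation (a dict of `(probability, next_state, reward)` triples) and the function names `q_value`, `evaluate_policy`, `improve_policy` and `policy_iteration` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular MDP: P[state][action] is a list of
# (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P: Transitions, policy: Dict[int, int], gamma: float,
                    tol: float = 1e-8) -> Dict[int, float]:
    """Policy evaluation: sweep the Bellman expectation update until it settles."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve_policy(P: Transitions, V: Dict[int, float], gamma: float) -> Dict[int, int]:
    """Policy improvement: act greedily with respect to V in every state."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


def policy_iteration(P: Transitions, gamma: float = 0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = {s: min(P[s]) for s in P}  # arbitrary starting policy
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


# Two-state example: staying in state 1 pays 2 per step, so the greedy
# policy moves from state 0 to state 1 and then stays there.
P_example: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
}
policy, V = policy_iteration(P_example, gamma=0.9)
```

The loop stops when an improvement step leaves the policy unchanged, which is the usual stopping criterion for tabular policy iteration.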

class policy_iteration.PolicyIteration(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)

Policy iteration class

__init__(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)

Constructor.

Parameters

algo_config – Configuration for the algorithm
policy_adaptor – How the policy should be adapted

actions_after_training_ends(env: Env, **options) → None

Any actions the algorithm should perform after the training ends

Parameters

env – The environment the agent is trained on
options – Any options passed by the client code

Return type

None

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before starting the iterations

Parameters

env – The environment to train on
options – Any options passed by the application

Return type

None

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the episode

Parameters

env – The environment to run the training episode
episode_idx – The episode index
options – Options that client code may pass

Return type

An instance of EpisodeInfo

property policy: Policy

Get the trained policy

Return type

An instance of Policy
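The method names suggest a trainer calls the hooks in a fixed order: once before training, once per episode, and once after training. The sketch below assumes that order. `RecordingAlgorithm` and `run_training` are hypothetical stand-ins, not part of this module, and the returned dict is a placeholder for a real EpisodeInfo instance.

```python
from typing import Any, List


class RecordingAlgorithm:
    """Hypothetical stand-in exposing the same hook names as PolicyIteration."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def actions_before_training_begins(self, env: Any, **options: Any) -> None:
        self.calls.append("before")

    def on_training_episode(self, env: Any, episode_idx: int, **options: Any) -> dict:
        self.calls.append(f"episode {episode_idx}")
        return {"episode_idx": episode_idx}  # placeholder for EpisodeInfo

    def actions_after_training_ends(self, env: Any, **options: Any) -> None:
        self.calls.append("after")


def run_training(algorithm: Any, env: Any, n_episodes: int, **options: Any) -> list:
    """Hypothetical driver: assumed call order of the three hooks."""
    algorithm.actions_before_training_begins(env, **options)
    infos = [algorithm.on_training_episode(env, i, **options)
             for i in range(n_episodes)]
    algorithm.actions_after_training_ends(env, **options)
    return infos


algo = RecordingAlgorithm()
infos = run_training(algo, env=None, n_episodes=2)
```

With a real PolicyIteration instance, the trained policy would then be read from the `policy` property once training has finished.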