policy_iteration

Module policy_iteration. Implementation of the policy iteration algorithm. Each iteration performs one policy evaluation step followed by one policy improvement step.
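The evaluate-then-improve cycle described above can be illustrated with a minimal, library-independent sketch. It does not use this module's classes: the two-state MDP in `TRANSITIONS`, the helper names (`q_value`, `evaluate_policy`, `improve_policy`, `policy_iteration`) and the defaults for `gamma` and `tol` are all assumptions chosen for illustration, and the actual PolicyIteration class may organise these steps differently.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP layout (an assumption, not this module's data model):
# transitions[state][action] -> list of (probability, next_state, reward)
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

TRANSITIONS: Transitions = {
    0: {0: [(1.0, 0, 0.0)],   # stay in state 0, no reward
        1: [(1.0, 1, 1.0)]},  # move to state 1, reward 1
    1: {0: [(1.0, 1, 2.0)],   # stay in state 1, reward 2
        1: [(1.0, 0, 0.0)]},  # move back to state 0, no reward
}


def q_value(transitions: Transitions, values: Dict[int, float],
            state: int, action: int, gamma: float) -> float:
    """Expected one-step return of taking `action` in `state`."""
    return sum(p * (r + gamma * values[s2])
               for p, s2, r in transitions[state][action])


def evaluate_policy(transitions: Transitions, policy: Dict[int, int],
                    gamma: float, tol: float = 1e-8) -> Dict[int, float]:
    """Policy evaluation: sweep the states until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s in transitions:
            new_v = q_value(transitions, values, s, policy[s], gamma)
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v
        if delta < tol:
            return values


def improve_policy(transitions: Transitions, values: Dict[int, float],
                   gamma: float) -> Dict[int, int]:
    """Policy improvement: act greedily with respect to `values`."""
    return {
        s: max(transitions[s],
               key=lambda a: q_value(transitions, values, s, a, gamma))
        for s in transitions
    }


def policy_iteration(transitions: Transitions, gamma: float = 0.9,
                     max_iterations: int = 100):
    """Alternate one evaluation and one improvement until the policy is stable."""
    policy = {s: min(transitions[s]) for s in transitions}  # arbitrary start
    values = {s: 0.0 for s in transitions}
    for _ in range(max_iterations):
        values = evaluate_policy(transitions, policy, gamma)
        new_policy = improve_policy(transitions, values, gamma)
        if new_policy == policy:
            break
        policy = new_policy
    return policy, values


if __name__ == "__main__":
    policy, values = policy_iteration(TRANSITIONS)
    print(policy, values)
```

In this toy problem the loop stops once the greedy policy no longer changes between two consecutive iterations, which is the usual stopping rule for policy iteration.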

class policy_iteration.PolicyIteration(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)

Policy iteration class

__init__(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)

Constructor.

Parameters
  • algo_config – Configuration for the algorithm

  • policy_adaptor – How the policy should be adapted

actions_after_training_ends(env: Env, **options) → None

Any actions the algorithm should perform after the training ends

Parameters
  • env – The environment the agent is trained on

  • options – Any options passed by the client code

Return type

None

actions_before_training_begins(env: Env, **options) → None

Execute any actions the algorithm needs before starting the iterations

Parameters
  • env – The environment to train on

  • options – Any options passed by the application

Return type

None

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Train the algorithm on the episode

Parameters
  • env – The environment to run the training episode on

  • episode_idx – The episode index

  • options – Options that client code may pass

Return type

An instance of EpisodeInfo

property policy: Policy

Get the trained policy

Return type

An instance of Policy
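The three hooks documented above suggest a calling order: one call before training, one call per episode, one call after training. The sketch below shows that order with a hypothetical driver. `StubAlgorithm` and `run_training` are illustrative stand-ins, not part of this module, and the dictionary returned per episode merely takes the place of EpisodeInfo; the real trainer that drives PolicyIteration may behave differently.

```python
from typing import Any, Dict, List


class StubAlgorithm:
    """Hypothetical stand-in exposing the same hook names as PolicyIteration."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def actions_before_training_begins(self, env: Any, **options) -> None:
        self.calls.append("before")

    def on_training_episode(self, env: Any, episode_idx: int,
                            **options) -> Dict[str, int]:
        self.calls.append(f"episode-{episode_idx}")
        # Placeholder for the EpisodeInfo instance the real method returns.
        return {"episode_index": episode_idx}

    def actions_after_training_ends(self, env: Any, **options) -> None:
        self.calls.append("after")


def run_training(algorithm: Any, env: Any, n_episodes: int,
                 **options) -> List[Any]:
    """Assumed driver loop: set-up hook, one hook call per episode, tear-down hook."""
    algorithm.actions_before_training_begins(env, **options)
    infos = [algorithm.on_training_episode(env, idx, **options)
             for idx in range(n_episodes)]
    algorithm.actions_after_training_ends(env, **options)
    return infos


if __name__ == "__main__":
    algo = StubAlgorithm()
    run_training(algo, env=None, n_episodes=2)
    print(algo.calls)
```

With a real PolicyIteration instance, the policy property would presumably be read once a loop like this has finished.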