policy_iteration
Module policy_iteration. Implementation of the policy iteration algorithm. Each step of policy iteration performs one policy evaluation followed by one policy improvement.
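The evaluation/improvement cycle described above can be illustrated with a minimal, self-contained sketch. It does not use this module's classes: the two-state MDP, the `P` transition table, `GAMMA`, `TOL`, and all function names are assumptions made for illustration only.

```python
# Hypothetical 2-state, 2-action MDP (not taken from the library).
# P[state][action] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9   # discount factor (assumed)
TOL = 1e-8    # convergence tolerance for policy evaluation (assumed)


def q_value(state, action, v):
    """Expected return of taking `action` in `state`, then following v."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[state][action])


def evaluate_policy(policy, v):
    """Iterative policy evaluation: sweep until the values stop changing."""
    while True:
        delta = 0.0
        for s in P:
            new_v = q_value(s, policy[s], v)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < TOL:
            return v


def improve_policy(policy, v):
    """Make the policy greedy w.r.t. v; return True if nothing changed."""
    stable = True
    for s in P:
        best = max(P[s], key=lambda a: q_value(s, a, v))
        if best != policy[s]:
            policy[s] = best
            stable = False
    return stable


def policy_iteration():
    """Alternate one evaluation and one improvement until the policy is stable."""
    policy = {s: 0 for s in P}
    v = {s: 0.0 for s in P}
    while True:
        v = evaluate_policy(policy, v)
        if improve_policy(policy, v):
            return policy, v


policy, v = policy_iteration()
```

On this toy problem the loop stops once an improvement step leaves the policy unchanged, which is the usual termination test for policy iteration.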
- class policy_iteration.PolicyIteration(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)
Policy iteration class
- __init__(algo_config: DPAlgoConfig, policy_adaptor: PolicyAdaptorBase)
Constructor.
- Parameters
algo_config – Configuration for the algorithm
policy_adaptor – How the policy should be adapted
- actions_after_training_ends(env: Env, **options) None
Any actions the algorithm should perform after the training ends
- Parameters
env – The environment the agent is trained on
options – Any options passed by the client code
- Return type
None
- actions_before_training_begins(env: Env, **options) None
Execute any actions the algorithm needs before starting the iterations
- Parameters
env – The environment to train on
options – Any options passed by the application
- Return type
None
- on_training_episode(env: Env, episode_idx: int, **options) EpisodeInfo
Train the algorithm on the episode
- Parameters
env – The environment to run the training episode on
episode_idx – The episode index
options – Options that client code may pass
- Return type
An instance of EpisodeInfo
- property policy: Policy
Get the trained policy
- Return type
An instance of Policy
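The method names above suggest a lifecycle in which a driver calls one hook before training, one per episode, and one after training. The sketch below shows that call order under stated assumptions: `StubAlgo` is a stand-in that only mirrors the documented method names, `run_training` is a hypothetical driver, and the dictionary and string return values are placeholders for `EpisodeInfo` and `Policy`. None of this is the library's own trainer.

```python
class StubAlgo:
    """Stand-in exposing the same hook names as the documented class."""

    def __init__(self):
        self.calls = []        # records the order in which hooks run
        self._policy = None

    def actions_before_training_begins(self, env, **options):
        self.calls.append("before")

    def on_training_episode(self, env, episode_idx, **options):
        self.calls.append(f"episode {episode_idx}")
        self._policy = "placeholder-policy"   # placeholder for a Policy
        return {"episode_idx": episode_idx}   # placeholder for EpisodeInfo

    def actions_after_training_ends(self, env, **options):
        self.calls.append("after")

    @property
    def policy(self):
        return self._policy


def run_training(algo, env, n_episodes, **options):
    """Hypothetical driver: before-hook, one call per episode, after-hook."""
    algo.actions_before_training_begins(env, **options)
    infos = [
        algo.on_training_episode(env, idx, **options)
        for idx in range(n_episodes)
    ]
    algo.actions_after_training_ends(env, **options)
    return infos


algo = StubAlgo()
infos = run_training(algo, env=None, n_episodes=2)
```

Keeping the hooks separate lets a driver of this kind stay independent of the algorithm: anything exposing the same three methods and the `policy` property could be passed in.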