Module iterative_policy_evaluation. Implements a tabular version of the iterative policy evaluation algorithm as described in the book.

class iterative_policy_evaluation.IterativePolicyEvaluator(algo_config: DPAlgoConfig)

Implements the iterative policy evaluation algorithm.

__init__(algo_config: DPAlgoConfig) → None

Constructor. Initializes the algorithm with the configuration instance it needs.

Parameters

  • algo_config (The algorithm configuration) –

actions_before_training_begins(env: Env, **options) → None

Executes any actions the algorithm needs before the iterations start.

on_training_episode(env: Env, episode_idx: int, **options) → EpisodeInfo

Trains the algorithm on the given episode.

Parameters

  • env (The environment to run the training episode) –

  • episode_idx (The episode index) –

  • options (Options that client code may pass) –

Returns

An instance of EpisodeInfo
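
For orientation, the computation that a tabular iterative policy evaluation performs can be sketched independently of this class. The snippet below is a minimal standalone illustration and not the implementation behind IterativePolicyEvaluator: the function name evaluate_policy, its arguments (P, R, policy, gamma, tol, max_sweeps) and the nested-list layout of the model are all assumptions made for the example. It uses synchronous sweeps, building a fresh value table on every pass; an in-place variant would update the table as it goes.

```python
def evaluate_policy(P, R, policy, gamma=0.9, tol=1e-8, max_sweeps=10_000):
    """Estimate the state values of a fixed policy (hypothetical helper).

    P[s][a][s2]  -- probability of moving from state s to s2 under action a
    R[s][a]      -- expected immediate reward for taking action a in state s
    policy[s][a] -- probability that the policy picks action a in state s
    """
    n_states = len(P)
    v = [0.0] * n_states  # initial guess: all state values are zero

    for _ in range(max_sweeps):
        v_new = []
        for s in range(n_states):
            total = 0.0
            for a, pi_sa in enumerate(policy[s]):
                # Expected value of the successor state under the old table.
                expected_next = sum(p * v[s2] for s2, p in enumerate(P[s][a]))
                # Bellman expectation backup, weighted by the policy.
                total += pi_sa * (R[s][a] + gamma * expected_next)
            v_new.append(total)

        # Largest change in any state value during this sweep.
        delta = max(abs(new - old) for new, old in zip(v_new, v))
        v = v_new
        if delta < tol:
            break  # the table has stopped changing, up to the tolerance

    return v


# Toy model with two states and one action (all numbers are made up):
# state 0 loops onto itself with reward 1, state 1 is absorbing with reward 0.
P = [[[1.0, 0.0]], [[0.0, 1.0]]]
R = [[1.0], [0.0]]
policy = [[1.0], [1.0]]
values = evaluate_policy(P, R, policy, gamma=0.5)
```

In this toy model the exact value of state 0 is 1 / (1 - 0.5) = 2 and the value of state 1 is 0, so values should approach [2.0, 0.0] up to the tolerance. How the class maps such sweeps onto on_training_episode calls is not stated here and is left open.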