imitation.scripts.ingredients.policy_evaluation

This ingredient evaluates a learned policy.

It applies the appropriate wrappers, runs rollouts, and computes statistics over those rollouts.
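The rollout-then-summarize pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ingredient's implementation: the toy environment, the helper names `rollout_episode`, `summarize`, and `evaluate`, and the statistic key names are all hypothetical and are not guaranteed to match what `rollout_stats()` returns.

```python
import random
import statistics
from typing import Callable, Dict, List


def rollout_episode(
    policy: Callable[[int], int], rng: random.Random, horizon: int = 10
) -> float:
    """Run one episode in a toy environment and return its total reward."""
    state, total = 0, 0.0
    for _ in range(horizon):
        action = policy(state)
        # Toy reward: 1.0 when the action equals the parity of the state.
        total += 1.0 if action == state % 2 else 0.0
        state = rng.randrange(100)
    return total


def summarize(returns: List[float]) -> Dict[str, float]:
    """Summary statistics over episode returns (key names are illustrative)."""
    return {
        "n_traj": float(len(returns)),
        "return_min": min(returns),
        "return_mean": statistics.fmean(returns),
        "return_std": statistics.pstdev(returns),
        "return_max": max(returns),
    }


def evaluate(
    policy: Callable[[int], int], n_episodes_eval: int, seed: int = 0
) -> Dict[str, float]:
    """Collect n_episodes_eval episodes with a seeded RNG, then summarize them."""
    rng = random.Random(seed)
    returns = [rollout_episode(policy, rng) for _ in range(n_episodes_eval)]
    return summarize(returns)


# A policy that always picks the rewarded action earns the full horizon.
stats = evaluate(lambda state: state % 2, n_episodes_eval=5)
print(stats["return_mean"])
```

Passing the random number generator in explicitly, as the sketch does with `seed`, keeps the evaluation reproducible; the real function receives its generator through the `_rnd` argument.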

Functions

eval_policy(rl_algo, venv, n_episodes_eval, _rnd)

Evaluation of an imitation-learned policy.

imitation.scripts.ingredients.policy_evaluation.eval_policy(rl_algo, venv, n_episodes_eval, _rnd)

Evaluation of an imitation-learned policy.

If rl_algo is a BaseAlgorithm, this has the side effect of setting rl_algo’s environment to venv.
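The side effect can be illustrated with stand-in classes. This is a sketch of the behavior described above, not the library's code: `ToyAlgorithm`, `ToyPolicy`, and `pick_eval_env` are hypothetical names, and environments are represented as plain strings for brevity.

```python
from typing import Union


class ToyAlgorithm:
    """Hypothetical stand-in for an RL algorithm that owns an environment."""

    def __init__(self, env: str) -> None:
        self.env = env

    def set_env(self, env: str) -> None:
        self.env = env

    def get_env(self) -> str:
        return self.env


class ToyPolicy:
    """Hypothetical stand-in for a bare policy with no environment attached."""


def pick_eval_env(learner: Union[ToyAlgorithm, ToyPolicy], venv: str) -> str:
    """Return the environment to roll out in; algorithms are mutated in place."""
    if isinstance(learner, ToyAlgorithm):
        # Side effect: the caller's algorithm object now points at venv.
        learner.set_env(venv)
        return learner.get_env()
    # A bare policy holds no environment, so nothing is mutated.
    return venv


algo = ToyAlgorithm(env="train_env")
pick_eval_env(algo, "eval_env")
print(algo.env)  # eval_env
```

A caller that needs the original training environment after evaluation would have to save it beforehand and restore it afterwards, for example with `get_env()` and `set_env()` on a Stable-Baselines3 algorithm.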

Parameters

rl_algo (Union[BaseAlgorithm, BasePolicy]) – Algorithm or policy to evaluate.
venv (VecEnv) – Environment to evaluate on.
n_episodes_eval (int) – The number of episodes to average over when computing the imitation policy's average episode reward for the returned statistics.
_rnd (Generator) – Random number generator provided by Sacred.

Return type

Mapping[str, float]

Returns

A dictionary with two keys. “imit_stats” gives the return value of rollout_stats() on rollouts from the test-reward-wrapped environment, using the final policy (remember that the ground-truth reward can be recovered from the “monitor_return” key). “expert_stats” gives the return value of rollout_stats() on the expert demonstrations loaded from path.