imitation.scripts.ingredients.policy_evaluation
This ingredient evaluates a learned policy.
It takes care of applying the right wrappers, performs rollouts, and computes statistics over those rollouts.
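The rollout-then-summarize pattern described above can be illustrated with a small standard-library sketch. It does not use the imitation library: the toy environment, `run_episode`, `summarize_returns`, `evaluate`, and the statistic key names are all hypothetical stand-ins chosen for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List


def run_episode(
    policy: Callable[[float], int],
    rng: random.Random,
    horizon: int = 10,
) -> float:
    """Roll out one episode in a toy environment and return its total reward."""
    obs = rng.random()
    total = 0.0
    for _ in range(horizon):
        action = policy(obs)
        # Toy reward: 1 when the action says whether obs is above 0.5, else 0.
        total += 1.0 if action == int(obs > 0.5) else 0.0
        obs = rng.random()
    return total


def summarize_returns(returns: List[float]) -> Dict[str, float]:
    """Summary statistics over episode returns (key names are hypothetical)."""
    return {
        "n_traj": float(len(returns)),
        "return_min": min(returns),
        "return_mean": statistics.fmean(returns),
        "return_std": statistics.pstdev(returns),
        "return_max": max(returns),
    }


def evaluate(
    policy: Callable[[float], int],
    n_episodes: int,
    seed: int = 0,
) -> Dict[str, float]:
    """Run n_episodes rollouts with a seeded RNG and summarize their returns."""
    rng = random.Random(seed)
    returns = [run_episode(policy, rng) for _ in range(n_episodes)]
    return summarize_returns(returns)


# A policy that always answers correctly earns the full reward every episode.
stats = evaluate(lambda obs: int(obs > 0.5), n_episodes=5)
```

Passing the random number generator explicitly, as `eval_policy` does with `_rnd`, keeps the evaluation reproducible for a given seed.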
Functions
| eval_policy(rl_algo, venv, n_episodes_eval, _rnd) | Evaluation of imitation learned policy. |
- imitation.scripts.ingredients.policy_evaluation.eval_policy(rl_algo, venv, n_episodes_eval, _rnd)
Evaluation of imitation learned policy.
Has the side effect of setting rl_algo’s environment to venv if it is a BaseAlgorithm.
- Parameters
  - rl_algo (Union[BaseAlgorithm, BasePolicy]) – Algorithm to evaluate.
  - venv (VecEnv) – Environment to evaluate on.
  - n_episodes_eval (int) – The number of episodes to average over when calculating the imitation policy's average episode reward.
  - _rnd (Generator) – Random number generator provided by Sacred.
- Return type
Mapping[str, float]
- Returns
A dictionary with two keys. “imit_stats” gives the return value of rollout_stats() on rollouts from the test-reward-wrapped environment, using the final policy (remember that the ground-truth reward can be recovered from the “monitor_return” key). “expert_stats” gives the return value of rollout_stats() on the expert demonstrations loaded from path.
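As a rough illustration of recovering the ground-truth return from a statistics mapping, the sketch below filters a flat `Mapping[str, float]` by key prefix. The helper `ground_truth_stats`, the exact key names (such as "monitor_return_mean"), and the numbers in `example` are assumptions made up for this example, not output from the library.

```python
from typing import Dict, Mapping


def ground_truth_stats(
    stats: Mapping[str, float],
    prefix: str = "monitor_return",
) -> Dict[str, float]:
    """Keep only the entries whose key starts with the given prefix."""
    return {key: value for key, value in stats.items() if key.startswith(prefix)}


# Illustrative values only; real statistics depend on the policy and environment.
example = {
    "return_mean": 3.2,
    "monitor_return_mean": 41.0,
    "monitor_return_std": 2.5,
    "len_mean": 200.0,
}
gt = ground_truth_stats(example)
```

Here `gt` holds only the two "monitor_return" entries, leaving the wrapped-reward and episode-length statistics out.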