imitation.scripts.eval_policy

Evaluate policies: render policy interactively, save videos, log episode return.

Functions

eval_policy(eval_n_timesteps, ...[, ...])

Rolls a policy out in an environment, collecting statistics.

main_console()

video_wrapper_factory(log_dir, **kwargs)

Returns a function that wraps the environment in a video recorder.

Classes

InteractiveRender(venv, fps)

Render the wrapped environment(s) on screen.

class imitation.scripts.eval_policy.InteractiveRender(venv, fps)

Bases: VecEnvWrapper

Render the wrapped environment(s) on screen.

__init__(venv, fps)

Builds a renderer for venv that runs at fps frames per second.
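To illustrate what "running at fps frames per second" can mean in practice, here is a minimal frame-pacing sketch. It is not the library's implementation: the FramePacer class and its injectable clock and sleep arguments are hypothetical names introduced for this example, and it only shows one common way to hold a target frame rate by sleeping for the remainder of each frame period.

```python
import time
from typing import Callable, Optional


class FramePacer:
    """Hypothetical helper: keeps successive frames about 1/fps seconds apart."""

    def __init__(
        self,
        fps: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if fps <= 0:
            raise ValueError("fps must be positive")
        self.frame_period = 1.0 / fps
        self._clock = clock
        self._sleep = sleep
        self._last_frame: Optional[float] = None

    def wait(self) -> float:
        """Block until the next frame is due; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_frame is not None:
            # Sleep only for whatever is left of the current frame period.
            delay = max(0.0, self._last_frame + self.frame_period - now)
            if delay > 0:
                self._sleep(delay)
        self._last_frame = now + delay
        return delay
```

A render loop would call wait() once before drawing each frame. Passing the clock and sleep functions in as arguments is a design choice that keeps the pacing logic testable without real delays.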

reset()

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async() is still doing work, that work will be cancelled, and step_wait() should not be called until step_async() is invoked again.

Returns

observation

step_wait()

Wait for the step taken with step_async().

Returns

observation, reward, done, information
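The reset(), step_async() and step_wait() contract described above can be sketched with stand-in classes. This is a toy illustration under stated assumptions, not the library's code: FakeVecEnv and RenderingWrapper are hypothetical names, and the choice to draw one frame after reset() and after each step_wait() is an assumption about how a rendering wrapper might behave.

```python
from typing import Any, Dict, List, Tuple

StepResult = Tuple[List[int], List[float], List[bool], List[Dict[str, Any]]]


class FakeVecEnv:
    """Tiny stand-in for a vectorized environment (hypothetical)."""

    def __init__(self) -> None:
        self.frames_rendered = 0
        self._actions: List[int] = []

    def reset(self) -> List[int]:
        return [0]

    def step_async(self, actions: List[int]) -> None:
        # Only store the actions; the results are collected in step_wait().
        self._actions = actions

    def step_wait(self) -> StepResult:
        # Returns (observation, reward, done, information), one entry per env.
        return list(self._actions), [1.0], [False], [{}]

    def render(self) -> None:
        self.frames_rendered += 1


class RenderingWrapper:
    """Delegates to the wrapped env and draws a frame after each transition."""

    def __init__(self, venv: FakeVecEnv) -> None:
        self.venv = venv

    def reset(self) -> List[int]:
        obs = self.venv.reset()
        self.venv.render()
        return obs

    def step_async(self, actions: List[int]) -> None:
        self.venv.step_async(actions)

    def step_wait(self) -> StepResult:
        result = self.venv.step_wait()
        self.venv.render()
        return result
```

A caller would invoke reset() once, then alternate step_async(actions) and step_wait() for each transition.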

imitation.scripts.eval_policy.eval_policy(eval_n_timesteps, eval_n_episodes, render, render_fps, videos, video_kwargs, _run, _rnd, reward_type=None, reward_path=None, rollout_save_path=None, explore_kwargs=None)

Rolls a policy out in an environment, collecting statistics.

Parameters
  • eval_n_timesteps (Optional[int]) – Minimum number of timesteps to evaluate for. Set exactly one of eval_n_episodes and eval_n_timesteps.

  • eval_n_episodes (Optional[int]) – Minimum number of episodes to evaluate for. Set exactly one of eval_n_episodes and eval_n_timesteps.

  • render (bool) – If True, renders interactively to the screen.

  • render_fps (int) – The target number of frames per second to render on screen.

  • videos (bool) – If True, saves videos to log_dir.

  • video_kwargs (Mapping[str, Any]) – Keyword arguments passed through to video_wrapper.VideoWrapper.

  • _rnd (Generator) – Random number generator provided by Sacred.

  • reward_type (Optional[str]) – If specified, overrides the environment reward with a reward of this type.

  • reward_path (Optional[str]) – If reward_type is specified, the path to a serialized reward of reward_type to override the environment reward with.

  • rollout_save_path (Optional[str]) – Path at which to save the rollouts used for computing statistics; if None, the rollouts are not saved.

  • explore_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments for an exploration wrapper to apply before rolling out, not including policy_callable, venv, and rng; if None, the policy is not wrapped.

Returns

Return value of imitation.util.rollout.rollout_stats().
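The rule "set exactly one of eval_n_episodes and eval_n_timesteps" can be sketched as a small validation helper that also builds the matching stopping condition. This is an illustrative sketch only: make_stop_condition is a hypothetical name, and representing progress as a list of completed episode lengths is an assumption made to keep the example self-contained.

```python
from typing import Callable, Optional, Sequence


def make_stop_condition(
    min_timesteps: Optional[int],
    min_episodes: Optional[int],
) -> Callable[[Sequence[int]], bool]:
    """Return a predicate over completed episode lengths (hypothetical helper)."""
    # Exactly one of the two thresholds must be given.
    if (min_timesteps is None) == (min_episodes is None):
        raise ValueError("Set exactly one of min_timesteps and min_episodes.")
    for value in (min_timesteps, min_episodes):
        if value is not None and value < 1:
            raise ValueError("Thresholds must be positive integers.")

    def should_stop(episode_lengths: Sequence[int]) -> bool:
        if min_episodes is not None:
            return len(episode_lengths) >= min_episodes
        # Otherwise min_timesteps is set: compare the total step count.
        return sum(episode_lengths) >= min_timesteps

    return should_stop
```

Because both thresholds are minimums, evaluation in this sketch stops at the first completed episode that reaches the threshold, so the final count may exceed the requested value.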

imitation.scripts.eval_policy.main_console()

imitation.scripts.eval_policy.video_wrapper_factory(log_dir, **kwargs)

Returns a function that wraps the environment in a video recorder.
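The "function that returns a wrapping function" pattern can be sketched as a closure. The sketch below does not reproduce the library's code: RecordingEnv is a stand-in that records nothing, recording_wrapper_factory is a hypothetical name, and both the (env, i) signature of the returned function and the per-environment directory layout are assumptions made for illustration.

```python
import os
from typing import Any, Callable


class RecordingEnv:
    """Stand-in for a video-recording wrapper (hypothetical, records nothing)."""

    def __init__(self, env: Any, directory: str, **kwargs: Any) -> None:
        self.env = env
        self.directory = directory
        self.kwargs = kwargs


def recording_wrapper_factory(
    log_dir: str, **kwargs: Any
) -> Callable[[Any, int], RecordingEnv]:
    """Return a function that wraps environment number i in a RecordingEnv."""

    def wrap(env: Any, i: int) -> RecordingEnv:
        # Assumed layout: one subdirectory per environment index, so that
        # recordings from parallel environments do not overwrite each other.
        directory = os.path.join(log_dir, "videos", f"env_{i}")
        return RecordingEnv(env, directory, **kwargs)

    return wrap
```

The closure captures log_dir and kwargs once, so the returned function can be handed to code that constructs environments and knows only the environment and its index.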