imitation.scripts.eval_policy

Evaluate policies: render policy interactively, save videos, log episode return.

Functions

`eval_policy`(eval_n_timesteps, ...[, ...])	Rolls a policy out in an environment, collecting statistics.
`main_console`()
`video_wrapper_factory`(log_dir, **kwargs)	Returns a function that wraps the environment in a video recorder.

Classes

`InteractiveRender`(venv, fps)	Render the wrapped environment(s) on screen.

class imitation.scripts.eval_policy.InteractiveRender(venv, fps)

Bases: VecEnvWrapper

Render the wrapped environment(s) on screen.

__init__(venv, fps): Builds a renderer for venv running at fps frames per second.

reset()

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async() is still doing work, that work is cancelled, and step_wait() should not be called until step_async() is invoked again.

Returns: observation

step_wait()

Wait for the step taken with step_async().

Returns: observation, reward, done, information
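
The reset() and step_wait() behaviour described above can be pictured with a small stand-in. This sketch is not the library's implementation: `FakeVecEnv` and `PacedRender` are hypothetical names, and it assumes the wrapper renders once after each reset and each completed step, sleeping roughly 1/fps seconds before drawing a step. It avoids importing stable-baselines3 so that it runs on its own.

```python
import time


class FakeVecEnv:
    """Hypothetical stand-in for a vectorized environment (not a real VecEnv)."""

    def __init__(self):
        self.render_calls = 0

    def reset(self):
        return [0.0]

    def step_wait(self):
        # Same ordering as the documented return: observation, reward, done, information.
        return [0.0], [1.0], [False], [{}]

    def render(self):
        self.render_calls += 1


class PacedRender:
    """Hypothetical wrapper: render after every reset and step, paced by fps."""

    def __init__(self, venv, fps):
        self.venv = venv
        self.fps = fps

    def reset(self):
        obs = self.venv.reset()
        self.venv.render()
        return obs

    def step_wait(self):
        result = self.venv.step_wait()
        if self.fps > 0:
            # Assumed pacing rule: wait one frame interval before drawing.
            time.sleep(1.0 / self.fps)
        self.venv.render()
        return result


venv = PacedRender(FakeVecEnv(), fps=100)
venv.reset()
obs, reward, done, info = venv.step_wait()
```

After one reset and one step, the stand-in environment has been rendered twice, and the step result passes through the wrapper unchanged.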

imitation.scripts.eval_policy.eval_policy(eval_n_timesteps, eval_n_episodes, render, render_fps, videos, video_kwargs, _run, _rnd, reward_type=None, reward_path=None, rollout_save_path=None, explore_kwargs=None)

Rolls a policy out in an environment, collecting statistics.

Parameters

eval_n_timesteps (Optional[int]) – Minimum number of timesteps to evaluate for. Set exactly one of eval_n_episodes and eval_n_timesteps.
eval_n_episodes (Optional[int]) – Minimum number of episodes to evaluate for. Set exactly one of eval_n_episodes and eval_n_timesteps.
render (bool) – If True, renders interactively to the screen.
render_fps (int) – The target number of frames per second to render on screen.
videos (bool) – If True, saves videos to log_dir.
video_kwargs (Mapping[str, Any]) – Keyword arguments passed through to video_wrapper.VideoWrapper.
_rnd (Generator) – Random number generator provided by Sacred.
reward_type (Optional[str]) – If specified, overrides the environment reward with a reward of this type.
reward_path (Optional[str]) – If reward_type is specified, the path to a serialized reward of reward_type to override the environment reward with.
rollout_save_path (Optional[str]) – Where to save the rollouts used for computing stats to disk; if None, rollouts are not saved.
explore_kwargs (Optional[Mapping[str, Any]]) – Keyword arguments to an exploration wrapper to apply before rolling out, not including policy_callable, venv, and rng; if None, the policy is not wrapped.
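
The "set exactly one of eval_n_episodes and eval_n_timesteps" rule can be illustrated with a short check. This is a minimal sketch under assumptions: `pick_stop_condition` is a hypothetical helper, not part of imitation, and the library may validate these arguments differently.

```python
from typing import Optional, Tuple


def pick_stop_condition(
    eval_n_timesteps: Optional[int],
    eval_n_episodes: Optional[int],
) -> Tuple[str, int]:
    """Hypothetical helper: return the single stopping criterion that was set."""
    given = {
        "timesteps": eval_n_timesteps,
        "episodes": eval_n_episodes,
    }
    chosen = [(name, n) for name, n in given.items() if n is not None]
    if len(chosen) != 1:
        # Both set, or neither set: the documented contract is violated.
        raise ValueError("Set exactly one of eval_n_episodes and eval_n_timesteps.")
    return chosen[0]


print(pick_stop_condition(None, 50))  # prints ('episodes', 50)
```

Passing both values, or neither, raises ValueError in this sketch, which makes a misconfigured evaluation fail before any rollouts are collected.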

Returns

Return value of imitation.util.rollout.rollout_stats().

imitation.scripts.eval_policy.main_console()

imitation.scripts.eval_policy.video_wrapper_factory(log_dir, **kwargs): Returns a function that wraps the environment in a video recorder.
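
A function that returns an environment-wrapping function is a closure over log_dir and the keyword arguments. The sketch below shows that pattern only. `FakeRecorder`, `recorder_factory`, the `(env, i)` call signature, the `videos/<index>` directory layout, and the `example_option` keyword are all assumptions made for illustration, not the library's actual API.

```python
import pathlib
from typing import Any, Callable


class FakeRecorder:
    """Hypothetical recorder wrapper; stands in for a real video wrapper."""

    def __init__(self, env: Any, directory: pathlib.Path, **kwargs: Any):
        self.env = env
        self.directory = directory
        self.kwargs = kwargs


def recorder_factory(
    log_dir: pathlib.Path, **kwargs: Any
) -> Callable[[Any, int], FakeRecorder]:
    """Hypothetical factory: capture log_dir and kwargs, return a wrapping function."""

    def wrap(env: Any, i: int) -> FakeRecorder:
        # Assumed layout: one sub-directory per environment index.
        directory = log_dir / "videos" / str(i)
        return FakeRecorder(env, directory=directory, **kwargs)

    return wrap


wrap = recorder_factory(pathlib.Path("logs"), example_option=True)
wrapped = wrap("env-0", 0)
```

Returning a function rather than a wrapped environment lets the caller apply the same recording configuration to each environment as it is constructed.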