imitation.data.wrappers#
Environment wrappers for collecting rollouts.
Classes
- BufferingWrapper – Saves transitions of underlying VecEnv.
- RolloutInfoWrapper – Add the entire episode's rewards and observations to info at episode end.
- class imitation.data.wrappers.BufferingWrapper(venv, error_on_premature_reset=True)[source]#
Bases: VecEnvWrapper
Saves transitions of underlying VecEnv.
Retrieve saved transitions using pop_transitions().
- __init__(venv, error_on_premature_reset=True)[source]#
Builds BufferingWrapper.
- Parameters
venv (VecEnv) – The wrapped VecEnv.
error_on_premature_reset (bool) – Error if reset() is called on this wrapper and there are saved samples that haven't yet been accessed.
- error_on_premature_reset: bool#
- n_transitions: Optional[int]#
- pop_finished_trajectories()[source]#
Pops recorded complete trajectories trajs and episode lengths ep_lens.
- Return type
Tuple[Sequence[TrajectoryWithRew], Sequence[int]]
- Returns
A tuple (trajs, ep_lens) where trajs is a sequence of trajectories including the terminal state (but possibly missing initial states, if pop_trajectories was previously called) and ep_lens is a sequence of episode lengths. Note the episode length will be longer than the trajectory length when the trajectory misses initial states.
- pop_trajectories()[source]#
Pops recorded trajectories trajs and episode lengths ep_lens.
- Return type
Tuple[Sequence[TrajectoryWithRew], Sequence[int]]
- Returns
A tuple (trajs, ep_lens). trajs is a sequence of trajectory fragments, consisting of data collected after the last call to pop_trajectories. They may miss initial states (if pop_trajectories previously returned a fragment for that episode), and terminal states (if the episode has yet to complete). ep_lens is the total length of completed episodes.
- pop_transitions()[source]#
Pops recorded transitions, returning them as an instance of Transitions.
- Return type
Transitions
- Returns
All transitions recorded since the last call.
- Raises
RuntimeError – If empty (no transitions have been recorded since the last pop).
- reset(**kwargs)[source]#
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns
observation
- class imitation.data.wrappers.RolloutInfoWrapper(env)[source]#
Bases: Wrapper
Add the entire episode’s rewards and observations to info at episode end.
Whenever done=True, info["rollout"] is a dict with keys "obs" and "rews", whose corresponding values hold the NumPy arrays containing the raw observations and rewards seen during this episode.
- reset(**kwargs)[source]#
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns
the initial observation.
- Return type
observation (object)
- step(action)[source]#
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Parameters
action (object) – an action provided by the agent
- Returns
observation (object) – agent's observation of the current environment
reward (float) – amount of reward returned after previous action
done (bool) – whether the episode has ended, in which case further step() calls will return undefined results
info (dict) – contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
- Return type
Tuple[object, float, bool, dict]