Trajectories
Imitation learning requires trajectories.
Trajectories are sequences of observations, actions and, sometimes, rewards, generated by an agent interacting with an environment.
They are also called rollouts or episodes.
Some are generated by experts and serve as demonstrations;
others are generated by the agent being trained and serve as training data for a discriminator.
In this library they are stored in a `Trajectory` dataclass:
```python
import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass(frozen=True)
class Trajectory:
    obs: np.ndarray
    """Observations, shape (trajectory_len + 1, ) + observation_shape."""
    acts: np.ndarray
    """Actions, shape (trajectory_len, ) + action_shape."""
    infos: Optional[np.ndarray]
    """An array of info dicts, shape (trajectory_len, )."""
    terminal: bool
    """Does this trajectory (fragment) end in a terminal state?"""
```
The info dictionaries are optional and can contain arbitrary information.
See the `Trajectory` class and the Gymnasium documentation for more details.
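To make the shape conventions concrete, here is a minimal sketch that mirrors the fields above in a standalone dataclass. `SketchTrajectory` is a hypothetical stand-in, not the library class, and its `__post_init__` check and the 4-dimensional observations are assumptions for illustration. The key point is that a trajectory of length n has n actions but n + 1 observations, because the observation reached after the last action is also recorded.

```python
import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass(frozen=True)
class SketchTrajectory:
    """Hypothetical stand-in mirroring the fields of Trajectory."""

    obs: np.ndarray
    acts: np.ndarray
    infos: Optional[np.ndarray]
    terminal: bool

    def __post_init__(self):
        # One observation before each action, plus the final observation
        # reached after the last action.
        if len(self.obs) != len(self.acts) + 1:
            raise ValueError("expected len(obs) == len(acts) + 1")
        if self.infos is not None and len(self.infos) != len(self.acts):
            raise ValueError("expected len(infos) == len(acts)")

    def __len__(self) -> int:
        # The trajectory length counts actions, not observations.
        return len(self.acts)


trajectory_len = 3
traj = SketchTrajectory(
    obs=np.zeros((trajectory_len + 1, 4)),  # observation_shape == (4,)
    acts=np.array([0, 1, 0]),  # discrete actions, action_shape == ()
    infos=np.array([{} for _ in range(trajectory_len)]),  # object array of dicts
    terminal=True,
)
print(len(traj), traj.obs.shape, traj.acts.shape)  # 3 (4, 4) (3,)
```

Building a `SketchTrajectory` with as many observations as actions raises a `ValueError`, which is the kind of off-by-one mistake the shape convention is meant to catch.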
`TrajectoryWithRew` is a subclass of `Trajectory` that adds a `rews` field, an array of rewards of shape (trajectory_len, ).
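A minimal sketch of the extra field, again using hypothetical standalone classes rather than the library's own: the subclass adds `rews` with exactly one reward per action, so the undiscounted return of the trajectory is simply the sum of `rews`. The shape check in `__post_init__` is an assumption for illustration.

```python
import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass(frozen=True)
class SketchTrajectory:
    """Hypothetical stand-in for Trajectory (fields only)."""

    obs: np.ndarray
    acts: np.ndarray
    infos: Optional[np.ndarray]
    terminal: bool


@dataclasses.dataclass(frozen=True)
class SketchTrajectoryWithRew(SketchTrajectory):
    """Hypothetical stand-in for TrajectoryWithRew: one reward per action."""

    rews: np.ndarray

    def __post_init__(self):
        if self.rews.shape != (len(self.acts),):
            raise ValueError("expected rews.shape == (trajectory_len,)")


traj = SketchTrajectoryWithRew(
    obs=np.zeros((4, 2)),  # trajectory_len + 1 == 4 observations
    acts=np.array([1, 0, 1]),  # trajectory_len == 3 actions
    infos=None,
    terminal=True,
    rews=np.array([1.0, 0.5, 2.0]),  # one reward per action
)
episode_return = float(np.sum(traj.rews))  # undiscounted return
print(episode_return)  # 3.5
```

Because the subclass only adds a field, any code written against the base class can also accept trajectories with rewards.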
Trajectories are usually passed around as sequences, for example a list of `Trajectory` objects.
Some algorithms do not need as much information about the ordering of states, actions and rewards.
Rather than using trajectories, these algorithms can use individual `Transitions` (flattened trajectories).
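Flattening can be sketched as follows, assuming plain NumPy arrays: each step t of a trajectory becomes one transition (obs[t], acts[t], obs[t + 1]), and a `dones` flag is set on the last transition only if the trajectory ends in a terminal state. The `flatten` helper, the tuple input format and the dict-of-arrays output are hypothetical illustrations of the idea, not the library's `Transitions` type or its flattening function.

```python
from typing import Dict, List, Tuple

import numpy as np

# Hypothetical input format: (obs, acts, terminal) tuples, with
# len(obs) == len(acts) + 1 as for Trajectory.
TrajectoryTuple = Tuple[np.ndarray, np.ndarray, bool]


def flatten(trajectories: List[TrajectoryTuple]) -> Dict[str, np.ndarray]:
    """Concatenate non-empty trajectories into arrays of individual transitions."""
    parts: Dict[str, list] = {"obs": [], "acts": [], "next_obs": [], "dones": []}
    for obs, acts, terminal in trajectories:
        parts["obs"].append(obs[:-1])  # observation before each action
        parts["acts"].append(acts)
        parts["next_obs"].append(obs[1:])  # observation after each action
        dones = np.zeros(len(acts), dtype=bool)
        dones[-1] = terminal  # a truncated fragment is not "done"
        parts["dones"].append(dones)
    return {key: np.concatenate(value) for key, value in parts.items()}


traj_a = (np.arange(4.0).reshape(4, 1), np.array([0, 1, 1]), True)
traj_b = (np.arange(10.0, 13.0).reshape(3, 1), np.array([1, 0]), False)
transitions = flatten([traj_a, traj_b])
print(transitions["obs"].ravel().tolist())  # [0.0, 1.0, 2.0, 10.0, 11.0]
print(transitions["next_obs"].ravel().tolist())  # [1.0, 2.0, 3.0, 11.0, 12.0]
print(transitions["dones"].tolist())  # [False, False, True, False, False]
```

Once flattened, transitions from different trajectories can be shuffled and batched freely, which is why algorithms that ignore ordering prefer this form.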
Generating Trajectories
To generate trajectories from a given policy, use the following code:
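```python
import numpy as np
import imitation.data.rollout as rollout

# your_policy: the policy to roll out (placeholder).
# your_env: a vectorized environment, e.g. a Stable-Baselines3 VecEnv (placeholder).
your_trajectories = rollout.rollout(
    your_policy,
    your_env,
    # Stop sampling once at least 10 complete episodes have been collected.
    sample_until=rollout.make_sample_until(min_episodes=10),
    rng=np.random.default_rng(),
    unwrap=False,
)
```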
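Conceptually, sampling with `min_episodes=10` means stepping the policy in the environment, recording observations, actions and rewards, and stopping once enough episodes have finished. The sketch below illustrates only that idea with a single, non-vectorized toy environment; `ToyWalkEnv`, `collect_episodes` and `random_policy` are hypothetical names, and this is not how the library's rollout function is implemented (it works with vectorized environments).

```python
from typing import Callable, List, Tuple

import numpy as np


class ToyWalkEnv:
    """Hypothetical 1-D environment: step left/right; episode ends at |x| == 3."""

    def __init__(self) -> None:
        self.x = 0

    def reset(self) -> np.ndarray:
        self.x = 0
        return np.array([self.x], dtype=float)

    def step(self, action: int) -> Tuple[np.ndarray, float, bool]:
        self.x += 1 if action == 1 else -1
        done = abs(self.x) == 3
        reward = 1.0 if self.x == 3 else 0.0  # reward only for reaching +3
        return np.array([self.x], dtype=float), reward, done


def collect_episodes(
    policy: Callable[[np.ndarray], int], env: ToyWalkEnv, min_episodes: int
) -> List[dict]:
    """Roll out `policy` until at least `min_episodes` episodes are complete."""
    episodes = []
    while len(episodes) < min_episodes:
        obs = [env.reset()]
        acts, rews, done = [], [], False
        while not done:
            action = policy(obs[-1])
            next_obs, reward, done = env.step(action)
            obs.append(next_obs)  # ends with one more observation than actions
            acts.append(action)
            rews.append(reward)
        episodes.append(
            {"obs": np.stack(obs), "acts": np.array(acts), "rews": np.array(rews)}
        )
    return episodes


rng = np.random.default_rng(0)


def random_policy(obs: np.ndarray) -> int:
    """Hypothetical policy: pick action 0 or 1 uniformly at random."""
    return int(rng.integers(2))


episodes = collect_episodes(random_policy, ToyWalkEnv(), min_episodes=10)
print(len(episodes))  # 10
```

Each collected episode follows the same convention as `Trajectory`: n actions, n rewards and n + 1 observations.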
Storing/Loading Trajectories
Trajectories can be stored on disk or uploaded to the HuggingFace Dataset Hub.
The following code saves the sequence of trajectories to a directory at `your_path` as a HuggingFace Dataset:
```python
from imitation.data import serialize

# Save the sequence of trajectories to the directory your_path.
serialize.save(your_path, your_trajectories)
```
Likewise, you can load trajectories from a HuggingFace Dataset:
```python
from imitation.data import serialize

# Load the sequence of trajectories from the directory your_path.
your_trajectories = serialize.load(your_path)
```
Note that the loader also supports some older, now deprecated, trajectory formats, but the saver does not.