Trajectories

Imitation learning requires trajectories: sequences of observations and actions, and sometimes rewards, generated by an agent interacting with an environment. They are also called rollouts or episodes. Some are generated by experts and serve as demonstrations; others are generated by the agent and serve as training data for a discriminator. In this library they are stored in a Trajectory dataclass:

import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass(frozen=True)
class Trajectory:
    obs: np.ndarray
    """Observations, shape (trajectory_len + 1, ) + observation_shape."""

    acts: np.ndarray
    """Actions, shape (trajectory_len, ) + action_shape."""

    infos: Optional[np.ndarray]
    """An array of info dicts, shape (trajectory_len, )."""

    terminal: bool
    """Does this trajectory (fragment) end in a terminal state?"""

The info dictionaries are optional and can contain arbitrary information. See the Trajectory class and the gymnasium documentation for more details. TrajectoryWithRew is a subclass of Trajectory that adds a rews field, an array of rewards of shape (trajectory_len, ).
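The length relations between these fields can be checked with a small sketch. It assumes only the shapes stated above; SketchTrajectory and check_shapes are hypothetical stand-ins written for this illustration, not the library's own classes or validation logic.

```python
import dataclasses

import numpy as np


@dataclasses.dataclass(frozen=True)
class SketchTrajectory:
    """Hypothetical stand-in mirroring the fields described above."""

    obs: np.ndarray
    acts: np.ndarray
    rews: np.ndarray
    terminal: bool


def check_shapes(traj: SketchTrajectory) -> int:
    """Return trajectory_len after checking the documented length relations."""
    trajectory_len = len(traj.acts)
    # There is one more observation than actions: the final observation
    # is the state reached after the last action.
    if len(traj.obs) != trajectory_len + 1:
        raise ValueError("expected len(obs) == len(acts) + 1")
    # There is one reward per action.
    if traj.rews.shape != (trajectory_len,):
        raise ValueError("expected rews of shape (trajectory_len,)")
    return trajectory_len


# A 3-step trajectory with 2-dimensional observations and scalar actions.
example = SketchTrajectory(
    obs=np.zeros((4, 2)),
    acts=np.zeros((3,), dtype=np.int64),
    rews=np.zeros((3,)),
    terminal=True,
)
trajectory_len = check_shapes(example)
```

Here obs has four rows while acts and rews have three entries each, matching the trajectory_len + 1 and trajectory_len shapes in the field descriptions.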

Trajectories are usually passed around as a sequence of Trajectory objects.

Some algorithms do not need as much information about the ordering of states, actions and rewards. Rather than using trajectories, these algorithms can make use of individual Transitions (flattened trajectories).
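To illustrate what flattening means, here is a minimal sketch that splits one trajectory into per-step arrays. The function name flatten_to_transitions and the dictionary keys are assumptions made for this example; the library's own Transitions type may be laid out differently.

```python
from typing import Dict

import numpy as np


def flatten_to_transitions(
    obs: np.ndarray, acts: np.ndarray, terminal: bool
) -> Dict[str, np.ndarray]:
    """Split one trajectory into aligned per-step arrays (illustrative only)."""
    if len(obs) != len(acts) + 1:
        raise ValueError("expected len(obs) == len(acts) + 1")
    # Only the last step can end the episode, and only if the
    # trajectory is marked as terminal.
    dones = np.zeros(len(acts), dtype=bool)
    if len(acts) > 0:
        dones[-1] = terminal
    return {
        "obs": obs[:-1],      # observation before each action
        "acts": acts,         # action taken at each step
        "next_obs": obs[1:],  # observation after each action
        "dones": dones,
    }


# Toy data: scalar observations 0..3 and three actions.
obs = np.arange(4)
acts = np.array([10, 11, 12])
transitions = flatten_to_transitions(obs, acts, terminal=True)
```

Each index i of the result describes a single step (obs[i], acts[i], next_obs[i], dones[i]), so the steps can be shuffled or batched without reference to the trajectory they came from.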

Generating Trajectories

To generate trajectories from a given policy, call rollout.rollout:

import numpy as np
import imitation.data.rollout as rollout

your_trajectories = rollout.rollout(
    your_policy,
    your_env,
    sample_until=rollout.make_sample_until(min_episodes=10),
    rng=np.random.default_rng(),
    unwrap=False,
)
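After collecting rollouts it can be useful to check how much data was gathered. The following is a minimal sketch that assumes only that each trajectory exposes an acts field as described earlier. The summarize helper is hypothetical and not part of the library, and the stand-in objects replace real rollouts so that the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Sequence, Tuple


def summarize(trajectories: Sequence[Any]) -> Tuple[int, int]:
    """Return (number of trajectories, total number of timesteps)."""
    num_episodes = len(trajectories)
    # Each trajectory holds one action per timestep.
    num_timesteps = sum(len(traj.acts) for traj in trajectories)
    return num_episodes, num_timesteps


# Stand-ins for real trajectories; only the acts field is used here.
fake_trajectories = [
    SimpleNamespace(acts=[0, 1, 0]),
    SimpleNamespace(acts=[1, 1]),
]
summary = summarize(fake_trajectories)
```

With real data, the same helper could be called on your_trajectories to confirm that the min_episodes requirement was met.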

Storing/Loading Trajectories

Trajectories can be stored on disk or uploaded to the HuggingFace Dataset Hub.

The following stores the sequence of trajectories in a directory at your_path as a HuggingFace Dataset:

from imitation.data import serialize
serialize.save(your_path, your_trajectories)

In the same way you can load trajectories from a HuggingFace Dataset:

from imitation.data import serialize
your_trajectories = serialize.load(your_path)

Note that some older, now deprecated, trajectory formats are supported by this loader, but not by the saver.
