Trajectories

Imitation learning requires trajectories: sequences of observations, actions, and sometimes rewards, generated by an agent interacting with an environment. They are also called rollouts or episodes. Some are generated by experts and serve as demonstrations; others are generated by the learning agent and serve as training data for a discriminator. In this library they are stored in a Trajectory dataclass:

@dataclasses.dataclass(frozen=True)
class Trajectory:
    obs: np.ndarray
    """Observations, shape (trajectory_len + 1, ) + observation_shape."""

    acts: np.ndarray
    """Actions, shape (trajectory_len, ) + action_shape."""

    infos: Optional[np.ndarray]
    """An array of info dicts, shape (trajectory_len, )."""

    terminal: bool
    """Does this trajectory (fragment) end in a terminal state?"""

The info dictionaries are optional and can contain arbitrary information. See the Trajectory class and the gymnasium documentation for more details. TrajectoryWithRew is a subclass of Trajectory that adds a rews field, an array of rewards of shape (trajectory_len, ).
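The shape conventions above can be checked with a small stand-in. The sketch below does not import the library: SketchTrajectory is a hypothetical class that only mirrors the fields listed above, and the observation and action shapes are made up for illustration.

```python
import dataclasses
from typing import Optional

import numpy as np


@dataclasses.dataclass(frozen=True)
class SketchTrajectory:
    """Hypothetical stand-in mirroring the fields described above."""

    obs: np.ndarray
    acts: np.ndarray
    infos: Optional[np.ndarray]
    terminal: bool

    def __len__(self) -> int:
        # The trajectory length is the number of actions taken.
        return len(self.acts)


trajectory_len = 5
sketch_traj = SketchTrajectory(
    obs=np.zeros((trajectory_len + 1, 4)),  # assumed 4-dimensional observations
    acts=np.zeros((trajectory_len,), dtype=np.int64),  # assumed discrete actions
    infos=None,  # info dicts are optional
    terminal=True,
)
# Rewards, as a TrajectoryWithRew would carry them: one per action.
sketch_rews = np.ones((trajectory_len,))

# There is one more observation than actions: the final observation
# is the state reached after the last action.
assert sketch_traj.obs.shape[0] == len(sketch_traj) + 1
assert sketch_rews.shape == (len(sketch_traj),)
```

The extra observation is what makes it possible to pair every action with both the state before it and the state after it.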

Trajectories are usually passed around as sequences (for example, lists) of Trajectory objects.

Some algorithms do not need as much information about the ordering of states, actions and rewards. Rather than using trajectories, these algorithms can make use of individual Transitions (flattened trajectories).
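To make "flattened trajectories" concrete, here is a minimal sketch of how trajectories could be flattened into per-step transitions. It is not the library's implementation: the function name flatten_to_transitions, the (obs, acts, terminal) tuple layout, and the dictionary keys are assumptions chosen for this illustration.

```python
import numpy as np


def flatten_to_transitions(trajs):
    """Sketch: flatten (obs, acts, terminal) tuples into transition arrays.

    Each tuple follows the shape conventions above: obs has one more
    entry than acts. The tuple layout is an assumption of this sketch.
    """
    obs, acts, next_obs, dones = [], [], [], []
    for traj_obs, traj_acts, terminal in trajs:
        n = len(traj_acts)
        obs.append(traj_obs[:-1])  # state before each action
        next_obs.append(traj_obs[1:])  # state after each action
        acts.append(traj_acts)
        done = np.zeros(n, dtype=bool)
        if n > 0:
            # Only the final step can end the episode.
            done[-1] = terminal
        dones.append(done)
    return {
        "obs": np.concatenate(obs),
        "acts": np.concatenate(acts),
        "next_obs": np.concatenate(next_obs),
        "dones": np.concatenate(dones),
    }


# Two toy trajectories with 1-dimensional observations.
demo = [
    (np.arange(4).reshape(4, 1), np.array([0, 1, 0]), True),
    (np.arange(10, 13).reshape(3, 1), np.array([1, 1]), False),
]
flat = flatten_to_transitions(demo)
```

After flattening, the boundary between the two trajectories survives only in the dones array, which is why algorithms that depend on step ordering need the unflattened form.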

Generating Trajectories

To generate trajectories from a given policy, use the following code:

import numpy as np
import imitation.data.rollout as rollout

your_trajectories = rollout.rollout(
    your_policy,
    your_env,
    sample_until=rollout.make_sample_until(min_episodes=10),
    rng=np.random.default_rng(),
    unwrap=False,
)
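To illustrate what sampling until a minimum number of episodes means, the following sketch collects episodes from a toy environment with a plain loop. It is not how the library implements rollout: CountdownEnv, collect_episodes, and random_policy are hypothetical names, and the environment uses a simplified reset/step interface rather than the full gymnasium API.

```python
import numpy as np


class CountdownEnv:
    """Hypothetical toy environment: an episode ends after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.array([self.t], dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.array([self.t], dtype=np.float32)
        reward = float(action)
        terminated = self.t >= self.horizon
        return obs, reward, terminated


def collect_episodes(policy, env, min_episodes, rng):
    """Roll out `policy` until at least `min_episodes` episodes are complete."""
    episodes = []
    while len(episodes) < min_episodes:
        # The initial observation is recorded before any action is taken.
        obs_list, acts, rews = [env.reset()], [], []
        terminated = False
        while not terminated:
            action = policy(obs_list[-1], rng)
            obs, reward, terminated = env.step(action)
            obs_list.append(obs)
            acts.append(action)
            rews.append(reward)
        episodes.append(
            {
                "obs": np.stack(obs_list),
                "acts": np.array(acts),
                "rews": np.array(rews),
                "terminal": True,
            }
        )
    return episodes


def random_policy(obs, rng):
    # Ignores the observation and picks action 0 or 1 uniformly.
    return int(rng.integers(0, 2))


episodes = collect_episodes(
    random_policy,
    CountdownEnv(horizon=3),
    min_episodes=10,
    rng=np.random.default_rng(0),
)
```

Each collected episode holds one more observation than actions, matching the Trajectory shape conventions.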

Storing/Loading Trajectories

Trajectories can be stored on disk or uploaded to the HuggingFace Dataset Hub.

The following code stores a sequence of trajectories in a directory at your_path as a HuggingFace Dataset:

from imitation.data import serialize
serialize.save(your_path, your_trajectories)

In the same way you can load trajectories from a HuggingFace Dataset:

from imitation.data import serialize
your_trajectories = serialize.load(your_path)

Note that some older, now deprecated, trajectory formats are supported by this loader, but not by the saver.

Sharing Trajectories with the HuggingFace Dataset Hub

To share your trajectories with the HuggingFace Dataset Hub, you need to create a HuggingFace account and log in with the HuggingFace CLI:

$ huggingface-cli login

Then you can upload your trajectories to the HuggingFace Dataset Hub:

from imitation.data.huggingface_utils import trajectories_to_dataset

trajectories_to_dataset(your_trajectories).push_to_hub("your_hf_name/your_dataset_name")

To use a public dataset from the HuggingFace Dataset Hub, you can use the following code:

import datasets
from imitation.data.huggingface_utils import TrajectoryDatasetSequence

your_dataset = datasets.load_dataset("your_hf_name/your_dataset_name")
your_trajectories = TrajectoryDatasetSequence(your_dataset["train"])

The TrajectoryDatasetSequence wraps a HuggingFace dataset so it can be used in the same way as a list of trajectories.

For example, you can analyze the dataset with imitation.data.rollout.rollout_stats() to get the mean return:

from imitation.data.rollout import rollout_stats

stats = rollout_stats(your_trajectories)
print(stats["return_mean"])
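For intuition about what a mean return measures, the sketch below computes it by hand from per-trajectory reward arrays. It does not call the library; the reward values are made up, and apart from "return_mean" (which appears above) the dictionary keys are chosen for this sketch only.

```python
import numpy as np

# Hypothetical reward arrays, one per trajectory.
reward_arrays = [
    np.array([1.0, 1.0, 1.0]),
    np.array([0.0, 2.0]),
    np.array([4.0]),
]

# The undiscounted return of a trajectory is the sum of its rewards.
returns = np.array([rews.sum() for rews in reward_arrays])
# The length of a trajectory is its number of steps (one reward per step).
lengths = np.array([len(rews) for rews in reward_arrays])

sketch_stats = {
    "n_traj": len(reward_arrays),
    "return_mean": float(returns.mean()),
    "len_mean": float(lengths.mean()),
}
print(sketch_stats["return_mean"])  # prints 3.0
```

Because returns are computed from rewards, statistics of this kind only make sense for trajectories that carry a rews field.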