First Steps
Imitation can be used in two main ways: through its command-line interface (CLI) or Python API. The CLI allows you to quickly train and test algorithms and policies directly from the command line. The Python API provides greater flexibility and extensibility, and allows you to inter-operate with your existing Python environment.
CLI Quickstart
We provide several CLI scripts as front-ends to the algorithms implemented in imitation. These use Sacred for configuration and replicability. For information on how to configure Sacred CLI options, see the Sacred docs.
#!/usr/bin/env bash
# Train PPO agent on pendulum and collect expert demonstrations. Tensorboard logs saved in quickstart/rl/
python -m imitation.scripts.train_rl with pendulum environment.fast policy_evaluation.fast rl.fast fast logging.log_dir=quickstart/rl/
# Train GAIL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial gail with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.rollout_path=quickstart/rl/rollouts/final.npz
# Train AIRL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial airl with pendulum environment.fast demonstrations.fast policy_evaluation.fast rl.fast fast demonstrations.rollout_path=quickstart/rl/rollouts/final.npz
Note
Remove the fast options from the commands above to allow training to run to completion.
Tip
python -m imitation.scripts.train_rl print_config
will list Sacred script options.
These configuration options are also documented in each script’s docstrings.
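The rollouts saved by the train_rl command above can also be inspected from Python before they are fed to the imitation learning scripts. A minimal sketch, assuming the quickstart/rl/ run above has completed; the exact serialization helper and file layout may differ between imitation versions:

from imitation.data import serialize

# Path written by the quickstart train_rl command above (see logging.log_dir).
trajectories = serialize.load("quickstart/rl/rollouts/final.npz")
print(f"Loaded {len(trajectories)} trajectories; first has {len(trajectories[0])} transitions.")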
Python Interface Quickstart
Here's an example script that loads CartPole demonstrations and trains a behavioral cloning (BC) model on that data; a sketch of GAIL training on the same demonstrations follows the script. You will need to pip install seals or pip install imitation[test] to run this.
"""This is a simple example demonstrating how to clone the behavior of an expert.
Refer to the jupyter notebooks for more detailed examples of how to use the algorithms.
"""
import gym
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.ppo import MlpPolicy

from imitation.algorithms import bc
from imitation.data import rollout
from imitation.data.wrappers import RolloutInfoWrapper

env = gym.make("CartPole-v1")
rng = np.random.default_rng(0)
def train_expert():
    print("Training an expert.")
    expert = PPO(
        policy=MlpPolicy,
        env=env,
        seed=0,
        batch_size=64,
        ent_coef=0.0,
        learning_rate=0.0003,
        n_epochs=10,
        n_steps=64,
    )
    expert.learn(100)  # Note: change this to 100000 to train a decent expert.
    return expert


def sample_expert_transitions():
    expert = train_expert()

    print("Sampling expert transitions.")
    # RolloutInfoWrapper records full episode data so rollout() can reconstruct trajectories.
    rollouts = rollout.rollout(
        expert,
        DummyVecEnv([lambda: RolloutInfoWrapper(env)]),
        rollout.make_sample_until(min_timesteps=None, min_episodes=50),
        rng=rng,
    )
    return rollout.flatten_trajectories(rollouts)


transitions = sample_expert_transitions()
bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=rng,
)

# Evaluate the untrained policy as a baseline.
reward, _ = evaluate_policy(
    bc_trainer.policy,  # type: ignore[arg-type]
    env,
    n_eval_episodes=3,
    render=True,
)
print(f"Reward before training: {reward}")

print("Training a policy using Behavior Cloning")
bc_trainer.train(n_epochs=1)

reward, _ = evaluate_policy(
    bc_trainer.policy,  # type: ignore[arg-type]
    env,
    n_eval_episodes=3,
    render=True,
)
print(f"Reward after training: {reward}")