
Train an Agent using Generative Adversarial Imitation Learning

The idea of generative adversarial imitation learning (GAIL) is to train a discriminator network to distinguish expert trajectories from learner trajectories. The learner is trained with a standard reinforcement learning algorithm such as PPO and is rewarded for producing trajectories that the discriminator classifies as expert trajectories.
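To make the two roles concrete, here is a minimal pure-Python sketch of the quantities involved. It assumes the discriminator emits one logit per transition, with higher values meaning "more expert-like". The function names and the reward form -log(1 - D) are illustrative assumptions: one common choice, not necessarily what any particular library computes.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw discriminator logit to D = P(transition came from the expert)."""
    return 1.0 / (1.0 + math.exp(-logit))


def discriminator_loss(logit: float, is_expert: bool) -> float:
    """Binary cross-entropy for a single transition (label 1 = expert)."""
    p_expert = sigmoid(logit)
    return -math.log(p_expert) if is_expert else -math.log(1.0 - p_expert)


def learner_reward(logit: float) -> float:
    """Assumed reward -log(1 - D): it grows as the discriminator is fooled."""
    return -math.log(1.0 - sigmoid(logit))


# An undecided discriminator (logit 0, so D = 0.5) pays ln 2 on either label.
undecided_loss = discriminator_loss(0.0, is_expert=True)

# The learner earns more reward when its transition looks expert-like.
reward_fooled = learner_reward(2.0)
reward_caught = learner_reward(-2.0)
```

A discriminator loss close to ln 2 ≈ 0.693 therefore indicates a discriminator that cannot yet tell the two sources apart.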

As usual, we first need an expert. Again, we download one from the HuggingFace model hub for convenience.

Note that we use a variant of the CartPole environment from the seals package, which has fixed episode durations. Read more about why we do this here.
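One commonly cited reason, given here as background rather than taken from this tutorial, is that episodes which end early on failure let the episode length itself leak information about the task. The toy sketch below assumes a reward of +1 per step and uses an invented early-failure length of 30 steps purely for illustration; the helper name is hypothetical.

```python
def episode_return(n_steps: int, reward_per_step: float = 1.0) -> float:
    """Return of an episode under a constant per-step reward.

    A constant reward carries no information about which actions are good.
    """
    return n_steps * reward_per_step


HORIZON = 500  # fixed episode length assumed for the seals CartPole variant

# Variable horizon: a (hypothetical) poor policy fails after 30 steps,
# while a good policy survives until the time limit.
variable_bad = episode_return(30)
variable_good = episode_return(HORIZON)

# Fixed horizon: every episode lasts HORIZON steps, whatever the policy does.
fixed_bad = episode_return(HORIZON)
fixed_good = episode_return(HORIZON)
```

With variable episode lengths even the uninformative constant reward separates the two policies, so a learned reward can appear to work without having captured anything about the task. With a fixed horizon the two returns are identical, and any difference in return must come from the reward itself.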

import numpy as np
from imitation.policies.serialize import load_policy
from imitation.util.util import make_vec_env
from imitation.data.wrappers import RolloutInfoWrapper

SEED = 42

env = make_vec_env(
    "seals:seals/CartPole-v0",
    rng=np.random.default_rng(SEED),
    n_envs=8,
    post_wrappers=[
        lambda env, _: RolloutInfoWrapper(env)
    ],  # needed for computing rollouts later
)
expert = load_policy(
    "ppo-huggingface",
    organization="HumanCompatibleAI",
    env_name="seals/CartPole-v0",
    venv=env,
)

We generate some expert trajectories that the discriminator needs to distinguish from the learner’s trajectories.

from imitation.data import rollout

rollouts = rollout.rollout(
    expert,
    env,
    rollout.make_sample_until(min_timesteps=None, min_episodes=60),
    rng=np.random.default_rng(SEED),
)
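The stopping rule passed to rollout.rollout can be pictured as a predicate over the trajectories collected so far. The sketch below is a simplified stand-in written for illustration, not the library's implementation: `make_episode_budget`, the list-based trajectory representation, and the rule that both limits must be met when both are given are all assumptions.

```python
from typing import Callable, List, Optional, Sequence

Trajectory = Sequence[float]  # stand-in: one reward per timestep


def make_episode_budget(
    min_timesteps: Optional[int], min_episodes: Optional[int]
) -> Callable[[List[Trajectory]], bool]:
    """Build a predicate that reports when enough data has been collected."""
    if min_timesteps is None and min_episodes is None:
        raise ValueError("At least one of the two limits must be set.")

    def enough(trajectories: List[Trajectory]) -> bool:
        n_steps = sum(len(t) for t in trajectories)
        steps_ok = min_timesteps is None or n_steps >= min_timesteps
        episodes_ok = min_episodes is None or len(trajectories) >= min_episodes
        # Assumption: when both limits are set, both must be satisfied.
        return steps_ok and episodes_ok

    return enough


# Same limits as in the tutorial: no timestep limit, at least 60 episodes.
sample_until = make_episode_budget(min_timesteps=None, min_episodes=60)
done_after_59 = sample_until([[1.0] * 500] * 59)
done_after_60 = sample_until([[1.0] * 500] * 60)
```

Because eight environments run in parallel, a rule like this can be satisfied with slightly more than 60 episodes in hand, since several episodes may finish at the same step.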

Now we are ready to set up our GAIL trainer. Note that the reward_net is actually the discriminator's network. We evaluate the learner before and after training to see whether it made any progress.

First we construct a GAIL trainer …

from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.evaluation import evaluate_policy

learner = PPO(
    env=env,
    policy=MlpPolicy,
    batch_size=64,
    ent_coef=0.0,
    learning_rate=0.0004,
    gamma=0.95,
    n_epochs=5,
    seed=SEED,
)
reward_net = BasicRewardNet(
    observation_space=env.observation_space,
    action_space=env.action_space,
    normalize_input_layer=RunningNorm,
)
gail_trainer = GAIL(
    demonstrations=rollouts,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=512,
    n_disc_updates_per_round=8,
    venv=env,
    gen_algo=learner,
    reward_net=reward_net,
)

… then we evaluate it before training …

env.seed(SEED)
learner_rewards_before_training, _ = evaluate_policy(
    learner, env, 100, return_episode_rewards=True
)

… and train it …

gail_trainer.train(200_000)
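How many training rounds this call performs can be estimated with a little arithmetic. The sketch below assumes that each round collects n_envs times n_steps generator timesteps, with n_steps = 2048 (PPO's default in stable-baselines3, since the learner above does not set it), and that the trainer runs as many whole rounds as fit into the requested budget. Both are assumptions about the trainer's behavior, although the per-round figure agrees with the gen/time/total_timesteps value of 16384 in the first table of the log output that follows.

```python
N_ENVS = 8                    # n_envs passed to make_vec_env
PPO_N_STEPS = 2048            # assumed: PPO default, not set explicitly above
TOTAL_TIMESTEPS = 200_000     # argument of gail_trainer.train
N_DISC_UPDATES_PER_ROUND = 8  # passed to the GAIL constructor

# Generator timesteps gathered in one round across all parallel environments.
timesteps_per_round = N_ENVS * PPO_N_STEPS

# Assumption: only whole rounds are run, so the remainder is dropped.
n_rounds = TOTAL_TIMESTEPS // timesteps_per_round

# Totals implied by the two numbers above.
generator_timesteps = n_rounds * timesteps_per_round
discriminator_updates = n_rounds * N_DISC_UPDATES_PER_ROUND
```

Under these assumptions the call amounts to 12 rounds, 196608 generator timesteps and 96 discriminator updates. The grouping of eight discriminator tables between consecutive generator tables in the log reflects n_disc_updates_per_round=8.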
------------------------------------------
| raw/                        |          |
|    gen/rollout/ep_len_mean  | 500      |
|    gen/rollout/ep_rew_mean  | 29.8     |
|    gen/time/fps             | 4208     |
|    gen/time/iterations      | 1        |
|    gen/time/time_elapsed    | 3        |
|    gen/time/total_timesteps | 16384    |
------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.696    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.694    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.693    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.69     |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.688    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.686    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.684    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.683    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.689    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 1        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 29.8     |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.64e+04 |
|    gen/train/approx_kl              | 0.00905  |
|    gen/train/clip_fraction          | 0.0295   |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.686   |
|    gen/train/explained_variance     | 0.0301   |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.127    |
|    gen/train/n_updates              | 5        |
|    gen/train/policy_gradient_loss   | -0.0015  |
|    gen/train/value_loss             | 4.43     |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 31.9        |
|    gen/rollout/ep_rew_wrapped_mean | 268         |
|    gen/time/fps                    | 4212        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 32768       |
|    gen/train/approx_kl             | 0.009048736 |
|    gen/train/clip_fraction         | 0.0295      |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.686      |
|    gen/train/explained_variance    | 0.0301      |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.127       |
|    gen/train/n_updates             | 5           |
|    gen/train/policy_gradient_loss  | -0.0015     |
|    gen/train/value_loss            | 4.43        |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.685    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.684    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.683    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.682    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.68     |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.68     |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.679    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.678    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.691    |
|    disc/disc_loss                   | 0.681    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 2        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 31.9     |
|    gen/rollout/ep_rew_wrapped_mean  | 268      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 3.28e+04 |
|    gen/train/approx_kl              | 0.0102   |
|    gen/train/clip_fraction          | 0.133    |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.686   |
|    gen/train/explained_variance     | 0.841    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.0145   |
|    gen/train/n_updates              | 10       |
|    gen/train/policy_gradient_loss   | -0.00786 |
|    gen/train/value_loss             | 0.248    |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 34.1        |
|    gen/rollout/ep_rew_wrapped_mean | 275         |
|    gen/time/fps                    | 4213        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 49152       |
|    gen/train/approx_kl             | 0.010180451 |
|    gen/train/clip_fraction         | 0.133       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.686      |
|    gen/train/explained_variance    | 0.841       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.0145      |
|    gen/train/n_updates             | 10          |
|    gen/train/policy_gradient_loss  | -0.00786    |
|    gen/train/value_loss            | 0.248       |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.672    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.671    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.67     |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.668    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.667    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.668    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.689    |
|    disc/disc_loss                   | 0.664    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.689    |
|    disc/disc_loss                   | 0.661    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.69     |
|    disc/disc_loss                   | 0.668    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 3        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 34.1     |
|    gen/rollout/ep_rew_wrapped_mean  | 275      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 4.92e+04 |
|    gen/train/approx_kl              | 0.0153   |
|    gen/train/clip_fraction          | 0.195    |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.673   |
|    gen/train/explained_variance     | 0.815    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | -0.0246  |
|    gen/train/n_updates              | 15       |
|    gen/train/policy_gradient_loss   | -0.0135  |
|    gen/train/value_loss             | 0.0463   |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 37.8        |
|    gen/rollout/ep_rew_wrapped_mean | 277         |
|    gen/time/fps                    | 4207        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 65536       |
|    gen/train/approx_kl             | 0.015265099 |
|    gen/train/clip_fraction         | 0.195       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.673      |
|    gen/train/explained_variance    | 0.815       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | -0.0246     |
|    gen/train/n_updates             | 15          |
|    gen/train/policy_gradient_loss  | -0.0135     |
|    gen/train/value_loss            | 0.0463      |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.652    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.646    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.646    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.685    |
|    disc/disc_loss                   | 0.64     |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.685    |
|    disc/disc_loss                   | 0.638    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.684    |
|    disc/disc_loss                   | 0.634    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.682    |
|    disc/disc_loss                   | 0.628    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.682    |
|    disc/disc_loss                   | 0.625    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.685    |
|    disc/disc_loss                   | 0.639    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 4        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 37.8     |
|    gen/rollout/ep_rew_wrapped_mean  | 277      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 6.55e+04 |
|    gen/train/approx_kl              | 0.0161   |
|    gen/train/clip_fraction          | 0.215    |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.654   |
|    gen/train/explained_variance     | 0.892    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | -0.0168  |
|    gen/train/n_updates              | 20       |
|    gen/train/policy_gradient_loss   | -0.0195  |
|    gen/train/value_loss             | 0.0173   |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 40.4        |
|    gen/rollout/ep_rew_wrapped_mean | 284         |
|    gen/time/fps                    | 4206        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 81920       |
|    gen/train/approx_kl             | 0.016116062 |
|    gen/train/clip_fraction         | 0.215       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.654      |
|    gen/train/explained_variance    | 0.892       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | -0.0168     |
|    gen/train/n_updates             | 20          |
|    gen/train/policy_gradient_loss  | -0.0195     |
|    gen/train/value_loss            | 0.0173      |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.689    |
|    disc/disc_loss                   | 0.659    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.689    |
|    disc/disc_loss                   | 0.659    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.688    |
|    disc/disc_loss                   | 0.655    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0        |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.651    |
|    disc/disc_proportion_expert_pred | 0        |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.5      |
|    disc/disc_acc_expert             | 0.000977 |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.688    |
|    disc/disc_loss                   | 0.652    |
|    disc/disc_proportion_expert_pred | 0.000488 |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.573    |
|    disc/disc_acc_expert             | 0.146    |
|    disc/disc_acc_gen                | 1        |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.647    |
|    disc/disc_proportion_expert_pred | 0.0728   |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.684    |
|    disc/disc_acc_expert             | 0.374    |
|    disc/disc_acc_gen                | 0.993    |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.647    |
|    disc/disc_proportion_expert_pred | 0.19     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.708    |
|    disc/disc_acc_expert             | 0.434    |
|    disc/disc_acc_gen                | 0.983    |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.642    |
|    disc/disc_proportion_expert_pred | 0.225    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.558    |
|    disc/disc_acc_expert             | 0.119    |
|    disc/disc_acc_gen                | 0.997    |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.652    |
|    disc/disc_proportion_expert_pred | 0.0611   |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 5        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 40.4     |
|    gen/rollout/ep_rew_wrapped_mean  | 284      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 8.19e+04 |
|    gen/train/approx_kl              | 0.0112   |
|    gen/train/clip_fraction          | 0.129    |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.634   |
|    gen/train/explained_variance     | 0.871    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.00102  |
|    gen/train/n_updates              | 25       |
|    gen/train/policy_gradient_loss   | -0.00957 |
|    gen/train/value_loss             | 0.0103   |
--------------------------------------------------
---------------------------------------------------
| raw/                               |            |
|    gen/rollout/ep_len_mean         | 500        |
|    gen/rollout/ep_rew_mean         | 40.8       |
|    gen/rollout/ep_rew_wrapped_mean | 288        |
|    gen/time/fps                    | 4212       |
|    gen/time/iterations             | 1          |
|    gen/time/time_elapsed           | 3          |
|    gen/time/total_timesteps        | 98304      |
|    gen/train/approx_kl             | 0.01118237 |
|    gen/train/clip_fraction         | 0.129      |
|    gen/train/clip_range            | 0.2        |
|    gen/train/entropy_loss          | -0.634     |
|    gen/train/explained_variance    | 0.871      |
|    gen/train/learning_rate         | 0.0004     |
|    gen/train/loss                  | 0.00102    |
|    gen/train/n_updates             | 25         |
|    gen/train/policy_gradient_loss  | -0.00957   |
|    gen/train/value_loss            | 0.0103     |
---------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.732    |
|    disc/disc_acc_expert             | 0.468    |
|    disc/disc_acc_gen                | 0.997    |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.639    |
|    disc/disc_proportion_expert_pred | 0.235    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.718    |
|    disc/disc_acc_expert             | 0.442    |
|    disc/disc_acc_gen                | 0.994    |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.637    |
|    disc/disc_proportion_expert_pred | 0.224    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.736    |
|    disc/disc_acc_expert             | 0.476    |
|    disc/disc_acc_gen                | 0.996    |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.638    |
|    disc/disc_proportion_expert_pred | 0.24     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.734    |
|    disc/disc_acc_expert             | 0.472    |
|    disc/disc_acc_gen                | 0.996    |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.635    |
|    disc/disc_proportion_expert_pred | 0.238    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.714    |
|    disc/disc_acc_expert             | 0.44     |
|    disc/disc_acc_gen                | 0.987    |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.633    |
|    disc/disc_proportion_expert_pred | 0.227    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.746    |
|    disc/disc_acc_expert             | 0.504    |
|    disc/disc_acc_gen                | 0.988    |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.632    |
|    disc/disc_proportion_expert_pred | 0.258    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.819    |
|    disc/disc_acc_expert             | 0.657    |
|    disc/disc_acc_gen                | 0.981    |
|    disc/disc_entropy                | 0.686    |
|    disc/disc_loss                   | 0.631    |
|    disc/disc_proportion_expert_pred | 0.338    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.856    |
|    disc/disc_acc_expert             | 0.733    |
|    disc/disc_acc_gen                | 0.979    |
|    disc/disc_entropy                | 0.685    |
|    disc/disc_loss                   | 0.627    |
|    disc/disc_proportion_expert_pred | 0.377    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.757    |
|    disc/disc_acc_expert             | 0.524    |
|    disc/disc_acc_gen                | 0.99     |
|    disc/disc_entropy                | 0.687    |
|    disc/disc_loss                   | 0.634    |
|    disc/disc_proportion_expert_pred | 0.267    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 6        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 40.8     |
|    gen/rollout/ep_rew_wrapped_mean  | 288      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 9.83e+04 |
|    gen/train/approx_kl              | 0.00629  |
|    gen/train/clip_fraction          | 0.0466   |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.635   |
|    gen/train/explained_variance     | 0.873    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.0116   |
|    gen/train/n_updates              | 30       |
|    gen/train/policy_gradient_loss   | -0.00363 |
|    gen/train/value_loss             | 0.0126   |
--------------------------------------------------
-----------------------------------------------------
| raw/                               |              |
|    gen/rollout/ep_len_mean         | 500          |
|    gen/rollout/ep_rew_mean         | 39.6         |
|    gen/rollout/ep_rew_wrapped_mean | 287          |
|    gen/time/fps                    | 4207         |
|    gen/time/iterations             | 1            |
|    gen/time/time_elapsed           | 3            |
|    gen/time/total_timesteps        | 114688       |
|    gen/train/approx_kl             | 0.0062911767 |
|    gen/train/clip_fraction         | 0.0466       |
|    gen/train/clip_range            | 0.2          |
|    gen/train/entropy_loss          | -0.635       |
|    gen/train/explained_variance    | 0.873        |
|    gen/train/learning_rate         | 0.0004       |
|    gen/train/loss                  | 0.0116       |
|    gen/train/n_updates             | 30           |
|    gen/train/policy_gradient_loss  | -0.00363     |
|    gen/train/value_loss            | 0.0126       |
-----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.852    |
|    disc/disc_acc_expert             | 0.735    |
|    disc/disc_acc_gen                | 0.969    |
|    disc/disc_entropy                | 0.683    |
|    disc/disc_loss                   | 0.62     |
|    disc/disc_proportion_expert_pred | 0.383    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.89     |
|    disc/disc_acc_expert             | 0.812    |
|    disc/disc_acc_gen                | 0.968    |
|    disc/disc_entropy                | 0.682    |
|    disc/disc_loss                   | 0.616    |
|    disc/disc_proportion_expert_pred | 0.422    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.919    |
|    disc/disc_acc_expert             | 0.875    |
|    disc/disc_acc_gen                | 0.964    |
|    disc/disc_entropy                | 0.681    |
|    disc/disc_loss                   | 0.614    |
|    disc/disc_proportion_expert_pred | 0.456    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.95     |
|    disc/disc_acc_expert             | 0.933    |
|    disc/disc_acc_gen                | 0.967    |
|    disc/disc_entropy                | 0.681    |
|    disc/disc_loss                   | 0.61     |
|    disc/disc_proportion_expert_pred | 0.483    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.963    |
|    disc/disc_acc_expert             | 0.955    |
|    disc/disc_acc_gen                | 0.971    |
|    disc/disc_entropy                | 0.681    |
|    disc/disc_loss                   | 0.608    |
|    disc/disc_proportion_expert_pred | 0.492    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.962    |
|    disc/disc_acc_expert             | 0.962    |
|    disc/disc_acc_gen                | 0.963    |
|    disc/disc_entropy                | 0.679    |
|    disc/disc_loss                   | 0.603    |
|    disc/disc_proportion_expert_pred | 0.5      |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.967    |
|    disc/disc_acc_expert             | 0.97     |
|    disc/disc_acc_gen                | 0.964    |
|    disc/disc_entropy                | 0.679    |
|    disc/disc_loss                   | 0.602    |
|    disc/disc_proportion_expert_pred | 0.503    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.958    |
|    disc/disc_acc_expert             | 0.977    |
|    disc/disc_acc_gen                | 0.94     |
|    disc/disc_entropy                | 0.678    |
|    disc/disc_loss                   | 0.597    |
|    disc/disc_proportion_expert_pred | 0.518    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.933    |
|    disc/disc_acc_expert             | 0.902    |
|    disc/disc_acc_gen                | 0.963    |
|    disc/disc_entropy                | 0.681    |
|    disc/disc_loss                   | 0.609    |
|    disc/disc_proportion_expert_pred | 0.47     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 7        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 39.6     |
|    gen/rollout/ep_rew_wrapped_mean  | 287      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.15e+05 |
|    gen/train/approx_kl              | 0.0087   |
|    gen/train/clip_fraction          | 0.0778   |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.629   |
|    gen/train/explained_variance     | 0.928    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.0141   |
|    gen/train/n_updates              | 35       |
|    gen/train/policy_gradient_loss   | -0.00673 |
|    gen/train/value_loss             | 0.0171   |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 39          |
|    gen/rollout/ep_rew_wrapped_mean | 282         |
|    gen/time/fps                    | 4209        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 131072      |
|    gen/train/approx_kl             | 0.008696594 |
|    gen/train/clip_fraction         | 0.0778      |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.629      |
|    gen/train/explained_variance    | 0.928       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.0141      |
|    gen/train/n_updates             | 35          |
|    gen/train/policy_gradient_loss  | -0.00673    |
|    gen/train/value_loss            | 0.0171      |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.964    |
|    disc/disc_acc_expert             | 0.981    |
|    disc/disc_acc_gen                | 0.946    |
|    disc/disc_entropy                | 0.671    |
|    disc/disc_loss                   | 0.572    |
|    disc/disc_proportion_expert_pred | 0.518    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.977    |
|    disc/disc_acc_expert             | 0.992    |
|    disc/disc_acc_gen                | 0.962    |
|    disc/disc_entropy                | 0.669    |
|    disc/disc_loss                   | 0.563    |
|    disc/disc_proportion_expert_pred | 0.515    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.969    |
|    disc/disc_acc_expert             | 0.994    |
|    disc/disc_acc_gen                | 0.943    |
|    disc/disc_entropy                | 0.669    |
|    disc/disc_loss                   | 0.562    |
|    disc/disc_proportion_expert_pred | 0.525    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.97     |
|    disc/disc_acc_expert             | 0.998    |
|    disc/disc_acc_gen                | 0.941    |
|    disc/disc_entropy                | 0.667    |
|    disc/disc_loss                   | 0.557    |
|    disc/disc_proportion_expert_pred | 0.528    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.97     |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.939    |
|    disc/disc_entropy                | 0.665    |
|    disc/disc_loss                   | 0.553    |
|    disc/disc_proportion_expert_pred | 0.53     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.972    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.943    |
|    disc/disc_entropy                | 0.666    |
|    disc/disc_loss                   | 0.552    |
|    disc/disc_proportion_expert_pred | 0.528    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.969    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.938    |
|    disc/disc_entropy                | 0.663    |
|    disc/disc_loss                   | 0.543    |
|    disc/disc_proportion_expert_pred | 0.531    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.976    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.952    |
|    disc/disc_entropy                | 0.661    |
|    disc/disc_loss                   | 0.539    |
|    disc/disc_proportion_expert_pred | 0.524    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.971    |
|    disc/disc_acc_expert             | 0.996    |
|    disc/disc_acc_gen                | 0.946    |
|    disc/disc_entropy                | 0.667    |
|    disc/disc_loss                   | 0.555    |
|    disc/disc_proportion_expert_pred | 0.525    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 8        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 39       |
|    gen/rollout/ep_rew_wrapped_mean  | 282      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.31e+05 |
|    gen/train/approx_kl              | 0.00855  |
|    gen/train/clip_fraction          | 0.0715   |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.624   |
|    gen/train/explained_variance     | 0.922    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.00161  |
|    gen/train/n_updates              | 40       |
|    gen/train/policy_gradient_loss   | -0.00499 |
|    gen/train/value_loss             | 0.0237   |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 39.6        |
|    gen/rollout/ep_rew_wrapped_mean | 271         |
|    gen/time/fps                    | 4214        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 147456      |
|    gen/train/approx_kl             | 0.008551636 |
|    gen/train/clip_fraction         | 0.0715      |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.624      |
|    gen/train/explained_variance    | 0.922       |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.00161     |
|    gen/train/n_updates             | 40          |
|    gen/train/policy_gradient_loss  | -0.00499    |
|    gen/train/value_loss            | 0.0237      |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.96     |
|    disc/disc_acc_expert             | 0.998    |
|    disc/disc_acc_gen                | 0.922    |
|    disc/disc_entropy                | 0.674    |
|    disc/disc_loss                   | 0.571    |
|    disc/disc_proportion_expert_pred | 0.538    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.956    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.912    |
|    disc/disc_entropy                | 0.672    |
|    disc/disc_loss                   | 0.567    |
|    disc/disc_proportion_expert_pred | 0.544    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.962    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.924    |
|    disc/disc_entropy                | 0.67     |
|    disc/disc_loss                   | 0.56     |
|    disc/disc_proportion_expert_pred | 0.538    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.966    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.932    |
|    disc/disc_entropy                | 0.669    |
|    disc/disc_loss                   | 0.558    |
|    disc/disc_proportion_expert_pred | 0.534    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.954    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.907    |
|    disc/disc_entropy                | 0.667    |
|    disc/disc_loss                   | 0.553    |
|    disc/disc_proportion_expert_pred | 0.546    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.951    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.902    |
|    disc/disc_entropy                | 0.667    |
|    disc/disc_loss                   | 0.551    |
|    disc/disc_proportion_expert_pred | 0.549    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.96     |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.921    |
|    disc/disc_entropy                | 0.662    |
|    disc/disc_loss                   | 0.539    |
|    disc/disc_proportion_expert_pred | 0.54     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.958    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.915    |
|    disc/disc_entropy                | 0.661    |
|    disc/disc_loss                   | 0.535    |
|    disc/disc_proportion_expert_pred | 0.542    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.958    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.917    |
|    disc/disc_entropy                | 0.668    |
|    disc/disc_loss                   | 0.554    |
|    disc/disc_proportion_expert_pred | 0.541    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 9        |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 39.6     |
|    gen/rollout/ep_rew_wrapped_mean  | 271      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.47e+05 |
|    gen/train/approx_kl              | 0.00591  |
|    gen/train/clip_fraction          | 0.0515   |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.613   |
|    gen/train/explained_variance     | 0.935    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | -0.00763 |
|    gen/train/n_updates              | 45       |
|    gen/train/policy_gradient_loss   | -0.00313 |
|    gen/train/value_loss             | 0.0288   |
--------------------------------------------------
-----------------------------------------------------
| raw/                               |              |
|    gen/rollout/ep_len_mean         | 500          |
|    gen/rollout/ep_rew_mean         | 44.7         |
|    gen/rollout/ep_rew_wrapped_mean | 259          |
|    gen/time/fps                    | 4213         |
|    gen/time/iterations             | 1            |
|    gen/time/time_elapsed           | 3            |
|    gen/time/total_timesteps        | 163840       |
|    gen/train/approx_kl             | 0.0059148837 |
|    gen/train/clip_fraction         | 0.0515       |
|    gen/train/clip_range            | 0.2          |
|    gen/train/entropy_loss          | -0.613       |
|    gen/train/explained_variance    | 0.935        |
|    gen/train/learning_rate         | 0.0004       |
|    gen/train/loss                  | -0.00763     |
|    gen/train/n_updates             | 45           |
|    gen/train/policy_gradient_loss  | -0.00313     |
|    gen/train/value_loss            | 0.0288       |
-----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.951    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.901    |
|    disc/disc_entropy                | 0.643    |
|    disc/disc_loss                   | 0.495    |
|    disc/disc_proportion_expert_pred | 0.549    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.948    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.896    |
|    disc/disc_entropy                | 0.638    |
|    disc/disc_loss                   | 0.485    |
|    disc/disc_proportion_expert_pred | 0.552    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.955    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.909    |
|    disc/disc_entropy                | 0.636    |
|    disc/disc_loss                   | 0.481    |
|    disc/disc_proportion_expert_pred | 0.545    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.95     |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.899    |
|    disc/disc_entropy                | 0.633    |
|    disc/disc_loss                   | 0.477    |
|    disc/disc_proportion_expert_pred | 0.55     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.945    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.89     |
|    disc/disc_entropy                | 0.633    |
|    disc/disc_loss                   | 0.475    |
|    disc/disc_proportion_expert_pred | 0.555    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.942    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.885    |
|    disc/disc_entropy                | 0.628    |
|    disc/disc_loss                   | 0.468    |
|    disc/disc_proportion_expert_pred | 0.558    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.948    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.896    |
|    disc/disc_entropy                | 0.624    |
|    disc/disc_loss                   | 0.461    |
|    disc/disc_proportion_expert_pred | 0.552    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.95     |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.899    |
|    disc/disc_entropy                | 0.615    |
|    disc/disc_loss                   | 0.446    |
|    disc/disc_proportion_expert_pred | 0.55     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.948    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.897    |
|    disc/disc_entropy                | 0.631    |
|    disc/disc_loss                   | 0.474    |
|    disc/disc_proportion_expert_pred | 0.552    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 10       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 44.7     |
|    gen/rollout/ep_rew_wrapped_mean  | 259      |
|    gen/time/fps                     | 4.21e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.64e+05 |
|    gen/train/approx_kl              | 0.00881  |
|    gen/train/clip_fraction          | 0.0822   |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.596   |
|    gen/train/explained_variance     | 0.942    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | -0.0335  |
|    gen/train/n_updates              | 50       |
|    gen/train/policy_gradient_loss   | -0.00478 |
|    gen/train/value_loss             | 0.0465   |
--------------------------------------------------
-----------------------------------------------------
| raw/                               |              |
|    gen/rollout/ep_len_mean         | 500          |
|    gen/rollout/ep_rew_mean         | 50.9         |
|    gen/rollout/ep_rew_wrapped_mean | 243          |
|    gen/time/fps                    | 4194         |
|    gen/time/iterations             | 1            |
|    gen/time/time_elapsed           | 3            |
|    gen/time/total_timesteps        | 180224       |
|    gen/train/approx_kl             | 0.0088136345 |
|    gen/train/clip_fraction         | 0.0822       |
|    gen/train/clip_range            | 0.2          |
|    gen/train/entropy_loss          | -0.596       |
|    gen/train/explained_variance    | 0.942        |
|    gen/train/learning_rate         | 0.0004       |
|    gen/train/loss                  | -0.0335      |
|    gen/train/n_updates             | 50           |
|    gen/train/policy_gradient_loss  | -0.00478     |
|    gen/train/value_loss            | 0.0465       |
-----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.788    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.575    |
|    disc/disc_entropy                | 0.642    |
|    disc/disc_loss                   | 0.543    |
|    disc/disc_proportion_expert_pred | 0.712    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.791    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.581    |
|    disc/disc_entropy                | 0.637    |
|    disc/disc_loss                   | 0.536    |
|    disc/disc_proportion_expert_pred | 0.709    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.794    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.588    |
|    disc/disc_entropy                | 0.632    |
|    disc/disc_loss                   | 0.526    |
|    disc/disc_proportion_expert_pred | 0.706    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.781    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.562    |
|    disc/disc_entropy                | 0.632    |
|    disc/disc_loss                   | 0.533    |
|    disc/disc_proportion_expert_pred | 0.719    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.788    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.575    |
|    disc/disc_entropy                | 0.628    |
|    disc/disc_loss                   | 0.524    |
|    disc/disc_proportion_expert_pred | 0.712    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.783    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.566    |
|    disc/disc_entropy                | 0.624    |
|    disc/disc_loss                   | 0.524    |
|    disc/disc_proportion_expert_pred | 0.717    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.787    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.573    |
|    disc/disc_entropy                | 0.622    |
|    disc/disc_loss                   | 0.519    |
|    disc/disc_proportion_expert_pred | 0.713    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.775    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.551    |
|    disc/disc_entropy                | 0.621    |
|    disc/disc_loss                   | 0.524    |
|    disc/disc_proportion_expert_pred | 0.725    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.786    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.572    |
|    disc/disc_entropy                | 0.63     |
|    disc/disc_loss                   | 0.529    |
|    disc/disc_proportion_expert_pred | 0.714    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 11       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 50.9     |
|    gen/rollout/ep_rew_wrapped_mean  | 243      |
|    gen/time/fps                     | 4.19e+03 |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.8e+05  |
|    gen/train/approx_kl              | 0.00988  |
|    gen/train/clip_fraction          | 0.117    |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.597   |
|    gen/train/explained_variance     | 0.95     |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.0114   |
|    gen/train/n_updates              | 55       |
|    gen/train/policy_gradient_loss   | -0.00606 |
|    gen/train/value_loss             | 0.0522   |
--------------------------------------------------
----------------------------------------------------
| raw/                               |             |
|    gen/rollout/ep_len_mean         | 500         |
|    gen/rollout/ep_rew_mean         | 56.4        |
|    gen/rollout/ep_rew_wrapped_mean | 229         |
|    gen/time/fps                    | 4199        |
|    gen/time/iterations             | 1           |
|    gen/time/time_elapsed           | 3           |
|    gen/time/total_timesteps        | 196608      |
|    gen/train/approx_kl             | 0.009878516 |
|    gen/train/clip_fraction         | 0.117       |
|    gen/train/clip_range            | 0.2         |
|    gen/train/entropy_loss          | -0.597      |
|    gen/train/explained_variance    | 0.95        |
|    gen/train/learning_rate         | 0.0004      |
|    gen/train/loss                  | 0.0114      |
|    gen/train/n_updates             | 55          |
|    gen/train/policy_gradient_loss  | -0.00606    |
|    gen/train/value_loss            | 0.0522      |
----------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.583    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.165    |
|    disc/disc_entropy                | 0.671    |
|    disc/disc_loss                   | 0.659    |
|    disc/disc_proportion_expert_pred | 0.917    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.588    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.177    |
|    disc/disc_entropy                | 0.672    |
|    disc/disc_loss                   | 0.653    |
|    disc/disc_proportion_expert_pred | 0.912    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.588    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.176    |
|    disc/disc_entropy                | 0.671    |
|    disc/disc_loss                   | 0.653    |
|    disc/disc_proportion_expert_pred | 0.912    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.582    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.163    |
|    disc/disc_entropy                | 0.672    |
|    disc/disc_loss                   | 0.653    |
|    disc/disc_proportion_expert_pred | 0.918    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.596    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.192    |
|    disc/disc_entropy                | 0.673    |
|    disc/disc_loss                   | 0.65     |
|    disc/disc_proportion_expert_pred | 0.904    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.596    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.192    |
|    disc/disc_entropy                | 0.674    |
|    disc/disc_loss                   | 0.646    |
|    disc/disc_proportion_expert_pred | 0.904    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.61     |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.221    |
|    disc/disc_entropy                | 0.676    |
|    disc/disc_loss                   | 0.645    |
|    disc/disc_proportion_expert_pred | 0.89     |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/                                |          |
|    disc/disc_acc                    | 0.604    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.208    |
|    disc/disc_entropy                | 0.675    |
|    disc/disc_loss                   | 0.643    |
|    disc/disc_proportion_expert_pred | 0.896    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/                               |          |
|    disc/disc_acc                    | 0.593    |
|    disc/disc_acc_expert             | 1        |
|    disc/disc_acc_gen                | 0.187    |
|    disc/disc_entropy                | 0.673    |
|    disc/disc_loss                   | 0.65     |
|    disc/disc_proportion_expert_pred | 0.907    |
|    disc/disc_proportion_expert_true | 0.5      |
|    disc/global_step                 | 12       |
|    disc/n_expert                    | 1.02e+03 |
|    disc/n_generated                 | 1.02e+03 |
|    gen/rollout/ep_len_mean          | 500      |
|    gen/rollout/ep_rew_mean          | 56.4     |
|    gen/rollout/ep_rew_wrapped_mean  | 229      |
|    gen/time/fps                     | 4.2e+03  |
|    gen/time/iterations              | 1        |
|    gen/time/time_elapsed            | 3        |
|    gen/time/total_timesteps         | 1.97e+05 |
|    gen/train/approx_kl              | 0.0124   |
|    gen/train/clip_fraction          | 0.148    |
|    gen/train/clip_range             | 0.2      |
|    gen/train/entropy_loss           | -0.586   |
|    gen/train/explained_variance     | 0.968    |
|    gen/train/learning_rate          | 0.0004   |
|    gen/train/loss                   | 0.000133 |
|    gen/train/n_updates              | 60       |
|    gen/train/policy_gradient_loss   | -0.00875 |
|    gen/train/value_loss             | 0.0555   |
--------------------------------------------------
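
The discriminator statistics in the logs above are related by simple arithmetic. As a rough sketch, assuming `disc_acc_expert` and `disc_acc_gen` are per-class accuracies over a single batch, and that `n_expert` and `n_generated` are both 1024 (the logs only show the rounded `1.02e+03`), the overall accuracy and the proportion of samples predicted as expert can be recovered as follows. The helper name `disc_summary` is hypothetical and not part of imitation.

```python
def disc_summary(acc_expert, acc_gen, n_expert, n_generated):
    """Recover overall accuracy and predicted-expert proportion for one batch.

    Hypothetical helper for reading the logs; not an imitation API.
    """
    total = n_expert + n_generated
    correct_expert = acc_expert * n_expert    # expert samples labelled expert
    correct_gen = acc_gen * n_generated       # learner samples labelled learner
    wrong_gen = (1 - acc_gen) * n_generated   # learner samples labelled expert

    disc_acc = (correct_expert + correct_gen) / total
    proportion_expert_pred = (correct_expert + wrong_gen) / total
    return disc_acc, proportion_expert_pred


# Values taken from one raw/ block logged at global_step 8
# (batch sizes of 1024 are an assumption based on the rounded log output).
acc, prop = disc_summary(acc_expert=1.0, acc_gen=0.938, n_expert=1024, n_generated=1024)
print(round(acc, 3), round(prop, 3))  # 0.969 0.531, matching the logged values
```

A falling `disc_acc_gen` together with a rising `disc_proportion_expert_pred`, as in the later steps of the log, means the discriminator labels more of the learner's samples as expert.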

… and finally evaluate it again.

env.seed(SEED)
learner_rewards_after_training, _ = evaluate_policy(
    learner, env, 100, return_episode_rewards=True
)

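The expert achieves a return of 500 in this environment. We now compare the learner's returns before and after training. In the run shown here, the mean return after training is lower than before, so GAIL has not matched the expert within this training budget; a longer training run may be needed: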

print(
    "Rewards before training:",
    np.mean(learner_rewards_before_training),
    "+/-",
    np.std(learner_rewards_before_training),
)
print(
    "Rewards after training:",
    np.mean(learner_rewards_after_training),
    "+/-",
    np.std(learner_rewards_after_training),
)
Rewards before training: 102.6 +/- 24.11514047232568
Rewards after training: 49.76 +/- 16.98535840069323
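
To make such comparisons easier to read, the episode returns can be summarized relative to the expert's return. The following is a minimal sketch, assuming the expert return of 500 stated above; the helper `summarize_returns` and the sample values are hypothetical and only illustrate the calculation.

```python
import numpy as np


def summarize_returns(returns, expert_return=500.0):
    """Summarize episode returns (hypothetical helper, not an imitation API)."""
    returns = np.asarray(returns, dtype=float)
    mean = returns.mean()
    # Standard error of the mean, using the sample standard deviation.
    sem = returns.std(ddof=1) / np.sqrt(len(returns))
    return {
        "mean": mean,
        "sem": sem,
        "fraction_of_expert": mean / expert_return,
    }


# Illustrative values only; in the notebook one would pass
# learner_rewards_before_training or learner_rewards_after_training.
summary = summarize_returns([40.0, 50.0, 60.0])
print(summary)
```

A `fraction_of_expert` close to 1 would indicate that the learner has reached expert-level returns.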