Train an Agent using Generative Adversarial Imitation Learning
The idea of generative adversarial imitation learning (GAIL) is to train a discriminator network to distinguish between expert trajectories and learner trajectories. The learner is trained with a standard reinforcement learning algorithm such as PPO and is rewarded for producing trajectories that the discriminator classifies as expert trajectories.
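To make the two roles concrete, here is a minimal sketch in plain Python. It is not the imitation library's implementation. It assumes the discriminator outputs a logit where larger values mean "more expert-like", that it is trained with binary cross-entropy, and that the learner's reward is -log(1 - D), which is one common choice. All function names are hypothetical.

```python
import math


def sigmoid(logit: float) -> float:
    """Probability that a sample comes from the expert, given the logit."""
    return 1.0 / (1.0 + math.exp(-logit))


def discriminator_loss(logit: float, is_expert: bool) -> float:
    """Binary cross-entropy for one sample (label 1 = expert, 0 = learner)."""
    p_expert = sigmoid(logit)
    return -math.log(p_expert) if is_expert else -math.log(1.0 - p_expert)


def learner_reward(logit: float) -> float:
    """Reward -log(1 - D): grows as the discriminator leans towards 'expert'."""
    return -math.log(1.0 - sigmoid(logit))


# An undecided discriminator (logit 0) has loss ln(2) on either label.
print(round(discriminator_loss(0.0, is_expert=True), 3))
# The learner earns more reward for samples rated as expert-like.
print(learner_reward(2.0) > learner_reward(-2.0))
```

A discriminator loss close to ln 2 ≈ 0.693 therefore means the discriminator cannot yet tell the two sources apart.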
As usual, we first need an expert. Again, we download one from the HuggingFace model hub for convenience.
Note that we use a variant of the CartPole environment from the seals package, which has fixed episode durations. Read more about why we do this here.
import numpy as np
from imitation.policies.serialize import load_policy
from imitation.util.util import make_vec_env
from imitation.data.wrappers import RolloutInfoWrapper
SEED = 42
env = make_vec_env(
    "seals:seals/CartPole-v0",
    rng=np.random.default_rng(SEED),
    n_envs=8,
    post_wrappers=[
        lambda env, _: RolloutInfoWrapper(env)
    ],  # needed for computing rollouts later
)
expert = load_policy(
    "ppo-huggingface",
    organization="HumanCompatibleAI",
    env_name="seals/CartPole-v0",
    venv=env,
)
We generate some expert trajectories that the discriminator needs to distinguish from the learner’s trajectories.
from imitation.data import rollout
rollouts = rollout.rollout(
    expert,
    env,
    rollout.make_sample_until(min_timesteps=None, min_episodes=60),
    rng=np.random.default_rng(SEED),
)
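The stopping rule passed to the rollout call can be pictured with the sketch below. It is a plain-Python illustration of the idea, not the library's code. The helper `make_stop_condition` is hypothetical, and the sketch assumes that every minimum that is supplied must be met before sampling stops. It uses the 500-step episode length reported in the training log.

```python
from typing import Callable, Optional, Sequence


def make_stop_condition(
    min_timesteps: Optional[int], min_episodes: Optional[int]
) -> Callable[[Sequence[int]], bool]:
    """Return a predicate over collected episode lengths (hypothetical helper)."""
    if min_timesteps is None and min_episodes is None:
        raise ValueError("At least one minimum must be given.")

    def should_stop(episode_lengths: Sequence[int]) -> bool:
        # Assumption: every minimum that was supplied has to be satisfied.
        enough_steps = (
            min_timesteps is None or sum(episode_lengths) >= min_timesteps
        )
        enough_episodes = (
            min_episodes is None or len(episode_lengths) >= min_episodes
        )
        return enough_steps and enough_episodes

    return should_stop


should_stop = make_stop_condition(min_timesteps=None, min_episodes=60)
print(should_stop([500] * 59))  # False: only 59 complete episodes so far
print(should_stop([500] * 60))  # True: 60 episodes, i.e. 30,000 transitions
```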
Now we are ready to set up our GAIL trainer.
Note that the reward_net is actually the network of the discriminator.
We evaluate the learner before and after training so we can see if it made any progress.
First we construct a GAIL trainer …
from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm
from stable_baselines3 import PPO
from stable_baselines3.ppo import MlpPolicy
from stable_baselines3.common.evaluation import evaluate_policy
learner = PPO(
    env=env,
    policy=MlpPolicy,
    batch_size=64,
    ent_coef=0.0,
    learning_rate=0.0004,
    gamma=0.95,
    n_epochs=5,
    seed=SEED,
)
reward_net = BasicRewardNet(
    observation_space=env.observation_space,
    action_space=env.action_space,
    normalize_input_layer=RunningNorm,
)
gail_trainer = GAIL(
    demonstrations=rollouts,
    demo_batch_size=1024,
    gen_replay_buffer_capacity=512,
    n_disc_updates_per_round=8,
    venv=env,
    gen_algo=learner,
    reward_net=reward_net,
)
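It can help to work out how these settings translate into training rounds. The arithmetic below is a sketch that relies on assumptions this tutorial does not state. It assumes that PPO keeps its default of 2048 steps per environment per rollout, that one GAIL round lets the generator collect one such rollout across all environments, and that the number of rounds is the integer quotient of the total timesteps by the timesteps per round. The variable names are illustrative.

```python
N_ENVS = 8                    # n_envs passed to make_vec_env
PPO_N_STEPS = 2048            # assumed: PPO default, not set in this tutorial
TOTAL_TIMESTEPS = 200_000     # argument given to gail_trainer.train
N_DISC_UPDATES_PER_ROUND = 8  # from the GAIL constructor

timesteps_per_round = N_ENVS * PPO_N_STEPS
n_rounds = TOTAL_TIMESTEPS // timesteps_per_round
n_disc_updates = n_rounds * N_DISC_UPDATES_PER_ROUND

print(timesteps_per_round)  # 16384
print(n_rounds)             # 12
print(n_disc_updates)       # 96
```

The figure of 16384 agrees with the step by which gen/time/total_timesteps grows in the training log further down.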
… then we evaluate it before training …
env.seed(SEED)
learner_rewards_before_training, _ = evaluate_policy(
    learner, env, 100, return_episode_rewards=True
)
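With return_episode_rewards=True, evaluate_policy returns the per-episode rewards and episode lengths as lists rather than a single mean and standard deviation. A small helper such as the hypothetical `summarize` below can condense such a list so that the values before and after training are easy to compare. The reward values in the usage lines are made-up placeholders, not results from this tutorial.

```python
import statistics
from typing import Sequence, Tuple


def summarize(episode_rewards: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and population standard deviation of episode rewards."""
    mean = statistics.fmean(episode_rewards)
    std = statistics.pstdev(episode_rewards)
    return mean, std


# Placeholder values for illustration only.
mean_reward, std_reward = summarize([10.0, 20.0, 30.0])
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.2f}")
```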
… and train it …
gail_trainer.train(200_000)
------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 29.8 |
| gen/time/fps | 4208 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 16384 |
------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.696 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.694 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.693 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.69 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.688 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.686 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.684 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.683 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.689 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 1 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 29.8 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.64e+04 |
| gen/train/approx_kl | 0.00905 |
| gen/train/clip_fraction | 0.0295 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.686 |
| gen/train/explained_variance | 0.0301 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.127 |
| gen/train/n_updates | 5 |
| gen/train/policy_gradient_loss | -0.0015 |
| gen/train/value_loss | 4.43 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 31.9 |
| gen/rollout/ep_rew_wrapped_mean | 268 |
| gen/time/fps | 4212 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 32768 |
| gen/train/approx_kl | 0.009048736 |
| gen/train/clip_fraction | 0.0295 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.686 |
| gen/train/explained_variance | 0.0301 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.127 |
| gen/train/n_updates | 5 |
| gen/train/policy_gradient_loss | -0.0015 |
| gen/train/value_loss | 4.43 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.685 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.684 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.683 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.682 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.68 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.68 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.679 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.678 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.691 |
| disc/disc_loss | 0.681 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 2 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 31.9 |
| gen/rollout/ep_rew_wrapped_mean | 268 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 3.28e+04 |
| gen/train/approx_kl | 0.0102 |
| gen/train/clip_fraction | 0.133 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.686 |
| gen/train/explained_variance | 0.841 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0145 |
| gen/train/n_updates | 10 |
| gen/train/policy_gradient_loss | -0.00786 |
| gen/train/value_loss | 0.248 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 34.1 |
| gen/rollout/ep_rew_wrapped_mean | 275 |
| gen/time/fps | 4213 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 49152 |
| gen/train/approx_kl | 0.010180451 |
| gen/train/clip_fraction | 0.133 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.686 |
| gen/train/explained_variance | 0.841 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0145 |
| gen/train/n_updates | 10 |
| gen/train/policy_gradient_loss | -0.00786 |
| gen/train/value_loss | 0.248 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.672 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.671 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.67 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.668 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.667 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.668 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.689 |
| disc/disc_loss | 0.664 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.689 |
| disc/disc_loss | 0.661 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.69 |
| disc/disc_loss | 0.668 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 3 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 34.1 |
| gen/rollout/ep_rew_wrapped_mean | 275 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 4.92e+04 |
| gen/train/approx_kl | 0.0153 |
| gen/train/clip_fraction | 0.195 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.673 |
| gen/train/explained_variance | 0.815 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.0246 |
| gen/train/n_updates | 15 |
| gen/train/policy_gradient_loss | -0.0135 |
| gen/train/value_loss | 0.0463 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 37.8 |
| gen/rollout/ep_rew_wrapped_mean | 277 |
| gen/time/fps | 4207 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 65536 |
| gen/train/approx_kl | 0.015265099 |
| gen/train/clip_fraction | 0.195 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.673 |
| gen/train/explained_variance | 0.815 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.0246 |
| gen/train/n_updates | 15 |
| gen/train/policy_gradient_loss | -0.0135 |
| gen/train/value_loss | 0.0463 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.652 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.646 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.646 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.685 |
| disc/disc_loss | 0.64 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.685 |
| disc/disc_loss | 0.638 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.684 |
| disc/disc_loss | 0.634 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.682 |
| disc/disc_loss | 0.628 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.682 |
| disc/disc_loss | 0.625 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.685 |
| disc/disc_loss | 0.639 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 4 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 37.8 |
| gen/rollout/ep_rew_wrapped_mean | 277 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 6.55e+04 |
| gen/train/approx_kl | 0.0161 |
| gen/train/clip_fraction | 0.215 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.654 |
| gen/train/explained_variance | 0.892 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.0168 |
| gen/train/n_updates | 20 |
| gen/train/policy_gradient_loss | -0.0195 |
| gen/train/value_loss | 0.0173 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 40.4 |
| gen/rollout/ep_rew_wrapped_mean | 284 |
| gen/time/fps | 4206 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 81920 |
| gen/train/approx_kl | 0.016116062 |
| gen/train/clip_fraction | 0.215 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.654 |
| gen/train/explained_variance | 0.892 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.0168 |
| gen/train/n_updates | 20 |
| gen/train/policy_gradient_loss | -0.0195 |
| gen/train/value_loss | 0.0173 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.689 |
| disc/disc_loss | 0.659 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.689 |
| disc/disc_loss | 0.659 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.688 |
| disc/disc_loss | 0.655 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.651 |
| disc/disc_proportion_expert_pred | 0 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.5 |
| disc/disc_acc_expert | 0.000977 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.688 |
| disc/disc_loss | 0.652 |
| disc/disc_proportion_expert_pred | 0.000488 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.573 |
| disc/disc_acc_expert | 0.146 |
| disc/disc_acc_gen | 1 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.647 |
| disc/disc_proportion_expert_pred | 0.0728 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.684 |
| disc/disc_acc_expert | 0.374 |
| disc/disc_acc_gen | 0.993 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.647 |
| disc/disc_proportion_expert_pred | 0.19 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.708 |
| disc/disc_acc_expert | 0.434 |
| disc/disc_acc_gen | 0.983 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.642 |
| disc/disc_proportion_expert_pred | 0.225 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.558 |
| disc/disc_acc_expert | 0.119 |
| disc/disc_acc_gen | 0.997 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.652 |
| disc/disc_proportion_expert_pred | 0.0611 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 5 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 40.4 |
| gen/rollout/ep_rew_wrapped_mean | 284 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 8.19e+04 |
| gen/train/approx_kl | 0.0112 |
| gen/train/clip_fraction | 0.129 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.634 |
| gen/train/explained_variance | 0.871 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.00102 |
| gen/train/n_updates | 25 |
| gen/train/policy_gradient_loss | -0.00957 |
| gen/train/value_loss | 0.0103 |
--------------------------------------------------
---------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 40.8 |
| gen/rollout/ep_rew_wrapped_mean | 288 |
| gen/time/fps | 4212 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 98304 |
| gen/train/approx_kl | 0.01118237 |
| gen/train/clip_fraction | 0.129 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.634 |
| gen/train/explained_variance | 0.871 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.00102 |
| gen/train/n_updates | 25 |
| gen/train/policy_gradient_loss | -0.00957 |
| gen/train/value_loss | 0.0103 |
---------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.732 |
| disc/disc_acc_expert | 0.468 |
| disc/disc_acc_gen | 0.997 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.639 |
| disc/disc_proportion_expert_pred | 0.235 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.718 |
| disc/disc_acc_expert | 0.442 |
| disc/disc_acc_gen | 0.994 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.637 |
| disc/disc_proportion_expert_pred | 0.224 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.736 |
| disc/disc_acc_expert | 0.476 |
| disc/disc_acc_gen | 0.996 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.638 |
| disc/disc_proportion_expert_pred | 0.24 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.734 |
| disc/disc_acc_expert | 0.472 |
| disc/disc_acc_gen | 0.996 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.635 |
| disc/disc_proportion_expert_pred | 0.238 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.714 |
| disc/disc_acc_expert | 0.44 |
| disc/disc_acc_gen | 0.987 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.633 |
| disc/disc_proportion_expert_pred | 0.227 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.746 |
| disc/disc_acc_expert | 0.504 |
| disc/disc_acc_gen | 0.988 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.632 |
| disc/disc_proportion_expert_pred | 0.258 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.819 |
| disc/disc_acc_expert | 0.657 |
| disc/disc_acc_gen | 0.981 |
| disc/disc_entropy | 0.686 |
| disc/disc_loss | 0.631 |
| disc/disc_proportion_expert_pred | 0.338 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.856 |
| disc/disc_acc_expert | 0.733 |
| disc/disc_acc_gen | 0.979 |
| disc/disc_entropy | 0.685 |
| disc/disc_loss | 0.627 |
| disc/disc_proportion_expert_pred | 0.377 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.757 |
| disc/disc_acc_expert | 0.524 |
| disc/disc_acc_gen | 0.99 |
| disc/disc_entropy | 0.687 |
| disc/disc_loss | 0.634 |
| disc/disc_proportion_expert_pred | 0.267 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 6 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 40.8 |
| gen/rollout/ep_rew_wrapped_mean | 288 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 9.83e+04 |
| gen/train/approx_kl | 0.00629 |
| gen/train/clip_fraction | 0.0466 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.635 |
| gen/train/explained_variance | 0.873 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0116 |
| gen/train/n_updates | 30 |
| gen/train/policy_gradient_loss | -0.00363 |
| gen/train/value_loss | 0.0126 |
--------------------------------------------------
-----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 39.6 |
| gen/rollout/ep_rew_wrapped_mean | 287 |
| gen/time/fps | 4207 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 114688 |
| gen/train/approx_kl | 0.0062911767 |
| gen/train/clip_fraction | 0.0466 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.635 |
| gen/train/explained_variance | 0.873 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0116 |
| gen/train/n_updates | 30 |
| gen/train/policy_gradient_loss | -0.00363 |
| gen/train/value_loss | 0.0126 |
-----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.852 |
| disc/disc_acc_expert | 0.735 |
| disc/disc_acc_gen | 0.969 |
| disc/disc_entropy | 0.683 |
| disc/disc_loss | 0.62 |
| disc/disc_proportion_expert_pred | 0.383 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.89 |
| disc/disc_acc_expert | 0.812 |
| disc/disc_acc_gen | 0.968 |
| disc/disc_entropy | 0.682 |
| disc/disc_loss | 0.616 |
| disc/disc_proportion_expert_pred | 0.422 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.919 |
| disc/disc_acc_expert | 0.875 |
| disc/disc_acc_gen | 0.964 |
| disc/disc_entropy | 0.681 |
| disc/disc_loss | 0.614 |
| disc/disc_proportion_expert_pred | 0.456 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.95 |
| disc/disc_acc_expert | 0.933 |
| disc/disc_acc_gen | 0.967 |
| disc/disc_entropy | 0.681 |
| disc/disc_loss | 0.61 |
| disc/disc_proportion_expert_pred | 0.483 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.963 |
| disc/disc_acc_expert | 0.955 |
| disc/disc_acc_gen | 0.971 |
| disc/disc_entropy | 0.681 |
| disc/disc_loss | 0.608 |
| disc/disc_proportion_expert_pred | 0.492 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.962 |
| disc/disc_acc_expert | 0.962 |
| disc/disc_acc_gen | 0.963 |
| disc/disc_entropy | 0.679 |
| disc/disc_loss | 0.603 |
| disc/disc_proportion_expert_pred | 0.5 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.967 |
| disc/disc_acc_expert | 0.97 |
| disc/disc_acc_gen | 0.964 |
| disc/disc_entropy | 0.679 |
| disc/disc_loss | 0.602 |
| disc/disc_proportion_expert_pred | 0.503 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.958 |
| disc/disc_acc_expert | 0.977 |
| disc/disc_acc_gen | 0.94 |
| disc/disc_entropy | 0.678 |
| disc/disc_loss | 0.597 |
| disc/disc_proportion_expert_pred | 0.518 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.933 |
| disc/disc_acc_expert | 0.902 |
| disc/disc_acc_gen | 0.963 |
| disc/disc_entropy | 0.681 |
| disc/disc_loss | 0.609 |
| disc/disc_proportion_expert_pred | 0.47 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 7 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 39.6 |
| gen/rollout/ep_rew_wrapped_mean | 287 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.15e+05 |
| gen/train/approx_kl | 0.0087 |
| gen/train/clip_fraction | 0.0778 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.629 |
| gen/train/explained_variance | 0.928 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0141 |
| gen/train/n_updates | 35 |
| gen/train/policy_gradient_loss | -0.00673 |
| gen/train/value_loss | 0.0171 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 39 |
| gen/rollout/ep_rew_wrapped_mean | 282 |
| gen/time/fps | 4209 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 131072 |
| gen/train/approx_kl | 0.008696594 |
| gen/train/clip_fraction | 0.0778 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.629 |
| gen/train/explained_variance | 0.928 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0141 |
| gen/train/n_updates | 35 |
| gen/train/policy_gradient_loss | -0.00673 |
| gen/train/value_loss | 0.0171 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.964 |
| disc/disc_acc_expert | 0.981 |
| disc/disc_acc_gen | 0.946 |
| disc/disc_entropy | 0.671 |
| disc/disc_loss | 0.572 |
| disc/disc_proportion_expert_pred | 0.518 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.977 |
| disc/disc_acc_expert | 0.992 |
| disc/disc_acc_gen | 0.962 |
| disc/disc_entropy | 0.669 |
| disc/disc_loss | 0.563 |
| disc/disc_proportion_expert_pred | 0.515 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.969 |
| disc/disc_acc_expert | 0.994 |
| disc/disc_acc_gen | 0.943 |
| disc/disc_entropy | 0.669 |
| disc/disc_loss | 0.562 |
| disc/disc_proportion_expert_pred | 0.525 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.97 |
| disc/disc_acc_expert | 0.998 |
| disc/disc_acc_gen | 0.941 |
| disc/disc_entropy | 0.667 |
| disc/disc_loss | 0.557 |
| disc/disc_proportion_expert_pred | 0.528 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.97 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.939 |
| disc/disc_entropy | 0.665 |
| disc/disc_loss | 0.553 |
| disc/disc_proportion_expert_pred | 0.53 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.972 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.943 |
| disc/disc_entropy | 0.666 |
| disc/disc_loss | 0.552 |
| disc/disc_proportion_expert_pred | 0.528 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.969 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.938 |
| disc/disc_entropy | 0.663 |
| disc/disc_loss | 0.543 |
| disc/disc_proportion_expert_pred | 0.531 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.976 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.952 |
| disc/disc_entropy | 0.661 |
| disc/disc_loss | 0.539 |
| disc/disc_proportion_expert_pred | 0.524 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.971 |
| disc/disc_acc_expert | 0.996 |
| disc/disc_acc_gen | 0.946 |
| disc/disc_entropy | 0.667 |
| disc/disc_loss | 0.555 |
| disc/disc_proportion_expert_pred | 0.525 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 8 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 39 |
| gen/rollout/ep_rew_wrapped_mean | 282 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.31e+05 |
| gen/train/approx_kl | 0.00855 |
| gen/train/clip_fraction | 0.0715 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.624 |
| gen/train/explained_variance | 0.922 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.00161 |
| gen/train/n_updates | 40 |
| gen/train/policy_gradient_loss | -0.00499 |
| gen/train/value_loss | 0.0237 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 39.6 |
| gen/rollout/ep_rew_wrapped_mean | 271 |
| gen/time/fps | 4214 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 147456 |
| gen/train/approx_kl | 0.008551636 |
| gen/train/clip_fraction | 0.0715 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.624 |
| gen/train/explained_variance | 0.922 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.00161 |
| gen/train/n_updates | 40 |
| gen/train/policy_gradient_loss | -0.00499 |
| gen/train/value_loss | 0.0237 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.96 |
| disc/disc_acc_expert | 0.998 |
| disc/disc_acc_gen | 0.922 |
| disc/disc_entropy | 0.674 |
| disc/disc_loss | 0.571 |
| disc/disc_proportion_expert_pred | 0.538 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.956 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.912 |
| disc/disc_entropy | 0.672 |
| disc/disc_loss | 0.567 |
| disc/disc_proportion_expert_pred | 0.544 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.962 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.924 |
| disc/disc_entropy | 0.67 |
| disc/disc_loss | 0.56 |
| disc/disc_proportion_expert_pred | 0.538 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.966 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.932 |
| disc/disc_entropy | 0.669 |
| disc/disc_loss | 0.558 |
| disc/disc_proportion_expert_pred | 0.534 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.954 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.907 |
| disc/disc_entropy | 0.667 |
| disc/disc_loss | 0.553 |
| disc/disc_proportion_expert_pred | 0.546 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.951 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.902 |
| disc/disc_entropy | 0.667 |
| disc/disc_loss | 0.551 |
| disc/disc_proportion_expert_pred | 0.549 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.96 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.921 |
| disc/disc_entropy | 0.662 |
| disc/disc_loss | 0.539 |
| disc/disc_proportion_expert_pred | 0.54 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.958 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.915 |
| disc/disc_entropy | 0.661 |
| disc/disc_loss | 0.535 |
| disc/disc_proportion_expert_pred | 0.542 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.958 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.917 |
| disc/disc_entropy | 0.668 |
| disc/disc_loss | 0.554 |
| disc/disc_proportion_expert_pred | 0.541 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 9 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 39.6 |
| gen/rollout/ep_rew_wrapped_mean | 271 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.47e+05 |
| gen/train/approx_kl | 0.00591 |
| gen/train/clip_fraction | 0.0515 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.613 |
| gen/train/explained_variance | 0.935 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.00763 |
| gen/train/n_updates | 45 |
| gen/train/policy_gradient_loss | -0.00313 |
| gen/train/value_loss | 0.0288 |
--------------------------------------------------
-----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 44.7 |
| gen/rollout/ep_rew_wrapped_mean | 259 |
| gen/time/fps | 4213 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 163840 |
| gen/train/approx_kl | 0.0059148837 |
| gen/train/clip_fraction | 0.0515 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.613 |
| gen/train/explained_variance | 0.935 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.00763 |
| gen/train/n_updates | 45 |
| gen/train/policy_gradient_loss | -0.00313 |
| gen/train/value_loss | 0.0288 |
-----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.951 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.901 |
| disc/disc_entropy | 0.643 |
| disc/disc_loss | 0.495 |
| disc/disc_proportion_expert_pred | 0.549 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.948 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.896 |
| disc/disc_entropy | 0.638 |
| disc/disc_loss | 0.485 |
| disc/disc_proportion_expert_pred | 0.552 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.955 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.909 |
| disc/disc_entropy | 0.636 |
| disc/disc_loss | 0.481 |
| disc/disc_proportion_expert_pred | 0.545 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.95 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.899 |
| disc/disc_entropy | 0.633 |
| disc/disc_loss | 0.477 |
| disc/disc_proportion_expert_pred | 0.55 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.945 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.89 |
| disc/disc_entropy | 0.633 |
| disc/disc_loss | 0.475 |
| disc/disc_proportion_expert_pred | 0.555 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.942 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.885 |
| disc/disc_entropy | 0.628 |
| disc/disc_loss | 0.468 |
| disc/disc_proportion_expert_pred | 0.558 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.948 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.896 |
| disc/disc_entropy | 0.624 |
| disc/disc_loss | 0.461 |
| disc/disc_proportion_expert_pred | 0.552 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.95 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.899 |
| disc/disc_entropy | 0.615 |
| disc/disc_loss | 0.446 |
| disc/disc_proportion_expert_pred | 0.55 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.948 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.897 |
| disc/disc_entropy | 0.631 |
| disc/disc_loss | 0.474 |
| disc/disc_proportion_expert_pred | 0.552 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 10 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 44.7 |
| gen/rollout/ep_rew_wrapped_mean | 259 |
| gen/time/fps | 4.21e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.64e+05 |
| gen/train/approx_kl | 0.00881 |
| gen/train/clip_fraction | 0.0822 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.596 |
| gen/train/explained_variance | 0.942 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.0335 |
| gen/train/n_updates | 50 |
| gen/train/policy_gradient_loss | -0.00478 |
| gen/train/value_loss | 0.0465 |
--------------------------------------------------
-----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 50.9 |
| gen/rollout/ep_rew_wrapped_mean | 243 |
| gen/time/fps | 4194 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 180224 |
| gen/train/approx_kl | 0.0088136345 |
| gen/train/clip_fraction | 0.0822 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.596 |
| gen/train/explained_variance | 0.942 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | -0.0335 |
| gen/train/n_updates | 50 |
| gen/train/policy_gradient_loss | -0.00478 |
| gen/train/value_loss | 0.0465 |
-----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.788 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.575 |
| disc/disc_entropy | 0.642 |
| disc/disc_loss | 0.543 |
| disc/disc_proportion_expert_pred | 0.712 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.791 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.581 |
| disc/disc_entropy | 0.637 |
| disc/disc_loss | 0.536 |
| disc/disc_proportion_expert_pred | 0.709 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.794 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.588 |
| disc/disc_entropy | 0.632 |
| disc/disc_loss | 0.526 |
| disc/disc_proportion_expert_pred | 0.706 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.781 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.562 |
| disc/disc_entropy | 0.632 |
| disc/disc_loss | 0.533 |
| disc/disc_proportion_expert_pred | 0.719 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.788 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.575 |
| disc/disc_entropy | 0.628 |
| disc/disc_loss | 0.524 |
| disc/disc_proportion_expert_pred | 0.712 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.783 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.566 |
| disc/disc_entropy | 0.624 |
| disc/disc_loss | 0.524 |
| disc/disc_proportion_expert_pred | 0.717 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.787 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.573 |
| disc/disc_entropy | 0.622 |
| disc/disc_loss | 0.519 |
| disc/disc_proportion_expert_pred | 0.713 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.775 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.551 |
| disc/disc_entropy | 0.621 |
| disc/disc_loss | 0.524 |
| disc/disc_proportion_expert_pred | 0.725 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.786 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.572 |
| disc/disc_entropy | 0.63 |
| disc/disc_loss | 0.529 |
| disc/disc_proportion_expert_pred | 0.714 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 11 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 50.9 |
| gen/rollout/ep_rew_wrapped_mean | 243 |
| gen/time/fps | 4.19e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.8e+05 |
| gen/train/approx_kl | 0.00988 |
| gen/train/clip_fraction | 0.117 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.597 |
| gen/train/explained_variance | 0.95 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0114 |
| gen/train/n_updates | 55 |
| gen/train/policy_gradient_loss | -0.00606 |
| gen/train/value_loss | 0.0522 |
--------------------------------------------------
----------------------------------------------------
| raw/ | |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 56.4 |
| gen/rollout/ep_rew_wrapped_mean | 229 |
| gen/time/fps | 4199 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 196608 |
| gen/train/approx_kl | 0.009878516 |
| gen/train/clip_fraction | 0.117 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.597 |
| gen/train/explained_variance | 0.95 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.0114 |
| gen/train/n_updates | 55 |
| gen/train/policy_gradient_loss | -0.00606 |
| gen/train/value_loss | 0.0522 |
----------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.583 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.165 |
| disc/disc_entropy | 0.671 |
| disc/disc_loss | 0.659 |
| disc/disc_proportion_expert_pred | 0.917 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.588 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.177 |
| disc/disc_entropy | 0.672 |
| disc/disc_loss | 0.653 |
| disc/disc_proportion_expert_pred | 0.912 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.588 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.176 |
| disc/disc_entropy | 0.671 |
| disc/disc_loss | 0.653 |
| disc/disc_proportion_expert_pred | 0.912 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.582 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.163 |
| disc/disc_entropy | 0.672 |
| disc/disc_loss | 0.653 |
| disc/disc_proportion_expert_pred | 0.918 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.596 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.192 |
| disc/disc_entropy | 0.673 |
| disc/disc_loss | 0.65 |
| disc/disc_proportion_expert_pred | 0.904 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.596 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.192 |
| disc/disc_entropy | 0.674 |
| disc/disc_loss | 0.646 |
| disc/disc_proportion_expert_pred | 0.904 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.61 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.221 |
| disc/disc_entropy | 0.676 |
| disc/disc_loss | 0.645 |
| disc/disc_proportion_expert_pred | 0.89 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| raw/ | |
| disc/disc_acc | 0.604 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.208 |
| disc/disc_entropy | 0.675 |
| disc/disc_loss | 0.643 |
| disc/disc_proportion_expert_pred | 0.896 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
--------------------------------------------------
--------------------------------------------------
| mean/ | |
| disc/disc_acc | 0.593 |
| disc/disc_acc_expert | 1 |
| disc/disc_acc_gen | 0.187 |
| disc/disc_entropy | 0.673 |
| disc/disc_loss | 0.65 |
| disc/disc_proportion_expert_pred | 0.907 |
| disc/disc_proportion_expert_true | 0.5 |
| disc/global_step | 12 |
| disc/n_expert | 1.02e+03 |
| disc/n_generated | 1.02e+03 |
| gen/rollout/ep_len_mean | 500 |
| gen/rollout/ep_rew_mean | 56.4 |
| gen/rollout/ep_rew_wrapped_mean | 229 |
| gen/time/fps | 4.2e+03 |
| gen/time/iterations | 1 |
| gen/time/time_elapsed | 3 |
| gen/time/total_timesteps | 1.97e+05 |
| gen/train/approx_kl | 0.0124 |
| gen/train/clip_fraction | 0.148 |
| gen/train/clip_range | 0.2 |
| gen/train/entropy_loss | -0.586 |
| gen/train/explained_variance | 0.968 |
| gen/train/learning_rate | 0.0004 |
| gen/train/loss | 0.000133 |
| gen/train/n_updates | 60 |
| gen/train/policy_gradient_loss | -0.00875 |
| gen/train/value_loss | 0.0555 |
--------------------------------------------------
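As a rough guide to reading the discriminator tables above: every batch holds the same number of expert and generated samples (`disc/n_expert` equals `disc/n_generated`), so the overall accuracy should be the average of the two per-class accuracies, and the predicted expert proportion follows from them as well. The sketch below checks this against the `mean/` tables for global steps 6 and 12. The function names are hypothetical, and the reading of the metric names is an assumption checked only against the logged numbers, not taken from the library documentation.

```python
def balanced_accuracy(acc_expert: float, acc_gen: float) -> float:
    """Overall accuracy when expert and generated batches are the same size."""
    return (acc_expert + acc_gen) / 2


def predicted_expert_fraction(acc_expert: float, acc_gen: float) -> float:
    """Fraction of all samples that the discriminator labels as expert.

    This counts correctly classified expert samples plus misclassified
    generated samples, again assuming equal batch sizes.
    """
    return (acc_expert + (1 - acc_gen)) / 2


# Values copied from the mean/ tables above:
# step -> (disc_acc_expert, disc_acc_gen, disc_acc, disc_proportion_expert_pred)
logged = {
    6: (0.524, 0.99, 0.757, 0.267),
    12: (1.0, 0.187, 0.593, 0.907),
}

for step, (acc_expert, acc_gen, acc, pred) in logged.items():
    acc_check = balanced_accuracy(acc_expert, acc_gen)
    pred_check = predicted_expert_fraction(acc_expert, acc_gen)
    # The logged values are rounded, so allow a small tolerance.
    print(step, abs(acc_check - acc) < 2e-3, abs(pred_check - pred) < 2e-3)
```

At step 12 the discriminator labels roughly 91% of all samples as expert, which means most learner samples are being misclassified.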
… and finally evaluate it again.
env.seed(SEED)
# Evaluate over 100 episodes and keep the per-episode returns.
learner_rewards_after_training, _ = evaluate_policy(
    learner, env, 100, return_episode_rewards=True
)
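We can now compare the learner's returns before and after training. With enough training, GAIL should approach the expert return of 500; the short run logged above (about 2e5 generator timesteps) does not get there, and the output below shows the learner scoring lower after training than before: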
print(
    "Rewards before training:",
    np.mean(learner_rewards_before_training),
    "+/-",
    np.std(learner_rewards_before_training),
)
print(
    "Rewards after training:",
    np.mean(learner_rewards_after_training),
    "+/-",
    np.std(learner_rewards_after_training),
)
Rewards before training: 102.6 +/- 24.11514047232568
Rewards after training: 49.76 +/- 16.98535840069323
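To put these numbers side by side with the expert, a small helper can report the mean, the standard deviation, and the fraction of the expert return reached. This is a minimal sketch: `summarize` is a hypothetical helper, the reward list is a placeholder for illustration only, and the expert return of 500 is the figure quoted in the text above.

```python
from statistics import mean, pstdev

EXPERT_RETURN = 500.0  # expert return quoted in the text


def summarize(rewards, expert_return=EXPERT_RETURN):
    """Return (mean, std, fraction of expert return) for episode rewards.

    pstdev is the population standard deviation, which matches the
    default behaviour of np.std used in the tutorial.
    """
    avg = mean(rewards)
    return avg, pstdev(rewards), avg / expert_return


# Placeholder episode returns, for illustration only. In the tutorial you would
# pass learner_rewards_before_training or learner_rewards_after_training.
example_rewards = [40.0, 50.0, 60.0]
avg, std, frac = summarize(example_rewards)
print(f"{avg:.1f} +/- {std:.1f} ({frac:.0%} of expert return)")
```

Applied to the means printed above, 49.76 / 500 is about 10% of the expert return after training, compared with about 21% (102.6 / 500) before training.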