Benchmark Summary
This is a summary of the Sacred runs in `benchmark_runs`, generated by `sacred_output_to_markdown_summary.py`.
Scores
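Scores are normalized using the performance of a random agent as the baseline (0) and the performance of the expert as the reference (1), as explained in this blog post:
`(score - random_score) / (expert_score - random_score)`
A normalized score can exceed 1 when the learner outperforms the expert, as a few entries in the tables below do.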
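As a minimal sketch of the normalization formula, the helper below maps a raw return onto the random-to-expert scale. The function name `normalize_score` and the reference returns are hypothetical and chosen only for illustration; they are not taken from the benchmark or from the summary script.

```python
def normalize_score(score: float, random_score: float, expert_score: float) -> float:
    """Map a raw return so that the random agent maps to 0.0 and the expert to 1.0."""
    if expert_score == random_score:
        raise ValueError("expert_score and random_score must differ")
    return (score - random_score) / (expert_score - random_score)


# Hypothetical reference returns, for illustration only (not benchmark values).
RANDOM_SCORE = 10.0
EXPERT_SCORE = 210.0

print(normalize_score(10.0, RANDOM_SCORE, EXPERT_SCORE))   # 0.0 (random level)
print(normalize_score(210.0, RANDOM_SCORE, EXPERT_SCORE))  # 1.0 (expert level)
print(normalize_score(220.0, RANDOM_SCORE, EXPERT_SCORE))  # 1.05 (above expert)
```

The last call shows why the scale is not capped at 1: a learner that scores above the expert reference gets a normalized score above 1.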
Aggregate scores and confidence intervals are computed using the rliable library.
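To make the aggregate metrics concrete, here is a rough standard-library sketch of an interquartile mean (IQM) and a stratified percentile bootstrap confidence interval. It is not rliable's implementation and will not reproduce the reported numbers exactly; the helper names and the example scores are hypothetical.

```python
import random
from statistics import mean


def iqm(values):
    """Interquartile mean: drop the lowest and highest 25% of values, average the rest."""
    ordered = sorted(values)
    cut = int(0.25 * len(ordered))  # number of values trimmed from each end
    return mean(ordered[cut:len(ordered) - cut])


def stratified_bootstrap_ci(scores_by_env, aggregate, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling runs with replacement within each environment."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        resampled = []
        for runs in scores_by_env.values():
            resampled.extend(rng.choices(runs, k=len(runs)))
        estimates.append(aggregate(resampled))
    estimates.sort()
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lower, upper


# Hypothetical normalized scores (4 runs per environment), for illustration only.
scores = {
    "env_a": [0.90, 0.95, 1.00, 0.20],
    "env_b": [0.85, 0.99, 0.97, 0.93],
}
all_runs = [s for runs in scores.values() for s in runs]
print("Mean:", mean(all_runs))
print("IQM:", iqm(all_runs))
print("95% CI for IQM:", stratified_bootstrap_ci(scores, iqm))
```

In this toy data the single poor run (0.20) pulls the mean down but is trimmed away by the IQM, which is the usual reason for reporting both metrics side by side.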
AIRL

Environment | Score (mean/std) | Normalized Score (mean/std) | N |
---|---|---|---|
seals/Ant-v1 | 2485.889 / 533.471 | 0.981 / 0.184 | 10 |
seals/HalfCheetah-v1 | 938.450 / 804.871 | 0.627 / 0.412 | 10 |
seals/Hopper-v1 | 183.780 / 93.295 | 0.921 / 0.373 | 10 |
seals/Swimmer-v1 | 286.699 / 7.763 | 0.970 / 0.027 | 10 |
seals/Walker2d-v1 | 1154.921 / 659.564 | 0.461 / 0.264 | 10 |
Aggregate Normalized scores

Metric | Value | 95% CI |
---|---|---|
Mean | 0.792 | [0.709, 0.792] |
IQM | 0.918 | [0.871, 0.974] |
BC

Environment | Score (mean/std) | Normalized Score (mean/std) | N |
---|---|---|---|
seals/Ant-v1 | 2090.551 / 180.340 | 0.844 / 0.062 | 10 |
seals/HalfCheetah-v1 | 1516.476 / 37.487 | 0.923 / 0.019 | 10 |
seals/Hopper-v1 | 204.271 / 0.609 | 1.003 / 0.002 | 10 |
seals/Swimmer-v1 | 276.242 / 9.328 | 0.935 / 0.032 | 10 |
seals/Walker2d-v1 | 2393.254 / 37.641 | 0.956 / 0.015 | 10 |
Aggregate Normalized scores

Metric | Value | 95% CI |
---|---|---|
Mean | 0.932 | [0.922, 0.932] |
IQM | 0.941 | [0.941, 0.949] |
DAGGER

Environment | Score (mean/std) | Normalized Score (mean/std) | N |
---|---|---|---|
seals/Ant-v1 | 2302.527 / 108.315 | 0.957 / 0.052 | 10 |
seals/HalfCheetah-v1 | 1615.004 / 8.262 | 1.017 / 0.008 | 10 |
seals/Hopper-v1 | 204.789 / 1.599 | 1.011 / 0.012 | 10 |
seals/Swimmer-v1 | 283.776 / 6.524 | 0.988 / 0.024 | 10 |
seals/Walker2d-v1 | 2419.748 / 52.215 | 1.002 / 0.026 | 10 |
Aggregate Normalized scores

Metric | Value | 95% CI |
---|---|---|
Mean | 0.995 | [0.987, 0.998] |
IQM | 1.004 | [1.003, 1.008] |
GAIL

Environment | Score (mean/std) | Normalized Score (mean/std) | N |
---|---|---|---|
seals/Ant-v1 | 2527.566 / 148.034 | 0.995 / 0.051 | 10 |
seals/HalfCheetah-v1 | 1595.129 / 37.374 | 0.963 / 0.019 | 10 |
seals/Hopper-v1 | 187.105 / 14.298 | 0.935 / 0.057 | 10 |
seals/Swimmer-v1 | 249.949 / 74.295 | 0.845 / 0.254 | 10 |
seals/Walker2d-v1 | 2399.196 / 89.949 | 0.959 / 0.036 | 10 |
Aggregate Normalized scores

Metric | Value | 95% CI |
---|---|---|
Mean | 0.939 | [0.900, 0.944] |
IQM | 0.957 | [0.965, 0.970] |
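
The Mean rows in the aggregate tables can be cross-checked from the per-environment tables. Because every environment has the same number of runs (N = 10), the mean over all runs equals the mean of the per-environment normalized means, up to rounding. The sketch below copies the normalized means from the tables above; the variable names are illustrative only. The IQM cannot be recomputed this way, since it needs the individual run scores.

```python
from statistics import mean

# Normalized per-environment means copied from the tables above,
# in the order Ant, HalfCheetah, Hopper, Swimmer, Walker2d.
normalized_means = {
    "AIRL": [0.981, 0.627, 0.921, 0.970, 0.461],
    "BC": [0.844, 0.923, 1.003, 0.935, 0.956],
    "DAGGER": [0.957, 1.017, 1.011, 0.988, 1.002],
    "GAIL": [0.995, 0.963, 0.935, 0.845, 0.959],
}

# Mean values as reported in the aggregate tables above.
reported_mean = {"AIRL": 0.792, "BC": 0.932, "DAGGER": 0.995, "GAIL": 0.939}

for algo, values in normalized_means.items():
    recomputed = mean(values)
    print(f"{algo}: recomputed {recomputed:.3f}, reported {reported_mean[algo]:.3f}")
```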