Benchmark Summary

This summary of the Sacred runs in benchmark_runs was generated by sacred_output_to_markdown_summary.py.

Scores

Scores are normalized so that a random agent scores 0 and the expert scores 1, as explained in this blog post:

(score - random_score) / (expert_score - random_score)
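As a minimal sketch of the formula above, the following function maps a raw return onto the normalized scale. The function name `normalize_score` and the baseline values are hypothetical and chosen only for illustration; they are not taken from the benchmark scripts.

```python
def normalize_score(score: float, random_score: float, expert_score: float) -> float:
    """Map a raw return to the scale where random = 0 and expert = 1."""
    span = expert_score - random_score
    if span == 0:
        raise ValueError("expert_score and random_score must differ")
    return (score - random_score) / span


# Hypothetical baseline returns, for illustration only.
random_score = 10.0
expert_score = 210.0

print(normalize_score(10.0, random_score, expert_score))   # 0.0 (random level)
print(normalize_score(210.0, random_score, expert_score))  # 1.0 (expert level)
print(normalize_score(220.0, random_score, expert_score))  # 1.05 (above expert)
```

The last call shows that a normalized score is not capped at 1: an agent that outperforms the expert gets a value above 1, which is why some entries in the tables below exceed 1.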

Aggregate scores and confidence intervals are computed using the rliable library.
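To make the two aggregate metrics concrete, here is a rough standard-library sketch of an interquartile mean (IQM) and a percentile bootstrap confidence interval. It is not rliable's implementation: rliable uses a stratified bootstrap over runs and tasks, whereas this sketch resamples a single flat list. The helper names and the sample scores are hypothetical.

```python
import random
import statistics


def interquartile_mean(scores):
    """Mean of the middle 50% of scores (25% trimmed from each end)."""
    ordered = sorted(scores)
    cut = int(0.25 * len(ordered))  # number of values dropped per tail
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def bootstrap_ci(scores, statistic, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, then take quantiles."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = sorted(
        statistic([scores[rng.randrange(n)] for _ in range(n)])
        for _ in range(reps)
    )
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[int((1 - alpha / 2) * reps) - 1]
    return lower, upper


# Hypothetical normalized scores with one poor outlier run.
scores = [0.10, 0.80, 0.85, 0.90, 0.92, 0.95, 0.97, 1.00]

print("mean:", statistics.fmean(scores))
print("IQM: ", interquartile_mean(scores))
print("95% CI for IQM:", bootstrap_ci(scores, interquartile_mean))
```

Because the IQM discards the lowest and highest quarter of the scores, a single failed run pulls it down far less than it pulls down the mean. This is consistent with the IQM being higher than the mean for each algorithm in the aggregate tables.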

AIRL

| Environment | Score (mean/std) | Normalized Score (mean/std) | N |
|---|---|---|---|
| seals/Ant-v1 | 2485.889 / 533.471 | 0.981 / 0.184 | 10 |
| seals/HalfCheetah-v1 | 938.450 / 804.871 | 0.627 / 0.412 | 10 |
| seals/Hopper-v1 | 183.780 / 93.295 | 0.921 / 0.373 | 10 |
| seals/Swimmer-v1 | 286.699 / 7.763 | 0.970 / 0.027 | 10 |
| seals/Walker2d-v1 | 1154.921 / 659.564 | 0.461 / 0.264 | 10 |

Aggregate Normalized scores

| Metric | Value | 95% CI |
|---|---|---|
| Mean | 0.792 | [0.709, 0.792] |
| IQM | 0.918 | [0.871, 0.974] |

BC

| Environment | Score (mean/std) | Normalized Score (mean/std) | N |
|---|---|---|---|
| seals/Ant-v1 | 2090.551 / 180.340 | 0.844 / 0.062 | 10 |
| seals/HalfCheetah-v1 | 1516.476 / 37.487 | 0.923 / 0.019 | 10 |
| seals/Hopper-v1 | 204.271 / 0.609 | 1.003 / 0.002 | 10 |
| seals/Swimmer-v1 | 276.242 / 9.328 | 0.935 / 0.032 | 10 |
| seals/Walker2d-v1 | 2393.254 / 37.641 | 0.956 / 0.015 | 10 |

Aggregate Normalized scores

| Metric | Value | 95% CI |
|---|---|---|
| Mean | 0.932 | [0.922, 0.932] |
| IQM | 0.941 | [0.941, 0.949] |

DAGGER

| Environment | Score (mean/std) | Normalized Score (mean/std) | N |
|---|---|---|---|
| seals/Ant-v1 | 2302.527 / 108.315 | 0.957 / 0.052 | 10 |
| seals/HalfCheetah-v1 | 1615.004 / 8.262 | 1.017 / 0.008 | 10 |
| seals/Hopper-v1 | 204.789 / 1.599 | 1.011 / 0.012 | 10 |
| seals/Swimmer-v1 | 283.776 / 6.524 | 0.988 / 0.024 | 10 |
| seals/Walker2d-v1 | 2419.748 / 52.215 | 1.002 / 0.026 | 10 |

Aggregate Normalized scores

| Metric | Value | 95% CI |
|---|---|---|
| Mean | 0.995 | [0.987, 0.998] |
| IQM | 1.004 | [1.003, 1.008] |

GAIL

| Environment | Score (mean/std) | Normalized Score (mean/std) | N |
|---|---|---|---|
| seals/Ant-v1 | 2527.566 / 148.034 | 0.995 / 0.051 | 10 |
| seals/HalfCheetah-v1 | 1595.129 / 37.374 | 0.963 / 0.019 | 10 |
| seals/Hopper-v1 | 187.105 / 14.298 | 0.935 / 0.057 | 10 |
| seals/Swimmer-v1 | 249.949 / 74.295 | 0.845 / 0.254 | 10 |
| seals/Walker2d-v1 | 2399.196 / 89.949 | 0.959 / 0.036 | 10 |

Aggregate Normalized scores

| Metric | Value | 95% CI |
|---|---|---|
| Mean | 0.939 | [0.900, 0.944] |
| IQM | 0.957 | [0.965, 0.970] |