Release Notes
v0.4.0
Released on 2023-07-17 - GitHub - PyPI
What's Changed
- Continuous integration: added support for macOS; removed the MuJoCo dependency.
- Preference comparisons: improved logging; added support for active learning based on the variance of a reward ensemble (see the query-selection sketch below).
- HuggingFace integration for model and dataset loading (see the loading sketch below).
- Benchmarking: added results and example configs.
- Documentation: added notebook tutorials and made other general improvements.
- General: migrated to pathlib; added more type hints, enabling checking with mypy as well as pytype.
Full Changelog: v0.3.1...v0.4.0
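
The active-learning support above picks which segment pairs to query by how much the reward ensemble disagrees about them. A conceptual sketch in plain NumPy (the function name and array layout are illustrative, not the library's API):

```python
import numpy as np

def select_queries(ensemble_returns: np.ndarray, k: int) -> np.ndarray:
    """Pick the k candidate pairs with the highest ensemble disagreement.

    ensemble_returns: shape (n_members, n_pairs, 2), each ensemble member's
    estimated return for the two segments in every candidate pair.
    """
    # Each member's preference margin between the two segments of a pair.
    margins = ensemble_returns[..., 0] - ensemble_returns[..., 1]
    disagreement = margins.var(axis=0)  # variance across ensemble members
    return np.argsort(disagreement)[::-1][:k]  # most uncertain pairs first
```

Pairs with high variance are the ones where an additional human label is most informative.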
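
For the HuggingFace integration, one way to fetch a pretrained Stable Baselines3 policy is via the huggingface_sb3 helper package; the repo and file names below are examples, not guaranteed artifacts:

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

# Download a zipped SB3 model from the HuggingFace Hub
# (repo_id and filename here are illustrative).
checkpoint_path = load_from_hub(
    repo_id="HumanCompatibleAI/ppo-seals-CartPole-v0",
    filename="ppo-seals-CartPole-v0.zip",
)
expert = PPO.load(checkpoint_path)
```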
v0.3.1
Released on 2022-07-29 - GitHub - PyPI
What's Changed
Main changes:
- Added reward ensembles and conservative reward functions by @levmckinney in #460 (see the sketch after this list)
- Dropped support for Python 3.7 by @levmckinney in #505
Minor changes:
- Docstring and other fixes after #472 by @Rocamonde in #497
- Improve Windows CI by @AdamGleave in #495
Full Changelog: v0.3.0...v0.3.1
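
The conservative reward functions above combine an ensemble's predictions pessimistically, so that disagreement between members lowers the reward. A minimal PyTorch sketch of the idea (class and parameter names are hypothetical, not the library's API):

```python
import torch
import torch.nn as nn

class ConservativeRewardEnsemble(nn.Module):
    """Toy reward ensemble; names are hypothetical, not imitation's API."""

    def __init__(self, obs_dim: int, act_dim: int, n_members: int = 5):
        super().__init__()
        # Several independently initialized reward networks.
        self.members = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )
            for _ in range(n_members)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x).squeeze(-1) for m in self.members])
        # Conservative reward: ensemble mean minus a multiple of the
        # ensemble standard deviation, so disagreement lowers the reward.
        return preds.mean(dim=0) - alpha * preds.std(dim=0)
```

With alpha = 0 this reduces to the plain ensemble mean; larger alpha makes the learned reward more pessimistic where the members disagree.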
v0.3.0: Major improvements
Released on 2022-07-26 - GitHub - PyPI
New features:
- New algorithm: Deep RL from Human Preferences (thanks to @ejnnr, @norabelrose, et al.)
- Notebooks with examples (thanks to @ernestum)
- Serialized trajectories using NumPy arrays rather than pickles, ensuring stability across versions and saving space on disk (thanks to @norabelrose; see the sketch below)
- Weights & Biases logging support (thanks to @yawen-d)
Improvements:
- Ported MCE IRL from JAX to PyTorch, eliminating the JAX dependency (thanks to @qxcv).
- Refactored RewardNet code to be independent of AIRL and shared across algorithms (thanks to @ejnnr).
- Added Windows support, including continuous integration (thanks to @taufeeque9).
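
A minimal sketch of round-tripping demonstrations in the NumPy-backed format. It assumes save()/load() helpers in imitation.data.serialize, which is an assumption about the exact entry point; the path and toy data are illustrative:

```python
import numpy as np
from imitation.data import serialize  # assumed entry point for save/load
from imitation.data.types import Trajectory

# A toy trajectory: obs has one more entry than acts (final observation).
traj = Trajectory(
    obs=np.zeros((3, 4), dtype=np.float32),
    acts=np.zeros((2,), dtype=np.int64),
    infos=None,
    terminal=True,
)
serialize.save("demos/expert_trajs", [traj])     # NumPy-backed, not pickled
restored = serialize.load("demos/expert_trajs")  # stable across versions
```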
v0.2.0: First PyTorch release
v0.1.1: Final TF1 release
v0.1.0: Initial release
Released on 2020-05-09 - GitHub - PyPI
Prototype versions of AIRL, GAIL, BC, and DAgger.