Release Notes
v0.4.0
Released on 2023-07-17 - GitHub - PyPI
What's Changed
- Continuous integration: added support for macOS; removed the MuJoCo dependency.
- Preference comparisons: improved logging; added support for active learning based on the variance of a reward ensemble (see the query-selection sketch below).
- HuggingFace integration for model and dataset loading (see the loading sketch below).
- Benchmarking: added results and example configs.
- Documentation: added notebook tutorials and made other general improvements.
- General: migrated to pathlib; added more type hints, enabling checking with mypy as well as pytype.
Full Changelog: v0.3.1...v0.4.0
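
The active-learning support above picks which segment pairs to query by how much the reward ensemble disagrees about them. A conceptual sketch in plain NumPy (the function name and array layout are illustrative, not the library's API):

```python
import numpy as np

def select_queries(ensemble_returns: np.ndarray, k: int) -> np.ndarray:
    """Pick the k candidate pairs with the highest ensemble disagreement.

    ensemble_returns: shape (n_members, n_pairs, 2), each ensemble member's
    estimated return for the two segments in every candidate pair.
    """
    # Each member's preference margin between the two segments of a pair.
    margins = ensemble_returns[..., 0] - ensemble_returns[..., 1]
    disagreement = margins.var(axis=0)  # variance across ensemble members
    return np.argsort(disagreement)[::-1][:k]  # most uncertain pairs first
```

Pairs with high variance are the ones where an additional human label is most informative.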
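
For the HuggingFace integration, one way to fetch a pretrained Stable Baselines3 policy is via the huggingface_sb3 helper package; the repo and file names below are examples, not guaranteed artifacts:

```python
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

# Download a zipped SB3 model from the HuggingFace Hub
# (repo_id and filename here are illustrative).
checkpoint_path = load_from_hub(
    repo_id="HumanCompatibleAI/ppo-seals-CartPole-v0",
    filename="ppo-seals-CartPole-v0.zip",
)
expert = PPO.load(checkpoint_path)
```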
v0.3.1
Released on 2022-07-29 - GitHub - PyPI
What's Changed
Main changes:
- Added reward ensembles and conservative reward functions by @levmckinney in #460 (see the sketch after this list)
- Dropped support for Python 3.7 by @levmckinney in #505
Minor changes:
- Docstring and other fixes after #472 by @Rocamonde in #497
- Improve Windows CI by @AdamGleave in #495
Full Changelog: v0.3.0...v0.3.1
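
The conservative reward functions above combine an ensemble's predictions pessimistically, so that disagreement between members lowers the reward. A minimal PyTorch sketch of the idea (class and parameter names are hypothetical, not the library's API):

```python
import torch
import torch.nn as nn

class ConservativeRewardEnsemble(nn.Module):
    """Toy reward ensemble; names are hypothetical, not imitation's API."""

    def __init__(self, obs_dim: int, act_dim: int, n_members: int = 5):
        super().__init__()
        # Several independently initialized reward networks.
        self.members = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, 64),
                nn.ReLU(),
                nn.Linear(64, 1),
            )
            for _ in range(n_members)
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor,
                alpha: float = 1.0) -> torch.Tensor:
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x).squeeze(-1) for m in self.members])
        # Conservative reward: ensemble mean minus a multiple of the
        # ensemble standard deviation, so disagreement lowers the reward.
        return preds.mean(dim=0) - alpha * preds.std(dim=0)
```

With alpha = 0 this reduces to the plain ensemble mean; larger alpha makes the learned reward more pessimistic where the members disagree.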
v0.3.0: Major improvements
Released on 2022-07-26 - GitHub - PyPI
New features:
- New algorithm: Deep RL from Human Preferences (thanks to @ejnnr, @norabelrose, et al.)
- Notebooks with examples (thanks to @ernestum)
- Serialized trajectories using NumPy arrays rather than pickles, ensuring stability across versions and saving space on disk (thanks to @norabelrose; see the sketch below)
- Weights & Biases logging support (thanks to @yawen-d)
Improvements:
- Ported MCE IRL from JAX to PyTorch, eliminating the JAX dependency (thanks to @qxcv).
- Refactored RewardNet code to be independent of AIRL and shared across algorithms (thanks to @ejnnr).
- Added Windows support, including continuous integration (thanks to @taufeeque9).
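
A minimal sketch of round-tripping demonstrations in the NumPy-backed format. It assumes save()/load() helpers in imitation.data.serialize, which is an assumption about the exact entry point; the path and toy data are illustrative:

```python
import numpy as np
from imitation.data import serialize  # assumed entry point for save/load
from imitation.data.types import Trajectory

# A toy trajectory: obs has one more entry than acts (final observation).
traj = Trajectory(
    obs=np.zeros((3, 4), dtype=np.float32),
    acts=np.zeros((2,), dtype=np.int64),
    infos=None,
    terminal=True,
)
serialize.save("demos/expert_trajs", [traj])     # NumPy-backed, not pickled
restored = serialize.load("demos/expert_trajs")  # stable across versions
```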
v0.2.0: First PyTorch release
v0.1.1: Final TF1 release
v0.1.0: Initial release
Released on 2020-05-09 - GitHub - PyPI
Prototype versions of AIRL, GAIL, BC, and DAgger.