imitation.scripts.ingredients.reward

This ingredient provides a reward network.

Functions

config_hook(config, command_name, logger)

Sets default values for net_cls and net_kwargs.

make_reward_net(venv, net_cls, net_kwargs, ...)

Builds a reward network.

imitation.scripts.ingredients.reward.config_hook(config, command_name, logger)

Sets default values for net_cls and net_kwargs.
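The hook's job of filling in defaults can be illustrated with a plain function that has the same (config, command_name, logger) signature and only supplies values the user left unset. This is a minimal sketch, not the library's implementation. The placeholder class, the default `hid_sizes` keyword argument, and returning the updates nested under a "reward" key are all hypothetical choices made for illustration.

```python
from typing import Any, Dict, Mapping


class PlaceholderRewardNet:
    """Hypothetical stand-in for a RewardNet subclass."""


# Hypothetical defaults; the real hook may pick different values (e.g. per command).
DEFAULT_NET_CLS = PlaceholderRewardNet
DEFAULT_NET_KWARGS: Dict[str, Any] = {"hid_sizes": (32, 32)}


def config_hook_sketch(
    config: Mapping[str, Any], command_name: str, logger: Any
) -> Dict[str, Any]:
    """Return config updates that fill in net_cls and net_kwargs only when unset."""
    del command_name, logger  # unused in this simplified sketch
    reward_cfg = config.get("reward", {})
    updates: Dict[str, Any] = {}
    if reward_cfg.get("net_cls") is None:
        updates["net_cls"] = DEFAULT_NET_CLS
    # Merge defaults with user-supplied kwargs; user values take precedence.
    user_kwargs = reward_cfg.get("net_kwargs") or {}
    updates["net_kwargs"] = {**DEFAULT_NET_KWARGS, **user_kwargs}
    return {"reward": updates}
```

Merging rather than overwriting `net_kwargs` lets a user override a single keyword argument without having to restate every default.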

imitation.scripts.ingredients.reward.make_reward_net(venv, net_cls, net_kwargs, normalize_output_layer, add_std_alpha, ensemble_size, ensemble_member_config)

Builds a reward network.

Parameters
  • venv (VecEnv) – Vectorized environment for which the reward network will predict rewards.

  • net_cls (Type[RewardNet]) – Class of reward network to construct.

  • net_kwargs (Mapping[str, Any]) – Keyword arguments passed to reward network constructor.

  • normalize_output_layer (Optional[Type[BaseNorm]]) – If not None, the reward network is wrapped in NormalizedRewardNet, which uses this normalization layer class to normalize the reward output.

  • add_std_alpha (Optional[float]) – Multiple of the reward function's standard deviation to add to the reward in predict_processed. Must be None when using a reward function that does not keep track of variance. Defaults to None.

  • ensemble_size (Optional[int]) – The number of ensemble members to create. Must be set if net_cls is reward_nets.RewardEnsemble.

  • ensemble_member_config (Optional[Mapping[str, Any]]) – The configuration for individual ensemble members. Note that ensemble_member_config.net_cls must not be reward_nets.RewardEnsemble. Must be set if net_cls is reward_nets.RewardEnsemble.

Return type

RewardNet

Returns

An instance of net_cls, possibly wrapped (for example, in NormalizedRewardNet).

Raises

ValueError – If net_cls is a reward ensemble but ensemble_size or ensemble_member_config is not provided.
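The validation and wrapping steps documented above can be illustrated with a self-contained sketch. Every class in it (RewardNet, RewardEnsemble, AddStdWrapper, NormalizedWrapper) is a hypothetical lightweight stand-in rather than imitation's real class, predict() takes a single float for brevity, and the order in which the sketch applies the wrappers is an assumption.

```python
from statistics import fmean, pvariance
from types import SimpleNamespace
from typing import Any, List, Mapping, Optional


class RewardNet:
    """Hypothetical stand-in for a reward network."""

    def __init__(self, observation_space: Any, action_space: Any, scale: float = 1.0):
        self.observation_space = observation_space
        self.action_space = action_space
        self.scale = scale

    def predict(self, x: float) -> float:
        return self.scale * x


class RewardEnsemble(RewardNet):
    """Hypothetical ensemble: mean of member predictions, plus their variance."""

    def __init__(self, observation_space: Any, action_space: Any, members: List[RewardNet]):
        super().__init__(observation_space, action_space)
        self.members = list(members)

    def predict_mean_and_var(self, x: float):
        preds = [m.predict(x) for m in self.members]
        return fmean(preds), pvariance(preds)

    def predict(self, x: float) -> float:
        return self.predict_mean_and_var(x)[0]


class AddStdWrapper(RewardNet):
    """Hypothetical wrapper returning mean + alpha * std of the ensemble."""

    def __init__(self, base: RewardEnsemble, alpha: float):
        super().__init__(base.observation_space, base.action_space)
        self.base, self.alpha = base, alpha

    def predict(self, x: float) -> float:
        mean, var = self.base.predict_mean_and_var(x)
        return mean + self.alpha * var ** 0.5


class NormalizedWrapper(RewardNet):
    """Hypothetical stand-in for NormalizedRewardNet; norm_cls() takes no args here."""

    def __init__(self, base: RewardNet, norm_cls: type):
        super().__init__(base.observation_space, base.action_space)
        self.base, self.norm = base, norm_cls()

    def predict(self, x: float) -> float:
        return self.norm(self.base.predict(x))


def make_reward_net_sketch(
    venv: Any,
    net_cls: type,
    net_kwargs: Mapping[str, Any],
    normalize_output_layer: Optional[type] = None,
    add_std_alpha: Optional[float] = None,
    ensemble_size: Optional[int] = None,
    ensemble_member_config: Optional[Mapping[str, Any]] = None,
) -> RewardNet:
    if issubclass(net_cls, RewardEnsemble):
        # Ensembles need both settings, matching the documented ValueError.
        if ensemble_size is None or ensemble_member_config is None:
            raise ValueError("ensemble_size and ensemble_member_config must be set")
        if issubclass(ensemble_member_config["net_cls"], RewardEnsemble):
            raise ValueError("ensemble members must not be RewardEnsembles")
        # Each member is built recursively from the shared member config.
        members = [
            make_reward_net_sketch(venv, **ensemble_member_config)
            for _ in range(ensemble_size)
        ]
        net: RewardNet = net_cls(venv.observation_space, venv.action_space, members)
    else:
        net = net_cls(venv.observation_space, venv.action_space, **net_kwargs)
    if add_std_alpha is not None:
        # Only nets that track variance can add a multiple of their std.
        if not isinstance(net, RewardEnsemble):
            raise ValueError("add_std_alpha requires a net that tracks variance")
        net = AddStdWrapper(net, add_std_alpha)
    if normalize_output_layer is not None:
        net = NormalizedWrapper(net, normalize_output_layer)
    return net


# Usage with a hypothetical environment stand-in exposing the two spaces.
venv = SimpleNamespace(observation_space="obs", action_space="act")
plain = make_reward_net_sketch(venv, RewardNet, {"scale": 4.0})
# plain.predict(2.0) returns 8.0
```

In this sketch, members built from one shared config would be identical. With real neural reward networks, each member is independently randomly initialized, so the members disagree and the ensemble has a non-zero variance for add_std_alpha to act on.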