imitation.scripts.ingredients.reward
This ingredient provides a reward network.
Functions
config_hook(config, command_name, logger) | Sets default values for net_cls and net_kwargs.
make_reward_net(venv, net_cls, ...) | Builds a reward network.
- imitation.scripts.ingredients.reward.config_hook(config, command_name, logger)
Sets default values for net_cls and net_kwargs.
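As a rough illustration of what "setting defaults" means here, the sketch below assumes a Sacred-style hook that receives the full configuration and returns a dictionary containing only the entries it wants to fill in. The function name sketch_config_hook and the placeholder defaults are hypothetical, since this page does not state which defaults the real hook chooses.

```python
from typing import Any, Dict, Mapping

# Placeholder defaults (assumptions): the real values are not given on this page.
DEFAULT_NET_CLS = "BasicRewardNet"
DEFAULT_NET_KWARGS: Dict[str, Any] = {}


def sketch_config_hook(
    config: Mapping[str, Any],
    command_name: str,
    logger: Any,
) -> Dict[str, Any]:
    """Return updates for unset reward options; leave user-set values alone."""
    del command_name, logger  # unused in this sketch
    reward_cfg = config.get("reward", {})
    updates: Dict[str, Any] = {}
    if reward_cfg.get("net_cls") is None:
        updates["net_cls"] = DEFAULT_NET_CLS
    if reward_cfg.get("net_kwargs") is None:
        # Copy so callers cannot mutate the shared default.
        updates["net_kwargs"] = dict(DEFAULT_NET_KWARGS)
    return updates
```

Returning only the missing keys means that values supplied explicitly by the user are never overwritten.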
- imitation.scripts.ingredients.reward.make_reward_net(venv, net_cls, net_kwargs, normalize_output_layer, add_std_alpha, ensemble_size, ensemble_member_config)
Builds a reward network.
- Parameters
venv (VecEnv) – Vectorized environment that the reward network will predict rewards for.
net_cls (Type[RewardNet]) – Class of reward network to construct.
net_kwargs (Mapping[str, Any]) – Keyword arguments passed to the reward network constructor.
normalize_output_layer (Optional[Type[BaseNorm]]) – If set, the reward network is wrapped with NormalizedRewardNet to normalize the reward output.
add_std_alpha (Optional[float]) – Multiple of the reward function's standard deviation to add to the reward in predict_processed. Must be None when using a reward function that does not keep track of variance. Defaults to None.
ensemble_size (Optional[int]) – The number of ensemble members to create. Must be set if net_cls is reward_nets.RewardEnsemble.
ensemble_member_config (Optional[Mapping[str, Any]]) – The configuration for individual ensemble members. Must be set if net_cls is reward_nets.RewardEnsemble. Note that ensemble_member_config.net_cls must not itself be reward_nets.RewardEnsemble.
- Return type
RewardNet
- Returns
A (possibly wrapped) instance of net_cls.
- Raises
ValueError – If net_cls is a reward ensemble but the ensemble configuration (ensemble_size, ensemble_member_config) was not provided.
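The parameter rules above can be summarized in a small, dependency-free sketch. Everything in it is a stand-in: the classes below only mimic the roles of RewardNet, RewardEnsemble, NormalizedRewardNet and the standard-deviation wrapper, the venv argument is omitted, and the "net_kwargs" key inside ensemble_member_config, the wrapping order and the error raised for nested ensembles are assumptions rather than documented behavior.

```python
from typing import Any, List, Mapping, Optional, Type


class RewardNet:
    """Stand-in for a reward network; just records its constructor kwargs."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = dict(kwargs)


class RewardEnsemble(RewardNet):
    """Stand-in for an ensemble that holds several member networks."""

    def __init__(self, members: List[RewardNet]) -> None:
        super().__init__()
        self.members = members


class Wrapper(RewardNet):
    """Stand-in base class for wrappers around another reward network."""

    def __init__(self, base: RewardNet, **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.base = base


class AddSTDWrapper(Wrapper):
    """Stand-in for the wrapper that adds alpha * std to the reward."""


class NormalizedWrapper(Wrapper):
    """Stand-in for NormalizedRewardNet."""


def sketch_make_reward_net(
    net_cls: Type[RewardNet],
    net_kwargs: Mapping[str, Any],
    normalize_output_layer: Optional[type] = None,
    add_std_alpha: Optional[float] = None,
    ensemble_size: Optional[int] = None,
    ensemble_member_config: Optional[Mapping[str, Any]] = None,
) -> RewardNet:
    if issubclass(net_cls, RewardEnsemble):
        # Both ensemble options are required when building an ensemble.
        if ensemble_size is None or ensemble_member_config is None:
            raise ValueError("Reward ensemble requested without configuration.")
        member_cls = ensemble_member_config["net_cls"]
        if issubclass(member_cls, RewardEnsemble):
            # Assumption: this sketch rejects nested ensembles with ValueError.
            raise ValueError("Ensemble members must not be ensembles.")
        member_kwargs = ensemble_member_config.get("net_kwargs", {})
        members = [member_cls(**member_kwargs) for _ in range(ensemble_size)]
        net: RewardNet = net_cls(members)
    else:
        net = net_cls(**net_kwargs)
    # Optional wrappers; the order used here is an assumption.
    if add_std_alpha is not None:
        net = AddSTDWrapper(net, alpha=add_std_alpha)
    if normalize_output_layer is not None:
        net = NormalizedWrapper(net, layer=normalize_output_layer)
    return net
```

For example, sketch_make_reward_net(RewardEnsemble, {}, ensemble_size=3, ensemble_member_config={"net_cls": RewardNet}) builds an ensemble with three members, while omitting either ensemble option raises ValueError.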