imitation.scripts.ingredients.reward

This ingredient provides a reward network.

Functions

`config_hook`(config, command_name, logger)	Sets default values for net_cls and net_kwargs.
`make_reward_net`(venv, net_cls, net_kwargs, ...)	Builds a reward network.

imitation.scripts.ingredients.reward.config_hook(config, command_name, logger)

Sets default values for net_cls and net_kwargs.
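As a rough illustration of what a default-filling hook does, the sketch below uses plain dictionaries rather than the real ingredient machinery. The helper name `fill_reward_defaults`, the placeholder default values, and the example kwargs are assumptions made for this example; the defaults that `config_hook` actually chooses are not listed on this page.

```python
from typing import Any, Dict


def fill_reward_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the entries that need a default (hypothetical helper)."""
    updates: Dict[str, Any] = {}
    # Placeholder default; the real hook would select an actual RewardNet class.
    if config.get("net_cls") is None:
        updates["net_cls"] = "DefaultRewardNet"
    # Placeholder default; the real hook may pre-populate some kwargs instead.
    if config.get("net_kwargs") is None:
        updates["net_kwargs"] = {}
    return updates


# Illustrative user configuration: net_cls is unset, net_kwargs is provided.
user_config = {"net_cls": None, "net_kwargs": {"hid_sizes": (32, 32)}}
updates = fill_reward_defaults(user_config)
# Values supplied by the user are kept; only missing ones are filled in.
merged = {**user_config, **updates}
print(merged)
```

The hook returns only the entries it wants to change, so values the user already set are left alone.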

imitation.scripts.ingredients.reward.make_reward_net(venv, net_cls, net_kwargs, normalize_output_layer, add_std_alpha, ensemble_size, ensemble_member_config)

Builds a reward network.

Parameters

venv (VecEnv) – Vectorized environment for which the reward network will predict rewards.
net_cls (Type[RewardNet]) – Class of reward network to construct.
net_kwargs (Mapping[str, Any]) – Keyword arguments passed to reward network constructor.
normalize_output_layer (Optional[Type[BaseNorm]]) – If not None, the reward network is wrapped in NormalizedRewardNet to normalize the reward output.
add_std_alpha (Optional[float]) – Multiple of the reward function's standard deviation to add to the reward in predict_processed. Must be None when using a reward function that does not keep track of variance. Defaults to None.
ensemble_size (Optional[int]) – The number of ensemble members to create. Must be set if net_cls is reward_nets.RewardEnsemble.
ensemble_member_config (Optional[Mapping[str, Any]]) – The configuration for individual ensemble members. Must be set if net_cls is reward_nets.RewardEnsemble. Note that ensemble_member_config.net_cls must not itself be reward_nets.RewardEnsemble.
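The arithmetic behind add_std_alpha can be sketched in plain Python. This is only an illustration of "mean plus a multiple of the standard deviation" for a single transition; the helper name `reward_with_std_bonus`, the use of the sample standard deviation, and returning the plain mean when add_std_alpha is None are assumptions of this sketch, not a description of the library's internals.

```python
import statistics
from typing import Optional, Sequence


def reward_with_std_bonus(
    member_rewards: Sequence[float],
    add_std_alpha: Optional[float] = None,
) -> float:
    """Combine per-member reward predictions for one transition (hypothetical)."""
    mean = statistics.fmean(member_rewards)
    if add_std_alpha is None:
        # No variance-based adjustment requested.
        return mean
    # Assumption: sample standard deviation across ensemble members.
    std = statistics.stdev(member_rewards)
    return mean + add_std_alpha * std


# Three hypothetical ensemble members disagree about one transition's reward.
predictions = [1.0, 2.0, 3.0]
plain = reward_with_std_bonus(predictions)
adjusted = reward_with_std_bonus(predictions, add_std_alpha=0.5)
print(plain, adjusted)
```

A positive multiple raises the reward where the members disagree, and a negative multiple lowers it. This also shows why the parameter only makes sense for reward functions that track variance.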

Return type

RewardNet

Returns

An instance of net_cls, possibly wrapped.

Raises

ValueError – If a reward ensemble is requested but the ensemble configuration is not provided.
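The condition that leads to the ValueError can be sketched as a standalone check. The function name `check_ensemble_config`, the use of class names as strings, and the error message are assumptions for illustration; the real function works with the actual reward network classes.

```python
from typing import Any, Mapping, Optional


def check_ensemble_config(
    net_cls_name: str,
    ensemble_size: Optional[int],
    ensemble_member_config: Optional[Mapping[str, Any]],
) -> None:
    """Raise ValueError if an ensemble is requested without its configuration."""
    if net_cls_name != "RewardEnsemble":
        # Non-ensemble networks do not need the ensemble settings.
        return
    if ensemble_size is None or ensemble_member_config is None:
        raise ValueError(
            "ensemble_size and ensemble_member_config must be set "
            "when using a reward ensemble."
        )


# A non-ensemble network passes without any ensemble settings.
check_ensemble_config("BasicRewardNet", None, None)

# An ensemble without its configuration is rejected.
try:
    check_ensemble_config("RewardEnsemble", None, None)
    error_message = ""
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Checking both settings together gives one clear error at construction time, before any ensemble members are built.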