imitation.policies.exploration_wrapper
Wrapper to turn a policy into a more exploratory version.
Classes
| ExplorationWrapper | Wraps a PolicyCallable to create a partially randomized version. |
- class imitation.policies.exploration_wrapper.ExplorationWrapper(policy, venv, random_prob, switch_prob, rng, deterministic_policy=False)
Bases: object
Wraps a PolicyCallable to create a partially randomized version.
This wrapper randomly switches between two policies: the wrapped policy and a random one. After each action, the current policy is kept with probability 1 - switch_prob. Otherwise, a new policy is drawn independently of the current one: the random policy with probability random_prob, or the wrapped policy with probability 1 - random_prob. A switch can therefore re-select the policy that was already active.
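Treating the active policy as a two-state Markov chain gives a quick way to reason about the two probabilities. This is a sketch derived only from the switching rule stated above, writing s = switch_prob, r = random_prob, and p_t for the probability that the random policy picks the action at step t (assuming s > 0, and 0 < r < 1 for the run lengths):

```latex
\begin{aligned}
p_{t+1} &= (1 - s)\,p_t + s\,r
  \;\Longrightarrow\; p^\ast = (1 - s)\,p^\ast + s\,r
  \;\Longrightarrow\; p^\ast = r, \\
\mathbb{E}[\text{random run length}] &= \frac{1}{s\,(1 - r)}, \qquad
\mathbb{E}[\text{wrapped run length}] = \frac{1}{s\,r}.
\end{aligned}
```

The run lengths are geometric because leaving the random policy needs a switch that draws the wrapped policy (probability s(1 - r) per step), and vice versa (probability s r). Under these assumptions, random_prob sets the long-run share of random actions, while switch_prob mainly controls how long each exploratory or on-policy stretch lasts.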
The random policy draws its actions with the environment's action_space.sample() method.
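The switching rule can be illustrated with a small, self-contained sketch. SwitchingPolicySketch is a hypothetical class, not the library's implementation: wrapped stands in for the wrapped policy and sample_random for action_space.sample(). The sketch also assumes the first policy is drawn with the same rule as a switch, which the text above does not specify.

```python
import numpy as np


class SwitchingPolicySketch:
    """Hypothetical, simplified version of the switching rule described above.

    Not imitation's ExplorationWrapper: `wrapped` stands in for the wrapped
    policy and `sample_random` for action_space.sample().
    """

    def __init__(self, wrapped, sample_random, random_prob, switch_prob, rng):
        self.wrapped = wrapped
        self.sample_random = sample_random
        self.random_prob = random_prob
        self.switch_prob = switch_prob
        self.rng = rng
        self.use_random = False
        # Assumption: the first policy is drawn with the same rule as a switch.
        self._switch()

    def _switch(self):
        # Draw independently of the current policy: the random policy with
        # probability random_prob, otherwise the wrapped policy.
        self.use_random = self.rng.random() < self.random_prob

    def __call__(self, obs):
        act = self.sample_random() if self.use_random else self.wrapped(obs)
        # After each action, re-draw the policy with probability switch_prob.
        if self.rng.random() < self.switch_prob:
            self._switch()
        return act


rng = np.random.default_rng(0)
action_rng = np.random.default_rng(1)
policy = SwitchingPolicySketch(
    wrapped=lambda obs: 0,  # the "wrapped" policy always outputs action 0
    sample_random=lambda: int(action_rng.integers(1, 4)),  # random acts in {1, 2, 3}
    random_prob=0.3,
    switch_prob=0.5,
    rng=rng,
)
actions = np.array([policy(None) for _ in range(20_000)])
print((actions != 0).mean())  # share of random actions, close to random_prob
```

Because the random actions never equal 0 here, counting non-zero actions measures how often the random policy was active; over a long run it settles near random_prob.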
- __init__(policy, venv, random_prob, switch_prob, rng, deterministic_policy=False)
Initializes the ExplorationWrapper.
- Parameters
  - policy (Union[BaseAlgorithm, BasePolicy, Callable[[Union[ndarray, Dict[str, ndarray]], Optional[Tuple[ndarray, ...]], Optional[ndarray]], Tuple[ndarray, Optional[Tuple[ndarray, ...]]]], None]) – The policy to randomize.
  - venv (VecEnv) – The environment to use (needed for sampling random actions).
  - random_prob (float) – The probability of picking the random policy when switching.
  - switch_prob (float) – The probability of switching away from the current policy.
  - rng (Generator) – The random state to use for seeding the environment and for switching policies.
  - deterministic_policy (bool) – Whether to make the policy deterministic when not exploring. This must be False when policy is a PolicyCallable.
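The Callable branch of the policy type above is a PolicyCallable: it takes observations, optional recurrent states and optional episode-start flags, and returns actions plus optional new states. The sketch below shows one such callable in plain NumPy. zero_action_policy is a hypothetical name, and the ExplorationWrapper construction is left as a comment because it needs imitation and a real VecEnv, which are not set up here.

```python
from typing import Dict, Optional, Tuple, Union

import numpy as np

Observation = Union[np.ndarray, Dict[str, np.ndarray]]
States = Optional[Tuple[np.ndarray, ...]]


def zero_action_policy(
    observations: Observation,
    states: States,
    episode_starts: Optional[np.ndarray],
) -> Tuple[np.ndarray, States]:
    """Hypothetical PolicyCallable: returns action 0 for every environment.

    Matches the Callable signature listed for `policy`; it keeps no
    recurrent state, so it returns None for the states.
    """
    del states, episode_starts  # unused by this stateless policy
    if isinstance(observations, dict):
        # Dict observations: take the batch size from any entry.
        batch_size = len(next(iter(observations.values())))
    else:
        batch_size = len(observations)
    return np.zeros(batch_size, dtype=np.int64), None


rng = np.random.default_rng(seed=0)  # a Generator, as the rng parameter expects
obs = np.zeros((4, 3), dtype=np.float32)  # 4 parallel envs, 3-dim observations
actions, new_states = zero_action_policy(obs, None, None)
print(actions.shape, new_states)  # (4,) None

# With imitation installed and a VecEnv `venv` available, the callable could be
# passed as `policy`; deterministic_policy must then stay False:
# ExplorationWrapper(zero_action_policy, venv, random_prob=0.5,
#                    switch_prob=0.1, rng=rng)
```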