imitation.policies.exploration_wrapper#
Wrapper to turn a policy into a more exploratory version.
Classes

ExplorationWrapper – Wraps a PolicyCallable to create a partially randomized version.
- class imitation.policies.exploration_wrapper.ExplorationWrapper(policy, venv, random_prob, switch_prob, rng, deterministic_policy=False)[source]#
Bases: object
Wraps a PolicyCallable to create a partially randomized version.
This wrapper randomly switches between two policies: the wrapped policy, and a random one. After each action, the current policy is kept with probability 1 - switch_prob. Otherwise, one of the two policies is chosen anew, with the random policy picked with probability random_prob (without any dependence on which policy is currently active).
The random policy uses the action_space.sample() method.
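The switching rule above can be sketched in a few lines of plain NumPy. This is an illustrative sketch of the mechanism only, not the library's implementation; the class name `SwitchingSketch` and its methods are hypothetical:

```python
import numpy as np


class SwitchingSketch:
    """Sketch of the switching rule: after each action, keep the current
    policy with probability 1 - switch_prob; otherwise re-draw a policy,
    picking the random one with probability random_prob, independently of
    the current choice."""

    def __init__(self, random_prob: float, switch_prob: float, rng: np.random.Generator):
        self.random_prob = random_prob
        self.switch_prob = switch_prob
        self.rng = rng
        self._pick_policy()  # the initial policy is also drawn at random

    def _pick_policy(self) -> None:
        # Draw a fresh policy; this does not depend on the current one.
        self.use_random = self.rng.random() < self.random_prob

    def step(self) -> bool:
        # Called after each action: switch away with probability switch_prob.
        if self.rng.random() < self.switch_prob:
            self._pick_policy()
        return self.use_random
```

With `random_prob=1.0` the sketch always uses the random policy; with `random_prob=0.0` it always uses the wrapped policy, matching the parameter semantics documented below.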
- __init__(policy, venv, random_prob, switch_prob, rng, deterministic_policy=False)[source]#
Initializes the ExplorationWrapper.
- Parameters
  - policy (Union[BaseAlgorithm, BasePolicy, Callable[[Union[ndarray, Dict[str, ndarray]], Optional[Tuple[ndarray, ...]], Optional[ndarray]], Tuple[ndarray, Optional[Tuple[ndarray, ...]]]], None]) – The policy to randomize.
  - venv (VecEnv) – The environment to use (needed for sampling random actions).
  - random_prob (float) – The probability of picking the random policy when switching.
  - switch_prob (float) – The probability of switching away from the current policy.
  - rng (Generator) – The random state to use for seeding the environment and for switching policies.
  - deterministic_policy (bool) – Whether to make the policy deterministic when not exploring. This must be False when policy is a PolicyCallable.