imitation.policies.exploration_wrapper#

Wrapper to turn a policy into a more exploratory version.

Classes

ExplorationWrapper(policy, venv, ...[, ...])

Wraps a PolicyCallable to create a partially randomized version.

class imitation.policies.exploration_wrapper.ExplorationWrapper(policy, venv, random_prob, switch_prob, rng, deterministic_policy=False)[source]#

Bases: object

Wraps a PolicyCallable to create a partially randomized version.

This wrapper randomly switches between two policies: the wrapped policy and a random one. After each action, the current policy is kept with a certain probability. Otherwise, one of the two policies is chosen at random, independently of which policy is currently active.

The random policy uses the action_space.sample() method.
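The switching rule described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the class name, the `sample_random_action` callable (standing in for `action_space.sample()`), and the single-observation `__call__` signature are assumptions made for brevity; the real wrapper operates on batches of observations from a VecEnv.

```python
import random


class ExplorationWrapperSketch:
    """Illustrative sketch of the policy-switching logic (not imitation's API)."""

    def __init__(self, policy, sample_random_action, random_prob, switch_prob, rng):
        self.policy = policy                      # maps observation -> action
        self.sample_random_action = sample_random_action
        self.random_prob = random_prob            # P(picking the random policy) when switching
        self.switch_prob = switch_prob            # P(switching away) after each action
        self.rng = rng
        self._pick_policy()                       # the initial policy is also chosen at random

    def _pick_policy(self):
        # With probability random_prob use the random policy, otherwise the wrapped one.
        self.use_random = self.rng.random() < self.random_prob

    def __call__(self, obs):
        action = self.sample_random_action() if self.use_random else self.policy(obs)
        # After each action, re-draw the active policy with probability switch_prob.
        # The new draw does not depend on which policy is currently active.
        if self.rng.random() < self.switch_prob:
            self._pick_policy()
        return action


# Toy usage: a constant policy over a hypothetical 4-action discrete space.
rng = random.Random(0)
wrapper = ExplorationWrapperSketch(
    policy=lambda obs: 0,
    sample_random_action=lambda: rng.randrange(4),
    random_prob=0.5,
    switch_prob=0.1,
    rng=rng,
)
actions = [wrapper(obs=None) for _ in range(5)]
```

Note that setting `switch_prob=0` after the first draw freezes whichever policy was initially selected, while `switch_prob=1` re-draws the policy after every single action.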

__init__(policy, venv, random_prob, switch_prob, rng, deterministic_policy=False)[source]#

Initializes the ExplorationWrapper.

Parameters
  • policy (Union[BaseAlgorithm, BasePolicy, Callable[[Union[ndarray, Dict[str, ndarray]], Optional[Tuple[ndarray, ...]], Optional[ndarray]], Tuple[ndarray, Optional[Tuple[ndarray, ...]]]], None]) – The policy to randomize.

  • venv (VecEnv) – The environment to use (needed for sampling random actions).

  • random_prob (float) – The probability of picking the random policy when switching.

  • switch_prob (float) – The probability of switching away from the current policy.

  • rng (Generator) – The random state to use for seeding the environment and for switching policies.

  • deterministic_policy (bool) – Whether to make the policy deterministic when not exploring. This must be False when policy is a PolicyCallable.