imitation.algorithms.bc#

Behavioral Cloning (BC).

Trains a policy by applying supervised learning to a fixed dataset of (observation, action) pairs generated by some expert demonstrator.

Functions

`enumerate_batches`(batch_it)	Prepends batch stats before the batches of a batch iterator.
`reconstruct_policy`(policy_path[, device])	Reconstruct a saved policy.

Classes

`BC`(*, observation_space, action_space, rng)	Behavioral cloning (BC).
`BCLogger`(logger)	Utility class to help log information relevant to Behavior Cloning.
`BCTrainingMetrics`(neglogp, entropy, ...)	Container for the different components of behavior cloning loss.
`BatchIteratorWithEpochEndCallback`(...)	Loops through batches from a batch loader and calls a callback after every epoch.
`BehaviorCloningLossCalculator`(ent_weight, ...)	Functor to compute the loss used in Behavior Cloning.
`RolloutStatsComputer`(venv, n_episodes)	Computes statistics about rollouts.

class imitation.algorithms.bc.BC(*, observation_space, action_space, rng, policy=None, demonstrations=None, batch_size=32, minibatch_size=None, optimizer_cls=<class 'torch.optim.adam.Adam'>, optimizer_kwargs=None, ent_weight=0.001, l2_weight=0.0, device='auto', custom_logger=None)[source]#

Bases: DemonstrationAlgorithm

Behavioral cloning (BC).

Recovers a policy via supervised learning from observation-action pairs.

__init__(*, observation_space, action_space, rng, policy=None, demonstrations=None, batch_size=32, minibatch_size=None, optimizer_cls=<class 'torch.optim.adam.Adam'>, optimizer_kwargs=None, ent_weight=0.001, l2_weight=0.0, device='auto', custom_logger=None)[source]#

Builds BC.

Parameters

observation_space (Space) – the observation space of the environment.
action_space (Space) – the action space of the environment.
rng (Generator) – the random state to use for the random number generator.
policy (Optional[ActorCriticPolicy]) – a Stable Baselines3 policy; if unspecified, defaults to FeedForward32Policy.
demonstrations (Union[Iterable[Trajectory], Iterable[TransitionMapping], TransitionsMinimal, None]) – Demonstrations from an expert (optional). May be given directly as a types.TransitionsMinimal object, as a sequence of trajectories, or as an iterable of transition batches (mappings from keywords to arrays containing observations, etc.).
batch_size (int) – The number of samples in each batch of expert data.
minibatch_size (Optional[int]) – size of minibatch to calculate gradients over. The gradients are accumulated until batch_size examples are processed before making an optimization step. This is useful in GPU training to reduce memory usage, since fewer examples are loaded into memory at once, facilitating training with larger batch sizes, but is generally slower. Must be a factor of batch_size. Optional, defaults to batch_size.
optimizer_cls (Type[Optimizer]) – optimizer to use for supervised training.
optimizer_kwargs (Optional[Mapping[str, Any]]) – keyword arguments, excluding learning rate and weight decay, for optimizer construction.
ent_weight (float) – scaling applied to the policy’s entropy regularization.
l2_weight (float) – scaling applied to the policy’s L2 regularization.
device (Union[str, device]) – name/identity of device to place policy on.
custom_logger (Optional[HierarchicalLogger]) – Where to log to; if None (default), creates a new logger.

Raises

ValueError – If weight_decay is specified in optimizer_kwargs (use the parameter l2_weight instead), or if the batch size is not a multiple of the minibatch size.
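
The relationship between batch_size and minibatch_size described above can be sketched without PyTorch. The following is an illustrative stand-in, not the library's implementation: check_minibatch_size, accumulated_gradient and mean_grad are hypothetical names, and plain floats stand in for tensors. It shows that summing suitably scaled minibatch gradients reproduces the full-batch gradient, and why batch_size has to be a multiple of minibatch_size.

```python
from typing import Callable, Sequence


def check_minibatch_size(batch_size: int, minibatch_size: int) -> None:
    """Reject sizes where batch_size is not a multiple of minibatch_size."""
    if batch_size % minibatch_size != 0:
        raise ValueError("batch_size must be a multiple of minibatch_size")


def accumulated_gradient(
    samples: Sequence[float],
    grad_fn: Callable[[Sequence[float]], float],
    minibatch_size: int,
) -> float:
    """Accumulate minibatch gradients into one full-batch gradient.

    grad_fn must return the mean gradient over the samples it receives.
    """
    batch_size = len(samples)
    check_minibatch_size(batch_size, minibatch_size)
    total = 0.0
    for start in range(0, batch_size, minibatch_size):
        chunk = samples[start:start + minibatch_size]
        # Scale each minibatch mean so that the sum equals the full-batch mean.
        total += grad_fn(chunk) * minibatch_size / batch_size
    return total


def mean_grad(xs: Sequence[float], w: float = 0.5) -> float:
    """Gradient with respect to w of mean((w * x - 1) ** 2) over xs (toy loss)."""
    return sum(2 * (w * x - 1) * x for x in xs) / len(xs)


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
full_batch = mean_grad(samples)
accumulated = accumulated_gradient(samples, mean_grad, minibatch_size=2)
```

In this sketch only minibatch_size samples are processed at a time, which is the memory saving the parameter description refers to; the optimizer step would happen once, after the loop.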

allow_variable_horizon: bool#

If True, allow variable horizon trajectories; otherwise error if detected.

property policy: ActorCriticPolicy#

Returns a policy imitating the demonstration data.

Return type: ActorCriticPolicy

set_demonstrations(demonstrations)[source]#

Sets the demonstration data.

Changing the demonstration data on-demand can be useful for interactive algorithms like DAgger.

Parameters: demonstrations (Union[Iterable[Trajectory], Iterable[TransitionMapping], TransitionsMinimal]) – Either a Torch DataLoader, any other iterator that yields dictionaries containing “obs” and “acts” Tensors or NumPy arrays, a TransitionsMinimal instance, or a Sequence of Trajectory objects.
Return type: None

train(*, n_epochs=None, n_batches=None, on_epoch_end=None, on_batch_end=None, log_interval=500, log_rollouts_venv=None, log_rollouts_n_episodes=5, progress_bar=True, reset_tensorboard=False)[source]#

Train with supervised learning for some number of epochs.

Here an ‘epoch’ is just a complete pass through the expert data loader, as set by self.set_demonstrations(). Note that if n_batches is smaller than the number of batches in an epoch, the on_epoch_end callback is never called.

Parameters

n_epochs (Optional[int]) – Number of complete passes made through expert data before ending training. Provide exactly one of n_epochs and n_batches.
n_batches (Optional[int]) – Number of batches loaded from dataset before ending training. Provide exactly one of n_epochs and n_batches.
on_epoch_end (Optional[Callable[[], None]]) – Optional callback with no parameters to run at the end of each epoch.
on_batch_end (Optional[Callable[[], None]]) – Optional callback with no parameters to run at the end of each batch.
log_interval (int) – Log stats after every log_interval batches.
log_rollouts_venv (Optional[VecEnv]) – If not None, then this VecEnv (whose observation and action spaces must match self.observation_space and self.action_space) is used to generate rollout stats, including average return and average episode length. If None, then no rollouts are generated.
log_rollouts_n_episodes (int) – Number of rollouts to generate when calculating rollout stats. A non-positive number disables rollouts.
progress_bar (bool) – If True, then show a progress bar during training.
reset_tensorboard (bool) – If True, then start plotting to Tensorboard from x=0 even if .train() logged to Tensorboard previously. Has no practical effect if .train() is being called for the first time.

class imitation.algorithms.bc.BCLogger(logger)[source]#

Bases: object

Utility class to help log information relevant to Behavior Cloning.

__init__(logger)[source]#

Create new BC logger.

Parameters: logger (HierarchicalLogger) – The logger to feed all the information to.

log_batch(batch_num, batch_size, num_samples_so_far, training_metrics, rollout_stats)[source]#

log_epoch(epoch_number)[source]#

reset_tensorboard_steps()[source]#

class imitation.algorithms.bc.BCTrainingMetrics(neglogp, entropy, ent_loss, prob_true_act, l2_norm, l2_loss, loss)[source]#

Bases: object

Container for the different components of behavior cloning loss.

__init__(neglogp, entropy, ent_loss, prob_true_act, l2_norm, l2_loss, loss)#

ent_loss: Tensor#

entropy: Optional[Tensor]#

l2_loss: Tensor#

l2_norm: Tensor#

loss: Tensor#

neglogp: Tensor#

prob_true_act: Tensor#

class imitation.algorithms.bc.BatchIteratorWithEpochEndCallback(batch_loader, n_epochs, n_batches, on_epoch_end)[source]#

Bases: object

Loops through batches from a batch loader and calls a callback after every epoch.

Raises an exception if an epoch contains no batches.

__init__(batch_loader, n_epochs, n_batches, on_epoch_end)#

batch_loader: Iterable[TransitionMapping]#

n_batches: Optional[int]#

n_epochs: Optional[int]#

on_epoch_end: Optional[Callable[[int], None]]#
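
The looping behaviour described for this class can be approximated with a few lines of standard-library Python. This is a sketch under stated assumptions rather than the class's actual code: iterate_batches is a hypothetical function, the epoch number passed to the callback is assumed to be zero-based, and RuntimeError is an arbitrary choice for the empty-epoch error.

```python
import itertools
from typing import Any, Callable, Iterable, Iterator, Mapping, Optional


def iterate_batches(
    batch_loader: Iterable[Mapping[str, Any]],
    *,
    n_epochs: Optional[int] = None,
    n_batches: Optional[int] = None,
    on_epoch_end: Optional[Callable[[int], None]] = None,
) -> Iterator[Mapping[str, Any]]:
    """Yield batches until n_epochs or n_batches is reached (exactly one is set)."""
    if (n_epochs is None) == (n_batches is None):
        raise ValueError("Provide exactly one of n_epochs and n_batches")

    def epochs() -> Iterator[Mapping[str, Any]]:
        # islice(..., None) is unbounded, so n_epochs=None loops forever here
        # and the outer islice on n_batches is what ends the iteration.
        for epoch_num in itertools.islice(itertools.count(), n_epochs):
            yielded = False
            for batch in batch_loader:
                yield batch
                yielded = True
            if not yielded:
                raise RuntimeError("epoch contained no batches")
            if on_epoch_end is not None:
                on_epoch_end(epoch_num)

    return itertools.islice(epochs(), n_batches)


# Three batches per epoch; the loader must be re-iterable (a list is).
loader = [{"obs": [0, 1]}, {"obs": [2, 3]}, {"obs": [4, 5]}]
finished_epochs = []
batches = list(
    iterate_batches(loader, n_epochs=2, on_epoch_end=finished_epochs.append)
)
```

With n_batches=2 and the same three-batch loader, the iteration in this sketch stops before the first epoch completes, so the callback is not invoked; this mirrors the note about n_batches in the train() documentation.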

class imitation.algorithms.bc.BehaviorCloningLossCalculator(ent_weight, l2_weight)[source]#

Bases: object

Functor to compute the loss used in Behavior Cloning.

__init__(ent_weight, l2_weight)#

ent_weight: float#

l2_weight: float#
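
To make the roles of ent_weight and l2_weight concrete, here is a minimal sketch of one plausible way the fields of BCTrainingMetrics could relate to each other. It is an assumption for illustration, not the functor's actual code: LossParts and bc_loss are hypothetical names, the policy is reduced to a list of action probabilities for a single sample, and both the sign convention (entropy is rewarded, so its term is subtracted) and the factor of one half in the L2 norm are assumed.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LossParts:
    """Hypothetical plain-float mirror of the BCTrainingMetrics fields."""

    neglogp: float
    entropy: float
    ent_loss: float
    prob_true_act: float
    l2_norm: float
    l2_loss: float
    loss: float


def bc_loss(
    action_probs: Sequence[float],
    expert_action: int,
    params: Sequence[float],
    ent_weight: float = 1e-3,
    l2_weight: float = 0.0,
) -> LossParts:
    """Compute an assumed BC loss for one sample of a discrete policy."""
    log_prob = math.log(action_probs[expert_action])
    # Shannon entropy of the action distribution (zero-probability terms skipped).
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    neglogp = -log_prob
    ent_loss = -ent_weight * entropy  # assumed sign: higher entropy lowers the loss
    l2_norm = sum(w * w for w in params) / 2  # assumed: half the squared norm
    l2_loss = l2_weight * l2_norm
    return LossParts(
        neglogp=neglogp,
        entropy=entropy,
        ent_loss=ent_loss,
        prob_true_act=math.exp(log_prob),
        l2_norm=l2_norm,
        l2_loss=l2_loss,
        loss=neglogp + ent_loss + l2_loss,
    )


parts = bc_loss([0.5, 0.5], expert_action=0, params=[1.0, 1.0],
                ent_weight=0.1, l2_weight=0.5)
```

Under these assumptions, raising ent_weight pushes the policy toward more uniform action distributions, while raising l2_weight penalizes large parameter values.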

class imitation.algorithms.bc.RolloutStatsComputer(venv, n_episodes)[source]#

Bases: object

Computes statistics about rollouts.

Parameters

venv (Optional[VecEnv]) – The vectorized environment in which to compute the rollouts.
n_episodes (int) – The number of episodes to base the statistics on.

__init__(venv, n_episodes)#

n_episodes: int#

venv: Optional[VecEnv]#

imitation.algorithms.bc.enumerate_batches(batch_it)[source]#

Prepends batch stats before the batches of a batch iterator.

Return type: Iterable[Tuple[Tuple[int, int, int], TransitionMapping]]
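
The return type suggests each batch is paired with a tuple of three integers. A minimal sketch follows, assuming the three values are the batch index, the size of the current batch, and the running sample count (matching the batch_num, batch_size and num_samples_so_far arguments of BCLogger.log_batch), and assuming the batch size can be read from the length of the "obs" entry. enumerate_batches_sketch is a hypothetical name, not the library function.

```python
from typing import Any, Iterable, Iterator, Mapping, Tuple

BatchStats = Tuple[int, int, int]


def enumerate_batches_sketch(
    batch_it: Iterable[Mapping[str, Any]],
) -> Iterator[Tuple[BatchStats, Mapping[str, Any]]]:
    """Yield ((batch_num, batch_size, num_samples_so_far), batch) pairs."""
    num_samples_so_far = 0
    for batch_num, batch in enumerate(batch_it):
        batch_size = len(batch["obs"])  # assumed: one observation per sample
        num_samples_so_far += batch_size
        yield (batch_num, batch_size, num_samples_so_far), batch


demo_batches = [
    {"obs": [1, 2], "acts": [0, 1]},
    {"obs": [3, 4, 5], "acts": [1, 0, 1]},
]
demo_stats = [stats for stats, _ in enumerate_batches_sketch(demo_batches)]
```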

imitation.algorithms.bc.reconstruct_policy(policy_path, device='auto')[source]#

Reconstruct a saved policy.

Parameters

policy_path (str) – path to which the policy was saved by .save_policy().
device (Union[device, str]) – device on which to load the policy.

Returns

policy with reloaded weights.

Return type

policy