tuned_lens.plotting.prediction_trajectory#

Plot a lens table for some given text and model.

Classes

class tuned_lens.plotting.prediction_trajectory.PredictionTrajectory(log_probs, input_ids, targets=None, anti_targets=None, tokenizer=None)#

Contains the trajectory predictions for a sequence of tokens.

A prediction trajectory is the set of next-token predictions produced by the combination of a lens and a model when evaluated on a specific sequence of tokens. This class includes multiple methods for visualizing different aspects of the trajectory.

anti_targets: Optional[ndarray[Any, dtype[int64]]] = None#: (…, seq_len)

property batch_axes: Sequence[int]#: Returns the batch axes for the trajectory.

property batch_shape: Sequence[int]#: Returns the batch shape of the trajectory.

cross_entropy(**kwargs)#

The cross entropy of the predictions to the targets.

Parameters:: **kwargs – are passed to largest_prob_labels.
Return type:: TrajectoryStatistic
Returns:: A TrajectoryStatistic with the cross entropy of the predictions to the targets.
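
As a rough illustration of the quantity reported here (a plain-Python sketch, not the library's implementation), the cross entropy at one position is the negative log probability that a layer assigns to the target token. The 3-token vocabulary, the `cross_entropy` helper, and the probability values are hypothetical.

```python
import math

def cross_entropy(log_probs, target):
    """Negative log probability assigned to the target token id (in nats)."""
    return -log_probs[target]

# Hypothetical log probabilities for one sequence position at two layers.
layer_log_probs = [
    [math.log(0.5), math.log(0.3), math.log(0.2)],  # earlier layer
    [math.log(0.1), math.log(0.8), math.log(0.1)],  # later layer
]
target = 1  # id of the correct next token

per_layer_ce = [cross_entropy(lp, target) for lp in layer_log_probs]
```

In this toy case the later layer puts more mass on the target, so its cross entropy is lower.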

entropy(**kwargs)#

The entropy of the predictions.

Parameters:: **kwargs – are passed to largest_prob_labels.
Return type:: TrajectoryStatistic
Returns:: A TrajectoryStatistic with the entropy of the predictions.
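
For intuition, the entropy of a single predictive distribution can be computed directly from its log probabilities. This is a minimal sketch with a hypothetical `entropy` helper and made-up distributions; it is not taken from the library's source.

```python
import math

def entropy(log_probs):
    """Shannon entropy in nats: -sum(p * log p) over the vocabulary."""
    return -sum(math.exp(lp) * lp for lp in log_probs)

# Hypothetical distributions over a 4-token vocabulary.
uniform = [math.log(0.25)] * 4
peaked = [math.log(p) for p in (0.97, 0.01, 0.01, 0.01)]

h_uniform = entropy(uniform)
h_peaked = entropy(peaked)
```

A uniform distribution attains the maximum, log(vocab_size), while a confident prediction has entropy close to zero.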

forward_kl(**kwargs)#

KL divergence of the lens predictions to the model predictions.

Parameters:: **kwargs – are passed to largest_prob_labels.
Return type:: TrajectoryStatistic
Returns:: A TrajectoryStatistic with the KL divergence of the lens predictions to the final output of the model.
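
The sketch below shows one way such a divergence can be computed for a single position. It assumes the model's final distribution is the reference (that is, KL(model || lens)); the direction, the helper name, and the values are assumptions for illustration and are not confirmed by this page.

```python
import math

def forward_kl(model_log_probs, lens_log_probs):
    """KL(model || lens) over one vocabulary distribution, in nats."""
    return sum(
        math.exp(mlp) * (mlp - llp)
        for mlp, llp in zip(model_log_probs, lens_log_probs)
    )

# Hypothetical distributions over a 3-token vocabulary.
model_lp = [math.log(p) for p in (0.7, 0.2, 0.1)]
lens_lp = [math.log(p) for p in (0.4, 0.4, 0.2)]

kl_value = forward_kl(model_lp, lens_lp)
```

The value is zero when the lens reproduces the model's output exactly and grows as the two distributions diverge.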

classmethod from_lens_and_cache(lens, input_ids, cache, model_logits, targets=None, anti_targets=None, residual_component='resid_pre', mask_input=False)#

Construct a prediction trajectory from a set of residual stream vectors.

Parameters:

lens – A lens to use to produce the predictions.
cache – the activation cache produced by running the model.
input_ids – (…, seq_len) IDs that were input into the model.
model_logits – (…, seq_len x d_vocab) the model's final output logits.
targets – (…, seq_len) the targets the model should predict. Used for cross_entropy() and log_prob_diff() visualization.
anti_targets – (…, seq_len) the incorrect label the model should not predict. Used for log_prob_diff() visualization.
residual_component – Name of the stream vector being visualized.
mask_input – Whether to mask the input ids when computing the log probs.

Return type:

PredictionTrajectory

Returns:

PredictionTrajectory constructed from the residual stream vectors.

classmethod from_lens_and_model(lens, model, input_ids, tokenizer=None, targets=None, anti_targets=None, mask_input=False)#

Construct a prediction trajectory by running a model and a lens on a sequence of tokens.

Parameters:

lens – A lens to use to produce the predictions. Note this should be compatible with the model.
model – A Hugging Face causal language model to use to produce the predictions.
tokenizer – The tokenizer to use for decoding the input ids.
input_ids – (seq_len) IDs that were input into the model.
targets – (seq_len) the targets the model should predict. Used for cross_entropy() and log_prob_diff() visualization.
anti_targets – (seq_len) the incorrect label the model should not predict. Used for log_prob_diff() visualization.
residual_component – Name of the stream vector being visualized.
mask_input – Whether to mask the input ids when computing the log probs.

Return type:

PredictionTrajectory

Returns:

PredictionTrajectory constructed from the residual stream vectors.

input_ids: ndarray[Any, dtype[int64]]#: (…, seq_len)

js_divergence(other, **kwargs)#

Compute the JS divergence between self and other prediction trajectory.

Parameters:

other – The other prediction trajectory to compare to.
**kwargs – are passed to largest_delta_in_prob_labels.

Return type:

TrajectoryStatistic

Returns:

A TrajectoryStatistic with the JS divergence between self and other.
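
For reference, the Jensen-Shannon divergence between two distributions is the average KL divergence of each to their midpoint. The following is a minimal plain-Python sketch for a single position; the helper names and example distributions are hypothetical, and the library's own computation may differ in detail.

```python
import math

def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical predictions from two trajectories at the same layer and position.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

js_pq = js_divergence(p, q)
js_qp = js_divergence(q, p)
js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike KL, this measure is symmetric in its arguments and, in nats, bounded above by log(2), which is reached when the two distributions have disjoint support.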

kl_divergence(other, **kwargs)#

Compute the KL divergence between self and other prediction trajectory.

Parameters:

other – The other prediction trajectory to compare to.
**kwargs – are passed to largest_delta_in_prob_labels.

Return type:

TrajectoryStatistic

Returns:

A TrajectoryStatistic with the KL divergence between self and other.

log_prob_diff(delta=False)#

The difference in logits between two tokens.

Return type:: TrajectoryStatistic
Returns:: The difference between the log probabilities of the two tokens.
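
The summary speaks of logits while the return value speaks of log probabilities; within a single distribution the two differences coincide, because the log-softmax normalizer cancels. The sketch below checks this numerically. It assumes the difference is taken as target minus anti-target, and the logits and token ids are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Hypothetical logits for one layer and position.
logits = [2.0, 0.5, -1.0]
target, anti_target = 0, 1

log_probs = log_softmax(logits)
log_prob_difference = log_probs[target] - log_probs[anti_target]
logit_difference = logits[target] - logits[anti_target]
```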

log_probs: ndarray[Any, dtype[float32]]#: (…, n_layers, seq_len, vocab_size) The log probabilities of the predictions for each hidden layer, plus the model's logits.
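
To make the shape convention concrete, the sketch below derives the shape-related properties from a hypothetical `log_probs` shape using plain tuples. It assumes the last entry on the layer axis holds the model's own output and that the batch axes are the leading ones; both are readings of the descriptions on this page, not confirmed implementation details.

```python
# Hypothetical shape: (batch, n_layers, seq_len, vocab_size).
log_probs_shape = (2, 13, 7, 50257)

# Everything before the last three axes is treated as batch.
n_batch_axis = len(log_probs_shape) - 3
batch_shape = log_probs_shape[:-3]
batch_axes = list(range(n_batch_axis))

# One slot on the layer axis is assumed to be the model output.
num_layers = log_probs_shape[-3] - 1
num_tokens = log_probs_shape[-2]
vocab_size = log_probs_shape[-1]

# input_ids, targets, and anti_targets share the shape (…, seq_len).
input_ids_shape = batch_shape + (num_tokens,)
```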

max_probability(**kwargs)#

Max probability among the predictions.

Parameters:: **kwargs – are passed to largest_prob_labels.
Return type:: TrajectoryStatistic
Returns:: A TrajectoryStatistic with the max probability among the predictions.

property model_log_probs: ndarray[Any, dtype[float32]]#: Returns the log probs of the model (…, seq_len, vocab_size).

property n_batch_axis: int#: Returns the number of batch dimensions.

property num_layers: int#: Returns the number of layers in the stream not including the model output.

property num_tokens: int#: Returns the number of tokens in this slice of the sequence.

property probs: ndarray[Any, dtype[float32]]#: Returns the probabilities of the predictions.

rank(show_ranks=False, **kwargs)#

The rank of the targets among the predictions.

That is, if the target is the most likely prediction, its rank is 1; the second most likely has rank 2, etc.

Parameters:

show_ranks – Whether to show the rank of the target or the top token.
**kwargs – are passed to largest_prob_labels.

Return type:

TrajectoryStatistic

Returns:

A TrajectoryStatistic with the rank of the targets among the predictions.
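
The 1-based ranking described above can be sketched for a single position as follows. The `target_rank` helper and the values are hypothetical, and the tie handling (tokens with equal probability share the better rank) is an assumption of this sketch.

```python
def target_rank(log_probs, target):
    """1-based rank of the target: one plus the number of strictly likelier tokens."""
    return 1 + sum(lp > log_probs[target] for lp in log_probs)

# Hypothetical log probabilities over a 4-token vocabulary.
log_probs = [-0.2, -2.0, -3.0, -4.5]

rank_of_top = target_rank(log_probs, 0)
rank_of_third = target_rank(log_probs, 2)
```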

slice_sequence(slice)#

Create a slice of the prediction trajectory along the sequence dimension.

Return type:: PredictionTrajectory
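
Conceptually, slicing along the sequence dimension keeps every layer and vocabulary entry but restricts the token positions. The sketch below mimics this with nested lists for a trajectory without batch axes; the function, the data layout, and the placeholder values (which are not real log probabilities) are illustrative assumptions only.

```python
# Hypothetical layout: log_probs[layer][position][token], input_ids[position].
n_layers, seq_len, vocab_size = 3, 5, 2
log_probs = [
    [[float(layer * 10 + pos)] * vocab_size for pos in range(seq_len)]
    for layer in range(n_layers)
]
input_ids = list(range(100, 100 + seq_len))

def slice_sequence(log_probs, input_ids, s):
    """Apply the same slice to the position axis of every per-token field."""
    return [layer[s] for layer in log_probs], input_ids[s]

sub_log_probs, sub_ids = slice_sequence(log_probs, input_ids, slice(1, 4))
```

The layer and vocabulary axes are untouched; only the number of positions changes, here from 5 to 3.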

targets: Optional[ndarray[Any, dtype[int64]]] = None#: (…, seq_len)

total_variation(other, **kwargs)#

Total variation distance between self and other prediction trajectory.

Parameters:

other – The other prediction trajectory to compare to.
**kwargs – are passed to largest_delta_in_prob_labels.

Return type:

TrajectoryStatistic

Returns:

A TrajectoryStatistic with the total variation distance between self and other.
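
As a reminder of the definition, the total variation distance between two discrete distributions is half the L1 distance between their probability vectors. The sketch below is a plain-Python illustration for one position with a hypothetical helper and hypothetical values; it is not the library's code.

```python
def total_variation(p, q):
    """Half the sum of absolute differences between two probability vectors."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical predictions from two trajectories at the same layer and position.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

tv_pq = total_variation(p, q)
tv_pp = total_variation(p, p)
tv_disjoint = total_variation([1.0, 0.0], [0.0, 1.0])
```

The distance ranges from 0 for identical distributions to 1 for distributions with disjoint support.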

property vocab_size: int#: Returns the size of the vocabulary.