tuned_lens.nn.lenses#
Provides lenses for decoding hidden states into logits.
Classes
- class tuned_lens.nn.lenses.Lens(unembed)#
Abstract base class for all lenses.
- abstract forward(h, idx)#
Decode hidden states into logits.
- Return type:
Tensor
- abstract transform_hidden(h, idx)#
Convert a hidden state to the final hidden state just before the unembedding.
- Parameters:
h – The hidden state to convert.
idx – The layer of the transformer these hidden states come from.
- Return type:
Tensor
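The relationship between the two abstract methods can be sketched in plain Python. This is only an illustration of the interface described above, not the library's implementation: `SketchLens`, `SketchLogitLens`, and `toy_unembed` are hypothetical names, and hidden states are plain lists rather than tensors.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Vector = List[float]


class SketchLens(ABC):
    """Hypothetical stand-in for the lens interface (not the library class)."""

    def __init__(self, unembed: Callable[[Vector], Vector]):
        # `unembed` maps a final hidden state to vocabulary logits.
        self.unembed = unembed

    @abstractmethod
    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        """Map a hidden state from layer `idx` to a final-layer-like state."""

    @abstractmethod
    def forward(self, h: Vector, idx: int) -> Vector:
        """Decode a hidden state from layer `idx` into logits."""


class SketchLogitLens(SketchLens):
    """Simplest concrete lens: leave the hidden state unchanged."""

    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        return h

    def forward(self, h: Vector, idx: int) -> Vector:
        # Transform first, then unembed into logits.
        return self.unembed(self.transform_hidden(h, idx))


def toy_unembed(h: Vector) -> Vector:
    # Toy 2-d hidden state -> 3-token vocabulary; the weights are made up.
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    return [sum(w * x for w, x in zip(row, h)) for row in weights]


logit_lens = SketchLogitLens(toy_unembed)
logits = logit_lens.forward([0.5, -1.0], idx=3)
print(logits)
```

Subclasses differ only in how `transform_hidden` treats the layer index; the decoding step through the unembedding is shared.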
- class tuned_lens.nn.lenses.LogitLens(unembed)#
Unembeds the residual stream into logits.
- forward(h, idx)#
Decode a hidden state into logits.
- Parameters:
h – The hidden state to decode.
idx – The layer of the transformer these hidden states come from.
- Return type:
Tensor
- classmethod from_model(model)#
Create a LogitLens from a pretrained model.
- Parameters:
model – A pretrained model from the transformers library you wish to inspect.
- Return type:
LogitLens
- transform_hidden(h, idx)#
For the LogitLens, this is the identity function.
- Return type:
Tensor
- class tuned_lens.nn.lenses.TunedLens(unembed, config)#
A tuned lens for decoding hidden states into logits.
- forward(h, idx)#
Transform and then decode the hidden states into logits.
- Return type:
Tensor
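"Transform and then decode" can be illustrated with a small plain-Python sketch. It assumes each layer has its own affine translator (a matrix and an optional bias) applied directly to the hidden state; the exact form the library uses, for example whether the translator output is added back to the input, is not stated here. `SketchTunedLens` and `matvec` are hypothetical names.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix `m` by vector `v`."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


class SketchTunedLens:
    """Hypothetical illustration: one affine translator per layer."""

    def __init__(
        self,
        unembed_weights: Matrix,
        translators: List[Matrix],
        biases: Optional[List[Vector]] = None,
    ):
        self.unembed_weights = unembed_weights
        self.translators = translators
        # Passing no biases models a lens created with bias=False.
        self.biases = biases

    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        # Assumption: apply the layer-specific affine map directly.
        out = matvec(self.translators[idx], h)
        if self.biases is not None:
            out = [o + b for o, b in zip(out, self.biases[idx])]
        return out

    def forward(self, h: Vector, idx: int) -> Vector:
        # Transform the hidden state, then unembed it into logits.
        return matvec(self.unembed_weights, self.transform_hidden(h, idx))


lens = SketchTunedLens(
    unembed_weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    translators=[[[2.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 1.0]]],
    biases=[[1.0, 0.0], [0.0, 0.0]],
)
h = [0.5, -1.0]
layer0_logits = lens.forward(h, idx=0)
layer1_logits = lens.forward(h, idx=1)
print(layer0_logits, layer1_logits)
```

The same hidden state decodes to different logits depending on `idx`, because each layer selects its own translator.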
- classmethod from_model(model, model_revision=None, bias=True)#
Create a lens from a pretrained model.
- Parameters:
model – The model to create the lens from.
model_revision – The git revision of the model to use.
bias – Whether to use a bias in the linear translators.
- Return type:
TunedLens
- Returns:
A TunedLens instance.
- classmethod from_model_and_pretrained(model, lens_resource_id=None, **kwargs)#
Load a tuned lens from a folder or the Hugging Face Hub.
- Parameters:
model – The model to create the lens from.
lens_resource_id – The resource id of the lens to load. Defaults to the model's name_or_path.
**kwargs – Additional arguments to pass to tuned_lens.load_artifacts.load_lens_artifacts() and th.load.
- Return type:
TunedLens
- Returns:
A TunedLens instance whose unembedding is derived from the given model and whose layer translators are loaded from the given resource id.
- classmethod from_unembed_and_pretrained(unembed, lens_resource_id, **kwargs)#
Load a tuned lens from a folder or the Hugging Face Hub.
- Parameters:
unembed – The unembed operation to use for the lens.
lens_resource_id – The resource id of the lens to load.
**kwargs – Additional arguments to pass to tuned_lens.load_artifacts.load_lens_artifacts() and th.load.
- Return type:
TunedLens
- Returns:
A TunedLens instance.
- generate(model, layer, input_ids, do_sample=True, temp=1.0, max_new_tokens=100)#
Generate from the tuned lens at the given layer.
- Parameters:
model – The base model to generate from. Usually the model this lens was trained on.
layer – The layer to generate from.
input_ids – (batch x prompt_len) The input ids to generate from.
do_sample – Whether to use sampling or greedy decoding.
temp – The temperature to use for sampling.
max_new_tokens – The maximum number of tokens to generate.
- Return type:
Tensor
- Returns:
The prompt concatenated with the newly generated tokens.
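How `do_sample`, `temp`, and `max_new_tokens` interact can be sketched with a toy decoding loop. This is not the library's `generate`: it handles a single sequence rather than a batch, `sketch_generate` and `toy_next_logits` are hypothetical names, and the `rng` argument is added here only to make sampling reproducible.

```python
import math
import random
from typing import Callable, List, Optional


def sketch_generate(
    next_logits: Callable[[List[int]], List[float]],
    input_ids: List[int],
    do_sample: bool = True,
    temp: float = 1.0,
    max_new_tokens: int = 100,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Append up to `max_new_tokens` tokens to a copy of `input_ids`."""
    rng = rng or random.Random()
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        if do_sample:
            # Temperature-scaled softmax weights (shifted for stability).
            scaled = [x / temp for x in logits]
            peak = max(scaled)
            weights = [math.exp(x - peak) for x in scaled]
            next_id = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        else:
            # Greedy decoding: pick the highest-scoring token.
            next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
    return tokens


def toy_next_logits(tokens: List[int]) -> List[float]:
    # Made-up model over a 4-token vocabulary: favors (last token + 1) mod 4.
    target = (tokens[-1] + 1) % 4
    return [5.0 if i == target else 0.0 for i in range(4)]


greedy = sketch_generate(toy_next_logits, [0], do_sample=False, max_new_tokens=5)
sampled = sketch_generate(
    toy_next_logits, [0], temp=0.7, max_new_tokens=5, rng=random.Random(0)
)
print(greedy)
```

In both modes the result starts with the prompt and grows by exactly `max_new_tokens` entries, matching the "prompt concatenated with the newly generated tokens" return value described above.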
- save(path, ckpt='params.pt', config='config.json')#
Save the lens to a directory.
- Parameters:
path – The path to the directory to save the lens to.
ckpt – The name of the checkpoint file to save the parameters to.
config – The name of the config file to save the config to.
- Return type:
None
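The resulting directory layout, one checkpoint file plus one config file, can be sketched as follows. This is a hedged illustration only: `sketch_save` is a hypothetical helper, `pickle` stands in for whatever serializer the library actually uses for the parameters, and the parameter and config values are made up.

```python
import json
import pickle
import tempfile
from pathlib import Path


def sketch_save(params, config, path, ckpt="params.pt", config_name="config.json"):
    """Write `params` and `config` into the directory `path`."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / ckpt, "wb") as f:
        pickle.dump(params, f)  # stand-in serializer, an assumption
    with open(directory / config_name, "w") as f:
        json.dump(config, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_lens"
    sketch_save(
        params={"layer_0.weight": [[1.0, 0.0], [0.0, 1.0]]},
        config={"d_model": 2, "bias": True},
        path=target,
    )
    saved_files = sorted(p.name for p in target.iterdir())
    with open(target / "config.json") as f:
        reloaded_config = json.load(f)

print(saved_files)
```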
- transform_hidden(h, idx)#
Transform the hidden state from layer idx.
- Return type:
Tensor
- class tuned_lens.nn.lenses.TunedLensConfig(base_model_name_or_path, d_model, num_hidden_layers, bias=True, base_model_revision=None, unembed_hash=None, lens_type='linear_tuned_lens')#
A configuration for a TunedLens.
- classmethod from_dict(config_dict)#
Create a config from a dictionary.
- to_dict()#
Convert this config to a dictionary.
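The dictionary round trip can be sketched with a standard-library dataclass that mirrors the constructor signature above. `SketchLensConfig` is a hypothetical stand-in, not the library class, and the library's `from_dict` may handle unknown or missing keys differently; the model name and sizes are example values.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchLensConfig:
    """Hypothetical mirror of the documented config fields."""

    base_model_name_or_path: str
    d_model: int
    num_hidden_layers: int
    bias: bool = True
    base_model_revision: Optional[str] = None
    unembed_hash: Optional[str] = None
    lens_type: str = "linear_tuned_lens"

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> "SketchLensConfig":
        # Assumption: keys map one-to-one onto the constructor arguments.
        return cls(**config_dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)


config = SketchLensConfig("gpt2", d_model=768, num_hidden_layers=12)
as_dict = config.to_dict()
round_tripped = SketchLensConfig.from_dict(as_dict)
print(as_dict)
```

A config that survives this round trip unchanged is what makes it safe to store as JSON next to the checkpoint.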