tuned_lens.nn.lenses
Provides lenses for decoding hidden states into logits.
Classes
- class tuned_lens.nn.lenses.Lens(unembed)
Abstract base class for all lenses.
- abstract forward(h, idx)
Decode hidden states into logits.
- Return type:
Tensor
- abstract transform_hidden(h, idx)
Convert a hidden state to the final hidden state just before the unembedding.
- Parameters:
h – The hidden state to convert.
idx – The layer of the transformer these hidden states come from.
- Return type:
Tensor
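The interface above separates decoding into a transformation step and an unembedding step. The following is a minimal, dependency-free sketch of that split. `ToyLens`, `ScaleLens`, `sum_unembed`, and the list-based vectors are hypothetical stand-ins rather than the library's classes, and having `forward` call `transform_hidden` before unembedding is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Vector = List[float]


class ToyLens(ABC):
    """Hypothetical stand-in for an abstract lens; not the tuned_lens class."""

    def __init__(self, unembed: Callable[[Vector], Vector]):
        # The unembed operation maps a final hidden state to logits.
        self.unembed = unembed

    @abstractmethod
    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        """Map a hidden state from layer idx to a final-layer-like state."""

    @abstractmethod
    def forward(self, h: Vector, idx: int) -> Vector:
        """Decode a hidden state from layer idx into logits."""


class ScaleLens(ToyLens):
    """Illustrative subclass: doubles the hidden state, then unembeds it."""

    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        return [2.0 * x for x in h]

    def forward(self, h: Vector, idx: int) -> Vector:
        # Assumed composition: transform first, then unembed.
        return self.unembed(self.transform_hidden(h, idx))


def sum_unembed(h: Vector) -> Vector:
    # Toy unembedding producing two "logits": the sum and its negation.
    total = sum(h)
    return [total, -total]


lens = ScaleLens(sum_unembed)
logits = lens.forward([1.0, 2.0], idx=0)
```

Because both methods are abstract, the base class itself cannot be instantiated; only subclasses that implement both can.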
- class tuned_lens.nn.lenses.LogitLens(unembed)
Unembeds the residual stream into logits.
- forward(h, idx)
Decode a hidden state into logits.
- Parameters:
h – The hidden state to decode.
idx – The layer of the transformer these hidden states come from.
- Return type:
Tensor
- classmethod from_model(model)
Create a LogitLens from a pretrained model.
- Parameters:
model – A pretrained model from the transformers library you wish to inspect.
- Return type:
LogitLens
- transform_hidden(h, idx)
For the LogitLens, this is the identity function.
- Return type:
Tensor
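As a rough illustration of the logit lens idea described above, the sketch below passes the hidden state through unchanged and then applies a toy unembedding. The names `unembed`, `logit_lens_forward`, and `W` are hypothetical, and the plain matrix product stands in for whatever the real unembed operation does (which may include more than a single matrix multiply).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def unembed(h: Vector, weight: Matrix) -> Vector:
    """Toy unembedding: one dot product per vocabulary entry."""
    return [sum(w * x for w, x in zip(row, h)) for row in weight]


def logit_lens_forward(h: Vector, idx: int, weight: Matrix) -> Vector:
    """Decode h from layer idx; the hidden state is passed through unchanged."""
    transformed = h  # identity transform, independent of idx
    return unembed(transformed, weight)


# Hypothetical 3-token vocabulary over a 2-dimensional hidden state.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
logits = logit_lens_forward([0.5, -2.0], idx=3, weight=W)
```

In this toy version the layer index does not affect the result, which is what an identity transform implies.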
- class tuned_lens.nn.lenses.TunedLens(unembed, config)
A tuned lens for decoding hidden states into logits.
- forward(h, idx)
Transform and then decode the hidden states into logits.
- Return type:
Tensor
- classmethod from_model(model, model_revision=None, bias=True)
Create a lens from a pretrained model.
- Parameters:
model – The model to create the lens from.
model_revision – The git revision of the model to use.
bias – Whether to use a bias in the linear translators.
- Return type:
TunedLens
- Returns:
A TunedLens instance.
- classmethod from_model_and_pretrained(model, lens_resource_id=None, **kwargs)
Load a tuned lens from a local folder or the Hugging Face Hub.
- Parameters:
model – The model to create the lens from.
lens_resource_id – The resource id of the lens to load. Defaults to the model’s name_or_path.
**kwargs – Additional arguments to pass to tuned_lens.load_artifacts.load_lens_artifacts() and th.load.
- Return type:
TunedLens
- Returns:
A TunedLens instance whose unembedding is derived from the given model and whose layer translators are loaded from the given resource id.
- classmethod from_unembed_and_pretrained(unembed, lens_resource_id, **kwargs)
Load a tuned lens from a local folder or the Hugging Face Hub.
- Parameters:
unembed – The unembed operation to use for the lens.
lens_resource_id – The resource id of the lens to load.
**kwargs – Additional arguments to pass to tuned_lens.load_artifacts.load_lens_artifacts() and th.load.
- Return type:
TunedLens
- Returns:
A TunedLens instance.
- generate(model, layer, input_ids, do_sample=True, temp=1.0, max_new_tokens=100)
Generate from the tuned lens at the given layer.
- Parameters:
model – The base model to generate from, usually the model this lens was trained on.
layer – The layer to generate from.
input_ids – (batch x prompt_len) The input ids to generate from.
do_sample – Whether to use sampling or greedy decoding.
temp – The temperature to use for sampling.
max_new_tokens – The maximum number of tokens to generate.
- Return type:
Tensor
- Returns:
The prompt concatenated with the newly generated tokens.
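To make the roles of do_sample, temp, and max_new_tokens concrete, here is a small sketch of a decoding loop over a single sequence. It is not the library's implementation: `pick_token`, `toy_generate`, `cyclic_logits`, and the `seed` argument are hypothetical, the "model" is a stand-in function, and the documented method operates on batched tensors rather than Python lists.

```python
import math
import random
from typing import Callable, List


def pick_token(logits: List[float], do_sample: bool, temp: float,
               rng: random.Random) -> int:
    """Choose the next token id from a list of logits."""
    if not do_sample:
        # Greedy decoding: take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax weights, shifted by the max for stability.
    scaled = [x / temp for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def toy_generate(next_logits: Callable[[List[int]], List[float]],
                 input_ids: List[int], do_sample: bool = True,
                 temp: float = 1.0, max_new_tokens: int = 100,
                 seed: int = 0) -> List[int]:
    """Return the prompt followed by max_new_tokens chosen tokens."""
    rng = random.Random(seed)
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        tokens.append(pick_token(next_logits(tokens), do_sample, temp, rng))
    return tokens


def cyclic_logits(tokens: List[int]) -> List[float]:
    # Hypothetical "model": strongly prefers (last token + 1) mod 4.
    preferred = (tokens[-1] + 1) % 4
    return [5.0 if i == preferred else 0.0 for i in range(4)]


greedy = toy_generate(cyclic_logits, [0], do_sample=False, max_new_tokens=5)
sampled = toy_generate(cyclic_logits, [0], do_sample=True, temp=0.7,
                       max_new_tokens=5)
```

Lower temperatures concentrate the sampling distribution on the highest logits, so sampled output moves closer to the greedy result as temp approaches zero.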
- save(path, ckpt='params.pt', config='config.json')
Save the lens to a directory.
- Parameters:
path – The path to the directory to save the lens to.
ckpt – The name of the checkpoint file to save the parameters to.
config – The name of the config file to save the config to.
- Return type:
None
- transform_hidden(h, idx)
Transform the hidden state from layer idx.
- Return type:
Tensor
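The "transform, then decode" behaviour can be sketched with one affine translator per layer. The source only states that the translators are linear with an optional bias; applying the translator output as a residual correction to the hidden state is an assumption here, and `ToyTranslator`, `matvec`, `W_U`, and the module-level functions are hypothetical names used purely for illustration.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, h: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in weight]


class ToyTranslator:
    """Hypothetical affine map for one layer: weight @ h (+ bias)."""

    def __init__(self, weight: Matrix, bias: Optional[Vector] = None):
        self.weight = weight
        self.bias = bias  # None plays the role of bias=False

    def __call__(self, h: Vector) -> Vector:
        out = matvec(self.weight, h)
        if self.bias is not None:
            out = [o + b for o, b in zip(out, self.bias)]
        return out


def transform_hidden(translators: List[ToyTranslator], h: Vector,
                     idx: int) -> Vector:
    # Assumed residual form: the layer-idx translator predicts a correction.
    return [x + d for x, d in zip(h, translators[idx](h))]


def forward(translators: List[ToyTranslator], unembed_weight: Matrix,
            h: Vector, idx: int) -> Vector:
    # Transform first, then decode with the toy unembedding matrix.
    return matvec(unembed_weight, transform_hidden(translators, h, idx))


translators = [
    ToyTranslator([[0.0, 0.0], [0.0, 0.0]]),               # layer 0: no correction
    ToyTranslator([[0.0, 1.0], [1.0, 0.0]], [1.0, -1.0]),  # layer 1: swap plus bias
]
W_U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h = [2.0, 3.0]
layer0_logits = forward(translators, W_U, h, idx=0)
layer1_logits = forward(translators, W_U, h, idx=1)
```

Under the residual assumption, a translator whose weights and bias are all zero leaves the hidden state untouched, so that layer decodes exactly as a logit lens would.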
- class tuned_lens.nn.lenses.TunedLensConfig(base_model_name_or_path, d_model, num_hidden_layers, bias=True, base_model_revision=None, unembed_hash=None, lens_type='linear_tuned_lens')
A configuration for a TunedLens.
- classmethod from_dict(config_dict)
Create a config from a dictionary.
- to_dict()
Convert this config to a dictionary.
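The dictionary round trip can be illustrated with a small dataclass that mirrors the documented constructor fields. `ToyLensConfig` is a hypothetical stand-in, not the library class, and it assumes that the dictionary keys match the field names exactly; the real serialization may differ. The field values below are illustrative only.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyLensConfig:
    """Hypothetical mirror of the documented fields; not the library class."""

    base_model_name_or_path: str
    d_model: int
    num_hidden_layers: int
    bias: bool = True
    base_model_revision: Optional[str] = None
    unembed_hash: Optional[str] = None
    lens_type: str = "linear_tuned_lens"

    def to_dict(self) -> Dict[str, Any]:
        # Convert every field to a plain dictionary entry.
        return asdict(self)

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> "ToyLensConfig":
        # Assumes the keys match the field names exactly.
        return cls(**config_dict)


# Illustrative values for a small model.
config = ToyLensConfig("gpt2", d_model=768, num_hidden_layers=12)
restored = ToyLensConfig.from_dict(config.to_dict())
```

A plain dictionary like this is straightforward to write out as JSON, which fits the config file name used by save.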