tuned_lens.nn.lenses

Provides lenses for decoding hidden states into logits.

Classes

class tuned_lens.nn.lenses.Lens(unembed)

Abstract base class for all lenses.

abstract forward(h, idx)

Decode hidden states into logits.

Return type:

Tensor

abstract transform_hidden(h, idx)

Convert a hidden state into the final hidden state, just before the unembedding.

Parameters:

h – The hidden state to convert.
idx – The layer of the transformer these hidden states come from.

Return type:

Tensor

class tuned_lens.nn.lenses.LogitLens(unembed)

Unembeds the residual stream into logits.

forward(h, idx)

Decode a hidden state into logits.

Parameters:

h – The hidden state to decode.
idx – The layer of the transformer these hidden states come from.

Return type:

Tensor

classmethod from_model(model)

Create a LogitLens from a pretrained model.

Parameters:

model – A pretrained model from the transformers library you wish to inspect.

Return type:

LogitLens

transform_hidden(h, idx)

For the LogitLens, this is the identity function.

Return type:

Tensor
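The relationship between transform_hidden and forward for a logit lens can be illustrated with a minimal pure-Python sketch. ToyLogitLens and matvec are hypothetical names used only for illustration and are not part of tuned_lens; the toy unembedding is a bare matrix product, whereas a real unembedding may also include a final normalization step.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyLogitLens:
    """Hypothetical stand-in for LogitLens; not the library class."""

    def __init__(self, unembed_weight: Matrix) -> None:
        # Assumed shape: (vocab_size x d_model).
        self.unembed_weight = unembed_weight

    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        # Identity: the layer index is accepted but ignored.
        return h

    def forward(self, h: Vector, idx: int) -> Vector:
        # Decode by unembedding the untransformed hidden state.
        return matvec(self.unembed_weight, self.transform_hidden(h, idx))


# Toy example: d_model = 2, vocab_size = 3.
lens = ToyLogitLens([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
logits = lens.forward([2.0, -1.0], idx=3)
```

Because the transform is the identity, the same logits are produced for a given hidden state regardless of idx.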

class tuned_lens.nn.lenses.TunedLens(unembed, config)

A tuned lens for decoding hidden states into logits.

forward(h, idx)

Transform and then decode the hidden states into logits.

Return type:

Tensor

classmethod from_model(model, model_revision=None, bias=True)

Create a lens from a pretrained model.

Parameters:

model – The model to create the lens from.
model_revision – The git revision of the model to use.
bias – Whether to use a bias in the linear translators.

Return type:

TunedLens

Returns:

A TunedLens instance.

classmethod from_model_and_pretrained(model, lens_resource_id=None, **kwargs)

Load a tuned lens from a folder or the Hugging Face Hub.

Parameters:

model – The model to create the lens from.
lens_resource_id – The resource id of the lens to load. Defaults to the model’s name_or_path.
**kwargs – Additional arguments to pass to tuned_lens.load_artifacts.load_lens_artifacts() and th.load.

Return type:

TunedLens

Returns:

A TunedLens instance whose unembedding is derived from the given model and whose layer translators are loaded from the given resource id.

classmethod from_unembed_and_pretrained(unembed, lens_resource_id, **kwargs)

Load a tuned lens from a folder or the Hugging Face Hub.

Parameters:

unembed – The unembed operation to use for the lens.
lens_resource_id – The resource id of the lens to load.
**kwargs – Additional arguments to pass to tuned_lens.load_artifacts.load_lens_artifacts() and th.load.

Return type:

TunedLens

Returns:

A TunedLens instance.

generate(model, layer, input_ids, do_sample=True, temp=1.0, max_new_tokens=100)

Generate from the tuned lens at the given layer.

Parameters:

model – The base model to generate from. Usually the model this lens was trained on.
layer – The layer to generate from.
input_ids – (batch x prompt_len) The input ids to generate from.
do_sample – Whether to use sampling or greedy decoding.
temp – The temperature to use for sampling.
max_new_tokens – The maximum number of tokens to generate.

Return type:

Tensor

Returns:

The prompt concatenated with the newly generated tokens.
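The roles of do_sample, temp, and max_new_tokens can be illustrated with a minimal pure-Python decoding loop. This is a sketch under stated assumptions, not the library's implementation: next_token, toy_generate, and toy_step_logits are hypothetical names, step_logits stands in for "run the model up to the chosen layer and decode with the lens", and the batch dimension is omitted.

```python
import math
import random
from typing import Callable, List, Optional


def next_token(
    logits: List[float],
    do_sample: bool = True,
    temp: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick one token id from a vector of logits."""
    if not do_sample:
        # Greedy decoding: take the highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    # Temperature-scaled softmax weights (shifted by the max for stability).
    scaled = [x / temp for x in logits]
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def toy_generate(
    step_logits: Callable[[List[int]], List[float]],
    input_ids: List[int],
    do_sample: bool = True,
    temp: float = 1.0,
    max_new_tokens: int = 100,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return the prompt concatenated with the newly generated tokens."""
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        tokens.append(next_token(step_logits(tokens), do_sample, temp, rng))
    return tokens


def toy_step_logits(tokens: List[int]) -> List[float]:
    # Hypothetical stand-in for the lens: favors (last token + 1) mod 4.
    target = (tokens[-1] + 1) % 4
    return [5.0 if i == target else 0.0 for i in range(4)]


greedy = toy_generate(toy_step_logits, [0], do_sample=False, max_new_tokens=3)
sampled = toy_generate(
    toy_step_logits, [0], do_sample=True, temp=0.7,
    max_new_tokens=3, rng=random.Random(0),
)
```

With do_sample=False the loop is deterministic; with do_sample=True, lower values of temp concentrate probability on the highest-scoring tokens.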

save(path, ckpt='params.pt', config='config.json')

Save the lens to a directory.

Parameters:

path – The path to the directory to save the lens to.
ckpt – The name of the checkpoint file to save the parameters to.
config – The name of the config file to save the config to.

Return type:

None
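The directory layout implied by the default arguments can be sketched as follows. This is an illustration only: save_toy_lens is a hypothetical helper, pickle is used as a stand-in for a tensor checkpoint format, and the parameter and config contents are made up for the example.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_toy_lens(
    path: str,
    params: Dict[str, Any],
    config: Dict[str, Any],
    ckpt: str = "params.pt",
    config_name: str = "config.json",
) -> None:
    """Write parameters and config as two files inside one directory."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / ckpt, "wb") as f:
        # Stand-in for a real checkpoint writer (assumption).
        pickle.dump(params, f)
    with open(directory / config_name, "w") as f:
        json.dump(config, f)


with tempfile.TemporaryDirectory() as tmp:
    save_toy_lens(tmp, params={"0.weight": [[0.0]]}, config={"d_model": 1})
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
    with open(Path(tmp) / "config.json") as f:
        reloaded_config = json.load(f)
```

Keeping the parameters and the config in separate files means the config can be read and checked without loading the checkpoint.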

transform_hidden(h, idx)

Transform the hidden state from layer idx.

Return type:

Tensor
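A minimal pure-Python sketch of "transform, then decode" follows. It assumes each layer has its own affine translator (a matrix A and a bias b) applied as a residual update, h + A h + b; the residual form, the class name ToyTunedLens, and the helper matvec are assumptions made for illustration and are not taken from the library.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class ToyTunedLens:
    """Hypothetical stand-in for TunedLens; not the library class."""

    def __init__(
        self, unembed_weight: Matrix, translators: List[Tuple[Matrix, Vector]]
    ) -> None:
        self.unembed_weight = unembed_weight  # (vocab_size x d_model)
        self.translators = translators        # one (A, b) pair per layer

    def transform_hidden(self, h: Vector, idx: int) -> Vector:
        # Apply the layer-specific affine translator as a residual update.
        a, b = self.translators[idx]
        delta = matvec(a, h)
        return [x + d + c for x, d, c in zip(h, delta, b)]

    def forward(self, h: Vector, idx: int) -> Vector:
        # Transform first, then decode with the unembedding.
        return matvec(self.unembed_weight, self.transform_hidden(h, idx))


zero = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
shift = ([[0.0, 1.0], [0.0, 0.0]], [0.5, 0.0])
lens = ToyTunedLens([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [zero, shift])

unchanged = lens.transform_hidden([2.0, -1.0], idx=0)
translated = lens.transform_hidden([2.0, -1.0], idx=1)
logits = lens.forward([2.0, -1.0], idx=1)
```

Under this assumed residual form, a translator whose weights and bias are all zero leaves the hidden state unchanged, so the lens reduces to a logit lens at that layer.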

class tuned_lens.nn.lenses.TunedLensConfig(base_model_name_or_path, d_model, num_hidden_layers, bias=True, base_model_revision=None, unembed_hash=None, lens_type='linear_tuned_lens')

A configuration for a TunedLens.

classmethod from_dict(config_dict)

Create a config from a dictionary.

to_dict()

Convert this config to a dictionary.
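The from_dict and to_dict round trip can be illustrated with a standard-library dataclass that mirrors the constructor signature above. ToyLensConfig is a hypothetical stand-in, not the library class, and the field types are assumptions inferred from the default values.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyLensConfig:
    """Hypothetical mirror of the TunedLensConfig fields."""

    base_model_name_or_path: str
    d_model: int
    num_hidden_layers: int
    bias: bool = True
    base_model_revision: Optional[str] = None
    unembed_hash: Optional[str] = None
    lens_type: str = "linear_tuned_lens"

    def to_dict(self) -> Dict[str, Any]:
        # Convert this config to a plain dictionary.
        return asdict(self)

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> "ToyLensConfig":
        # Create a config from a dictionary of field values.
        return cls(**config_dict)


config = ToyLensConfig("gpt2", d_model=768, num_hidden_layers=12)
restored = ToyLensConfig.from_dict(config.to_dict())
```

A dictionary representation of this kind can be serialized to JSON, which fits the config.json file name used by save.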