tuned_lens.nn.lenses

Provides lenses for decoding hidden states into logits.

Classes

class tuned_lens.nn.lenses.Lens(unembed)

Abstract base class for all lenses.

abstract forward(h, idx)

Decode hidden states into logits.

Return type:

Tensor

abstract transform_hidden(h, idx)

Convert a hidden state to the final hidden state just before the unembedding.

Parameters:
  • h – The hidden state to convert.

  • idx – The layer of the transformer these hidden states come from.

Return type:

Tensor

class tuned_lens.nn.lenses.LogitLens(unembed)

Unembeds the residual stream into logits.

forward(h, idx)

Decode a hidden state into logits.

Parameters:
  • h – The hidden state to decode.

  • idx – The layer of the transformer these hidden states come from.

Return type:

Tensor

classmethod from_model(model)

Create a LogitLens from a pretrained model.

Parameters:

model – A pretrained model from the transformers library you wish to inspect.

Return type:

LogitLens

transform_hidden(h, idx)

For the LogitLens, this is the identity function.

Return type:

Tensor
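The relationship between the abstract Lens interface and the LogitLens can be sketched in plain Python. This is a toy stand-in, not the library's code: ToyLens and ToyLogitLens are hypothetical names, hidden states are plain lists instead of tensors, and the unembedding is assumed to be a bare matrix product (a real unembedding may also apply a final normalization, which is omitted here).

```python
from abc import ABC, abstractmethod


class ToyLens(ABC):
    """Hypothetical stand-in for the abstract Lens interface."""

    def __init__(self, unembed):
        # unembed: a vocab_size x d_model matrix stored as a list of rows.
        self.unembed = unembed

    @abstractmethod
    def transform_hidden(self, h, idx):
        """Map a hidden state from layer idx to a final-layer hidden state."""

    @abstractmethod
    def forward(self, h, idx):
        """Decode a hidden state from layer idx into logits."""


class ToyLogitLens(ToyLens):
    def transform_hidden(self, h, idx):
        # Identity: the hidden state is passed through unchanged.
        return h

    def forward(self, h, idx):
        h = self.transform_hidden(h, idx)
        # One logit per vocabulary row: the dot product of that row with h.
        return [sum(w * x for w, x in zip(row, h)) for row in self.unembed]


unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, d_model = 2
lens = ToyLogitLens(unembed)
logits = lens.forward([2.0, -1.0], idx=0)
```

Because transform_hidden is the identity, the layer index has no effect on the result in this toy; it is kept only so that every lens shares the same call signature.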

class tuned_lens.nn.lenses.TunedLens(unembed, config)

A tuned lens for decoding hidden states into logits.

forward(h, idx)

Transform and then decode the hidden states into logits.

Return type:

Tensor
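"Transform and then decode" can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: ToyTunedLens and matvec are hypothetical names, each layer gets one linear translator with a bias, and the translator output is assumed to be added back to the hidden state (a residual form) before unembedding. The library's actual translators operate on tensors and may differ in detail.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class ToyTunedLens:
    """Hypothetical stand-in: one linear translator per layer, then unembed."""

    def __init__(self, unembed, weights, biases):
        self.unembed = unembed
        self.weights = weights  # one d_model x d_model matrix per layer
        self.biases = biases    # one d_model vector per layer

    def transform_hidden(self, h, idx):
        # Assumption: the translator output is added back to h (residual form).
        delta = matvec(self.weights[idx], h)
        return [x + d + b for x, d, b in zip(h, delta, self.biases[idx])]

    def forward(self, h, idx):
        # Transform with the layer's translator, then decode into logits.
        return matvec(self.unembed, self.transform_hidden(h, idx))


unembed = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
swap = [[-1.0, 1.0], [1.0, -1.0]]  # h + swap @ h exchanges the two coordinates
lens = ToyTunedLens(unembed, weights=[zero, swap], biases=[[0.0, 0.0], [0.5, 0.5]])
layer0_logits = lens.forward([2.0, -1.0], idx=0)
layer1_logits = lens.forward([2.0, -1.0], idx=1)
```

With a zero translator and zero bias (layer 0 above), this toy reduces to plain unembedding of the hidden state, which is what the logit lens does.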

classmethod from_model(model, model_revision=None, bias=True)

Create a lens from a pretrained model.

Parameters:
  • model – The model to create the lens from.

  • model_revision – The git revision of the model to use.

  • bias – Whether to use a bias in the linear translators.

Return type:

TunedLens

Returns:

A TunedLens instance.

classmethod from_model_and_pretrained(model, lens_resource_id=None, **kwargs)

Load a tuned lens from a folder or the Hugging Face Hub.

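Parameters:
  • model – The model from which the lens's unembedding is derived.

  • lens_resource_id – The resource id (a folder or a Hugging Face Hub id) from which the layer translators are loaded.

  • **kwargs – Additional keyword arguments passed through when loading the lens.

Return type: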

TunedLens

Returns:

A TunedLens instance whose unembedding is derived from the given model and whose layer translators are loaded from the given resource id.

classmethod from_unembed_and_pretrained(unembed, lens_resource_id, **kwargs)

Load a tuned lens from a folder or the Hugging Face Hub.

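Parameters:
  • unembed – The unembedding to use for the lens.

  • lens_resource_id – The resource id (a folder or a Hugging Face Hub id) from which the lens is loaded.

  • **kwargs – Additional keyword arguments passed through when loading the lens.

Return type: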

TunedLens

Returns:

A TunedLens instance.

generate(model, layer, input_ids, do_sample=True, temp=1.0, max_new_tokens=100)

Generate from the tuned lens at the given layer.

Parameters:
  • model – The base model to generate from. Usually the model this lens was trained on.

  • layer – The layer to generate from.

  • input_ids – (batch x prompt_len) The input ids to generate from.

  • do_sample – Whether to use sampling or greedy decoding.

  • temp – The temperature to use for sampling.

  • max_new_tokens – The maximum number of tokens to generate.

Return type:

Tensor

Returns:

The prompt concatenated with the newly generated tokens.
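The roles of do_sample, temp, and max_new_tokens can be seen in a minimal decoding loop. This is a sketch under stated assumptions, not the method's implementation: toy_generate and next_logits are hypothetical names, next_logits stands in for "the lens logits at the chosen layer for the last position", and a single sequence of token ids is used instead of a (batch x prompt_len) tensor.

```python
import math
import random


def toy_generate(next_logits, input_ids, do_sample=True, temp=1.0,
                 max_new_tokens=100, rng=None):
    """Append max_new_tokens tokens to the prompt, one at a time."""
    rng = rng or random.Random(0)
    tokens = list(input_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(tokens)
        if do_sample:
            # Temperature sampling: divide logits by temp, then softmax-sample.
            scaled = [value / temp for value in logits]
            peak = max(scaled)  # subtract the max for numerical stability
            weights = [math.exp(s - peak) for s in scaled]
            token = rng.choices(range(len(logits)), weights=weights)[0]
        else:
            # Greedy decoding: always take the highest-scoring token.
            token = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(token)
    return tokens


def next_logits(tokens):
    # Hypothetical scorer: strongly prefers the token after the last one,
    # wrapping around a vocabulary of 4 tokens.
    vocab = 4
    return [5.0 if t == (tokens[-1] + 1) % vocab else 0.0 for t in range(vocab)]


greedy = toy_generate(next_logits, [0, 1], do_sample=False, max_new_tokens=3)
sampled = toy_generate(next_logits, [0, 1], do_sample=True, temp=0.7,
                       max_new_tokens=3)
```

In both modes the returned sequence begins with the prompt, matching the documented return value. Lower temperatures concentrate the sampling distribution on the highest-scoring token, so sampling approaches greedy decoding as temp shrinks.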

save(path, ckpt='params.pt', config='config.json')

Save the lens to a directory.

Parameters:
  • path – The path to the directory to save the lens to.

  • ckpt – The name of the checkpoint file to save the parameters to.

  • config – The name of the config file to save the config to.

Return type:

None
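The resulting directory layout (one checkpoint file plus one config file) can be sketched with the standard library alone. This is only an illustration of the layout implied by the default file names: toy_save is a hypothetical helper, pickle stands in for whatever tensor serialization the library uses, and the parameter and config contents are made-up placeholders.

```python
import json
import pickle
import tempfile
from pathlib import Path


def toy_save(params, config, path, ckpt="params.pt", config_name="config.json"):
    """Write parameters and config as two files inside one directory."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    with open(path / ckpt, "wb") as f:
        pickle.dump(params, f)  # stand-in for real tensor serialization
    with open(path / config_name, "w") as f:
        json.dump(config, f)


with tempfile.TemporaryDirectory() as tmp:
    toy_save({"0.weight": [[0.0]]}, {"d_model": 1, "num_hidden_layers": 1}, tmp)
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
    restored_config = json.loads((Path(tmp) / "config.json").read_text())
```

Keeping the config in a separate JSON file means the lens dimensions can be inspected without deserializing the parameters.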

transform_hidden(h, idx)

Transform hidden state from layer idx.

Return type:

Tensor

class tuned_lens.nn.lenses.TunedLensConfig(base_model_name_or_path, d_model, num_hidden_layers, bias=True, base_model_revision=None, unembed_hash=None, lens_type='linear_tuned_lens')

A configuration for a TunedLens.

classmethod from_dict(config_dict)

Create a config from a dictionary.

to_dict()

Convert this config to a dictionary.
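A dictionary round trip of such a config can be sketched with a dataclass. ToyLensConfig is a hypothetical stand-in that copies the field names and defaults from the signature above; whether the real class is a dataclass, and how it validates its input, is not stated here, and the field values in the example are illustrative only.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToyLensConfig:
    """Hypothetical mirror of the documented constructor signature."""

    base_model_name_or_path: str
    d_model: int
    num_hidden_layers: int
    bias: bool = True
    base_model_revision: Optional[str] = None
    unembed_hash: Optional[str] = None
    lens_type: str = "linear_tuned_lens"

    def to_dict(self):
        # Convert every field into a plain, JSON-friendly dictionary.
        return asdict(self)

    @classmethod
    def from_dict(cls, config_dict):
        # Rebuild the config from a dictionary of field names to values.
        return cls(**config_dict)


config = ToyLensConfig("gpt2", d_model=768, num_hidden_layers=12)
round_tripped = ToyLensConfig.from_dict(config.to_dict())
```

Since to_dict yields only plain values, the dictionary can be written to a JSON file and passed back through from_dict to recover an equal config.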