
Collections including paper arxiv:2409.04185

Single-Layer SAEs with Transformers

TopK SAEs trained on the residual stream activation vectors from a single transformer layer, including the transformers.

Residual Stream Analysis with Multi-Layer SAEs

Paper • 2409.04185 • Published Sep 6, 2024 • 1
tim-lawson/sae-pythia-70m-deduped-x64-k32-tfm-layers-0

Updated Dec 2, 2024 • 8
tim-lawson/sae-pythia-70m-deduped-x64-k32-tfm-layers-1

Updated Dec 2, 2024 • 7
tim-lawson/sae-pythia-70m-deduped-x64-k32-tfm-layers-2

Updated Dec 2, 2024 • 10
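
A TopK SAE encodes an activation vector, keeps only the k largest latent pre-activations, and decodes from those. The sketch below is a minimal pure-Python illustration of that forward pass. It assumes the common TopK formulation: subtract the decoder bias, encode, keep the top k, apply ReLU, then decode. The function names, the tiny 2-dimensional example, and the hand-picked weights are hypothetical and are not taken from these checkpoints.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: each row produces one output element


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def topk_sae(x: Vector, w_enc: Matrix, b_enc: Vector,
             w_dec: Matrix, b_dec: Vector, k: int) -> Tuple[Vector, Vector]:
    """Return (sparse latents, reconstruction) for one activation vector x.

    w_enc is (n_latents x d_model); w_dec is (d_model x n_latents).
    """
    # Assumed convention: center the input by the decoder bias before encoding.
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    # Indices of the k largest pre-activations; every other latent is zeroed.
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    latents = [0.0] * len(pre)
    for i in keep:
        latents[i] = max(pre[i], 0.0)  # ReLU on the surviving latents
    recon = [r + b for r, b in zip(matvec(w_dec, latents), b_dec)]
    return latents, recon


# Hypothetical toy SAE: d_model = 2, n_latents = 4, k = 1.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
w_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
latents, recon = topk_sae([3.0, 1.0], w_enc, [0.0] * 4, w_dec, [0.0, 0.0], k=1)
```

With k = 1 only the strongest latent survives, so the reconstruction keeps the dominant direction and drops the rest.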

Multi-Layer SAEs with Tuned Lens and Transformers

Single SAEs trained on the residual stream activation vectors from every transformer layer simultaneously using tuned lenses, including the transformers.

Residual Stream Analysis with Multi-Layer SAEs

Paper • 2409.04185 • Published Sep 6, 2024 • 1
tim-lawson/mlsae-pythia-70m-deduped-x1-k32-lens-tfm

Updated Sep 18, 2024 • 5
tim-lawson/mlsae-pythia-70m-deduped-x2-k32-lens-tfm

Updated Sep 18, 2024 • 6
tim-lawson/mlsae-pythia-70m-deduped-x4-k32-lens-tfm

Updated Sep 18, 2024 • 7
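
The tuned lens (Belrose et al., 2023) learns one affine "translator" per layer that maps that layer's residual stream vector toward the final layer's representation. The sketch below applies already-trained translators to each layer's vector before a shared SAE would see it. It assumes the translator has the residual form h + A h + b. The exact way these checkpoints use the tuned lens may differ, and the helper names and toy values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def translate(h: Vector, a: Matrix, b: Vector) -> Vector:
    """Apply an affine tuned-lens style translator: h + A h + b."""
    ah = [sum(w * x for w, x in zip(row, h)) for row in a]
    return [hi + ai + bi for hi, ai, bi in zip(h, ah, b)]


def translate_all_layers(resid: List[Vector],
                         translators: List[Tuple[Matrix, Vector]]) -> List[Vector]:
    """Map each layer's residual vector through that layer's own translator."""
    return [translate(h, a, b) for h, (a, b) in zip(resid, translators)]


# Hypothetical 2-layer, 2-dimensional example.
translators = [
    ([[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0]),  # layer 0: shift by +1
    ([[1.0, 0.0], [0.0, 0.0]], [0.0, 0.0]),  # layer 1: double the first coordinate
]
out = translate_all_layers([[1.0, 2.0], [3.0, 4.0]], translators)
```

The residual form means a translator with A = 0 and b = 0 leaves the vector unchanged, which is why it is a natural starting point before training.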

Multi-Layer SAEs with Transformers

Single SAEs trained on the residual stream activation vectors from every transformer layer simultaneously, including the transformers.

Residual Stream Analysis with Multi-Layer SAEs

Paper • 2409.04185 • Published Sep 6, 2024 • 1
tim-lawson/mlsae-pythia-70m-deduped-x1-k32-tfm

Updated Dec 2, 2024 • 13
tim-lawson/mlsae-pythia-70m-deduped-x2-k32-tfm

Updated Dec 2, 2024 • 11
tim-lawson/mlsae-pythia-70m-deduped-x4-k32-tfm

Updated Dec 2, 2024 • 11
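
A multi-layer SAE uses one dictionary for the residual stream at every layer. Its training set can therefore be built by pooling each layer's vectors and treating each (layer, token) vector as a separate example. The sketch below assumes activations have already been collected per layer. pool_layers is a hypothetical helper, not part of any published API.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def pool_layers(resid_by_layer: Dict[int, List[Vector]]) -> List[Tuple[int, int, Vector]]:
    """Flatten {layer: [vector per token]} into (layer, token, vector) examples.

    A single multi-layer SAE is then trained on all examples together, so the
    same dictionary has to reconstruct vectors from every layer.
    """
    examples = []
    for layer in sorted(resid_by_layer):
        for token, vec in enumerate(resid_by_layer[layer]):
            examples.append((layer, token, vec))
    return examples


# Hypothetical activations: 2 layers x 2 tokens, d_model = 2.
resid = {0: [[0.1, 0.2], [0.3, 0.4]], 1: [[0.5, 0.6], [0.7, 0.8]]}
examples = pool_layers(resid)
```

The layer label is kept only for later analysis. Under this reading the SAE itself sees each vector without knowing which layer it came from.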

🔍 Interpretability & Analysis of LMs

Outstanding research in LM interpretability and evaluation, summarized

Latent Reasoning in LLMs as a Vocabulary-Space Superposition

Paper • 2510.15522 • Published Oct 17, 2025 • 1
Language Models are Injective and Hence Invertible

Paper • 2510.15511 • Published Oct 17, 2025 • 69
Eliciting Secret Knowledge from Language Models

Paper • 2510.01070 • Published Oct 1, 2025 • 4
Interpreting Language Models Through Concept Descriptions: A Survey

Paper • 2510.01048 • Published Oct 1, 2025 • 2

Single-Layer SAEs

TopK SAEs trained on the residual stream activation vectors from a single transformer layer.

Residual Stream Analysis with Multi-Layer SAEs

Paper • 2409.04185 • Published Sep 6, 2024 • 1
tim-lawson/sae-pythia-70m-deduped-x64-k32-layers-0

Updated Dec 2, 2024 • 10
tim-lawson/sae-pythia-70m-deduped-x64-k32-layers-1

Updated Dec 2, 2024 • 6
tim-lawson/sae-pythia-70m-deduped-x64-k32-layers-2

Updated Dec 2, 2024 • 7
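
The repository IDs in these collections appear to encode each checkpoint's configuration. The parser below makes several assumptions about that encoding:

- "x64" is the expansion factor (latents per residual dimension).
- "k32" is the TopK k.
- The optional "lens" and "tfm" suffixes mark the tuned-lens and transformer variants.
- "layers-N" is the layer index.

These readings are inferred from the collection descriptions, not from a documented naming spec. The sketch also uses Pythia-70m's residual width of 512 to compute the dictionary size.

```python
import re
from typing import NamedTuple, Optional

PYTHIA_70M_D_MODEL = 512  # residual stream width of Pythia-70m

# Assumed scheme: [owner/](sae|mlsae)-<base>-x<expansion>-k<k>[-lens][-tfm][-layers-<n>]
PATTERN = re.compile(
    r"^(?:[\w.-]+/)?(?P<kind>mlsae|sae)-(?P<base>.+?)-x(?P<x>\d+)-k(?P<k>\d+)"
    r"(?P<lens>-lens)?(?P<tfm>-tfm)?(?:-layers-(?P<layer>\d+))?$"
)


class RepoConfig(NamedTuple):
    kind: str             # "sae" (single layer) or "mlsae" (multi-layer)
    base_model: str       # e.g. "pythia-70m-deduped"
    expansion: int        # assumed: latents per residual dimension
    k: int                # assumed: active latents kept by TopK
    lens: bool            # trained with tuned lenses
    tfm: bool             # "including the transformers" variant
    layer: Optional[int]  # layer index for single-layer SAEs, else None


def parse_repo(repo_id: str) -> RepoConfig:
    """Split a checkpoint ID into its (assumed) configuration fields."""
    m = PATTERN.match(repo_id)
    if m is None:
        raise ValueError(f"unrecognized repo id: {repo_id}")
    layer = m.group("layer")
    return RepoConfig(
        kind=m.group("kind"),
        base_model=m.group("base"),
        expansion=int(m.group("x")),
        k=int(m.group("k")),
        lens=m.group("lens") is not None,
        tfm=m.group("tfm") is not None,
        layer=int(layer) if layer is not None else None,
    )


cfg = parse_repo("tim-lawson/sae-pythia-70m-deduped-x64-k32-layers-2")
n_latents = cfg.expansion * PYTHIA_70M_D_MODEL
```

Under this reading, an x64 checkpoint on Pythia-70m would have 64 × 512 = 32,768 latents, while the x1, x2, and x4 multi-layer SAEs would have 512, 1,024, and 2,048.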

Multi-Layer SAEs with Tuned Lens

Single SAEs trained on the residual stream activation vectors from every transformer layer simultaneously using tuned lenses.

Residual Stream Analysis with Multi-Layer SAEs

Paper • 2409.04185 • Published Sep 6, 2024 • 1
tim-lawson/mlsae-pythia-70m-deduped-x1-k32-lens

Updated Sep 18, 2024 • 7
tim-lawson/mlsae-pythia-70m-deduped-x2-k32-lens

Updated Sep 18, 2024 • 6
tim-lawson/mlsae-pythia-70m-deduped-x4-k32-lens

Updated Sep 18, 2024 • 6

Multi-Layer SAEs

Single SAEs trained on the residual stream activation vectors from every transformer layer simultaneously: https://arxiv.org/abs/2409.04185

Residual Stream Analysis with Multi-Layer SAEs

Paper • 2409.04185 • Published Sep 6, 2024 • 1
tim-lawson/mlsae-pythia-70m-deduped-x1-k32

Updated Dec 2, 2024 • 11
tim-lawson/mlsae-pythia-70m-deduped-x2-k32

Updated Dec 2, 2024 • 8
tim-lawson/mlsae-pythia-70m-deduped-x4-k32

Updated Dec 2, 2024 • 8
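
Because one dictionary serves every layer, a natural question about each latent is which layers it is active at. The sketch below computes one possible summary, assuming non-negative (post-ReLU) latent activations grouped by layer. It reports the share of the latent's total activation at each layer and the activation-weighted mean layer. Both statistics are hypothetical illustrations, and the paper's own measures may be defined differently.

```python
from typing import List


def layer_distribution(acts_by_layer: List[List[float]]) -> List[float]:
    """Fraction of one latent's total activation that falls on each layer.

    acts_by_layer[l] holds the latent's activations on every token at layer l.
    """
    totals = [sum(acts) for acts in acts_by_layer]
    grand = sum(totals)
    if grand == 0:
        return [0.0] * len(totals)  # latent never fires
    return [t / grand for t in totals]


def mean_layer(dist: List[float]) -> float:
    """Activation-weighted mean layer index (a 'center of mass' over layers)."""
    return sum(layer * p for layer, p in enumerate(dist))


# Hypothetical activations of one latent over 3 layers x 2 tokens.
dist = layer_distribution([[0.0, 1.0], [2.0, 1.0], [0.0, 0.0]])
```

A latent whose mass sits on a single layer behaves like a single-layer SAE feature. Mass spread across adjacent layers suggests the same direction persists through the residual stream.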
