Mechanistic Interpretability Benchmark

university

https://mib-bench.github.io

AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

hij authored a paper about 1 month ago

Blackbox Model Provenance via Palimpsestic Membership Inference

amueller updated a Space 2 months ago

mib-bench/leaderboard

hij authored a paper 4 months ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

mib-bench's models (3)

mib-bench/mib-circuits-example

mib-bench/mib-causalvariable-example

mib-bench/interpbench