open source interpretability platform
api · steering · activations · autointerp · scoring · inference · search · filter · dashboards · benchmarks · cossim · umap · embeds · probes · saes · lists · exports · uploads
- about neuronpedia
- instant start - vercel deploy
- quick start - local webapp + demo environment
- setting up your local environment
- "i want to use a local database / import more neuronpedia data"
- "i want to do webapp (frontend + api) development"
- "i want to run/develop inference locally"
- "i want to run/develop autointerp locally"
- "i want to do high volume autointerp explanations"
- "i want to generate my own dashboards/data and add it to neuronpedia"
- architecture
- security
- contact / support
- contributing
- appendix
check out our blog post about Neuronpedia, why we're open sourcing it, and other details. there's also a tweet thread with quick demos.
feature overview
a diagram showing the main features of neuronpedia as of march 2025.
click the Deploy button to instantly deploy a custom neuronpedia. a free vercel account is required.
here's how easy it is to deploy a "PuppyNeurons" fork of Neuronpedia:
puppyneurons-shrink.mp4
this sets up the webapp (frontend + api) locally, and connects to a public remote demo database and public inference servers
after following the quick start, you will be able to use neuronpedia for some sources/SAEs we have preloaded in `gpt2-small` and `gemma-2-2b`/`-it`.
⚠️ warning: since you are connecting to a public, read-only demo database, you will not be able to add new data immediately. you will need to follow subsequent steps to configure your own database that you can write to.
- install docker desktop (UI) or docker engine (no UI), and launch it.
- generate your local `.env`: `make init-env`
- build the webapp (this will take ~10 min the first time): `make webapp-demo-build`
- bring up the webapp: `make webapp-demo-run`
- once everything is up, open localhost:3000 to load the home page.
- your local instance is connected to the remote demo database and inference servers, with the following SAEs/sources data available:
| model | source/sae | comment |
|---|---|---|
| gpt2-small | `res-jb`, all layers | a small starter SAE set |
| gemma-2-2b / gemma-2-2b-it | `gemmascope-res-16k`, all layers | the SAEs used in the Gemma Scope demo |
| deepseek-r1-distill-llama-8b | `llamascope-slimpj-res-32k`, layer 15 | SAE for a reasoning model, trained by OpenMOSS |
- example things you can do (links work after `make webapp-demo-run`):
  i. steering - steer gpt2-small on cats
  ii. activation tests/search - test activation for a gemma-2-2b feature
  iii. search by explanation, if you configured an `OPENAI_API_KEY` - search for parrots features
  iv. browse dashboards - a parrot feature
  v. run the gemma-scope demo
- now that we've set up a local webapp that's usable, this is a good time to quickly review neuronpedia's simple architecture and its individual services, so that you can get a better understanding of what you'll set up later. then, keep going to setting up your local environment.
🔥 pro-tip: see all the available `make` commands by running `make help`
once you've played around with the demo, you will start running into limitations, like having a limited number of models/SAEs to use, or not being able to generate new explanations. this is because the public demo database is read-only.
ideally, you'll eventually do all of the sub-sections below, so you have everything running locally. however, you may only be interested in specific parts of neuronpedia to start:
- if you want to jump into developing webapp frontend or api with the demo environment, follow webapp dev
- if you want to start loading more sources/data and relying on your own local database, follow local database
🔥 pro-tip: neuronpedia is configured for AI agent development. here's an example using a single prompt to build a custom app (Steerify) using Neuronpedia's inference server as a backend:
steerify-shrink.mp4
relying on the demo environment means you are limited to read-only access to a specific set of SAEs. these steps show you how to configure and connect to your own local database. you can then download sources/SAEs of your choosing:
import-shrink.mp4
⚠️ warning: your database will start out empty. you will need to use the admin panel to import sources/data (activations, explanations, etc).
⚠️ warning: the local database environment does not have any inference servers connected, so you won't be able to do activation testing, steering, etc initially. you will need to configure a local inference instance.
- build the webapp: `make webapp-localhost-build`
- bring up the webapp: `make webapp-localhost-run`
- go to localhost:3000 to see your local webapp instance, which is now connected to your local database
- see the warnings above for caveats, and the next steps to finish setting up
- click here for how to import data into your local database (activations, explanations, etc), because your local database will be empty to start
- click here for how to bring up a local inference service for the model/source/SAE you're working with
the webapp builds you've been doing so far are production builds, which are slow to build and fast to run. because they build slowly and lack debug information, they are not ideal for development.
this subsection installs the development build on your local machine (not docker), then mounts the build inside your docker instance.
once you do this section, you'll be able to do local development and quickly see changes that are made, as well as see more informative debug/errors. if you are purely interested in doing frontend/api development for neuronpedia, you don't need to set up anything else!
- install nodejs via node version manager: `make install-nodejs`
- install the webapp's dependencies: `make webapp-localhost-install`
- run the development instance: `make webapp-localhost-dev`
- go to localhost:3000 to see your local webapp instance
- auto-reload: when you change any files in the `apps/webapp` subdirectory, `localhost:3000` will automatically reload
- install commands: you do not need to run `make install-nodejs` again, and you only need to run `make webapp-localhost-install` if dependencies change
once you start using a local environment, you won't be connected to the demo environment's inference instances. this subsection shows you how to run an inference instance locally so you can do things like steering, activation testing, etc on the sources/SAEs you've downloaded.
⚠️ warning: for the local environment, we only support running one inference server at a time. this is because you are unlikely to be running multiple models simultaneously on one machine, as they are memory and compute intensive.
- ensure you have installed poetry
- install the inference server's dependencies: `make inference-localhost-install`
- build the image, picking the correct command based on whether the machine has CUDA or not:

  ```bash
  # CUDA
  make inference-localhost-build-gpu USE_LOCAL_HF_CACHE=1

  # no CUDA
  make inference-localhost-build USE_LOCAL_HF_CACHE=1
  ```
  ➡️ The `USE_LOCAL_HF_CACHE=1` flag mounts your local HuggingFace cache at `${HOME}/.cache/huggingface/hub:/root/.cache/huggingface/hub`. If you wish to create a new cache in your container instead, you can omit this flag here and in the next step.
- run the inference server, using the `MODEL_SOURCESET` argument to specify the `.env.inference.[model_sourceset]` file you're loading from. for this example, we will run `gpt2-small` and load the `res-jb` sourceset/SAE set, which is configured in the `.env.inference.gpt2-small.res-jb` file. you can see the other pre-loaded inference configs or create your own config as well.

  ```bash
  # CUDA
  make inference-localhost-dev-gpu \
    MODEL_SOURCESET=gpt2-small.res-jb \
    USE_LOCAL_HF_CACHE=1

  # no CUDA
  make inference-localhost-dev \
    MODEL_SOURCESET=gpt2-small.res-jb \
    USE_LOCAL_HF_CACHE=1
  ```
- wait for it to load (the first time will take longer). when you see `Initialized: True`, the local inference server is now ready on `localhost:5002`
to interact with the inference server, you have a few options - note that this will only work for the model / selected source you have loaded:
- load the webapp with the local database setup, then use the model / selected source as you would normally do on neuronpedia.
- use the pre-generated inference python client at `packages/python/neuronpedia-inference-client` (set environment variable `INFERENCE_SERVER_SECRET` to `public`, or whatever it's set to in `.env.localhost` if you've changed it)
- use the openapi spec, located at `schemas/openapi/inference-server.yaml`, to make calls with any client of your choice - for example, by generating a client as shown in the sketch after this list.
- TODO #1: Use a documentation generator to make a simple tester-server that can be activated with `make doc-inference-localhost`
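if you want to roll your own client from the spec, a generic generator like openapi-generator can do it. the sketch below is only an illustration - the repo's own client generation flow is documented in the openapi readme, and the output directory here is an arbitrary example, not a canonical location:

```bash
# sketch: generate a python client from the inference server's openapi spec
# (assumes openapi-generator-cli is installed, e.g. via npm or homebrew;
#  the output path is just an example)
openapi-generator-cli generate \
  -i schemas/openapi/inference-server.yaml \
  -g python \
  -o /tmp/my-inference-client
```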
we've provided some pre-loaded inference configs as examples of how to load a specific model and sourceset for inference. view them by running `make inference-list-configs`:
```
$ make inference-list-configs

Available Inference Configurations (.env.inference.*)
================================================

deepseek-r1-distill-llama-8b.llamascope-slimpj-res-32k
  Model: meta-llama/Llama-3.1-8B
  Source/SAE Sets: '["llamascope-slimpj-res-32k"]'
  make inference-localhost-dev MODEL_SOURCESET=deepseek-r1-distill-llama-8b.llamascope-slimpj-res-32k

gemma-2-2b-it.gemmascope-res-16k
  Model: gemma-2-2b-it
  Source/SAE Sets: '["gemmascope-res-16k"]'
  make inference-localhost-dev MODEL_SOURCESET=gemma-2-2b-it.gemmascope-res-16k

gpt2-small.res-jb
  Model: gpt2-small
  Source/SAE Sets: '["res-jb"]'
  make inference-localhost-dev MODEL_SOURCESET=gpt2-small.res-jb
```
look at the `.env.inference.*` files for examples of how to make these inference server configurations.
the `MODEL_ID` is the model id from the transformerlens model table, and each item in `SAE_SETS` is the text after the layer number and hyphen in a neuronpedia source ID. for example, if you have a neuronpedia feature at the url http://neuronpedia.org/gpt2-small/0-res-jb/123, then `0-res-jb` is the source ID, and the corresponding item in `SAE_SETS` is `res-jb`. this example matches the `.env.inference.gpt2-small.res-jb` file exactly.
you can find neuronpedia source IDs in the saelens pretrained saes yaml file or by clicking into models in the neuronpedia datasets exports directory.
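putting this together, a minimal custom config might look roughly like the sketch below. the variable names follow the description above, but copy an existing `.env.inference.*` file to get the exact and complete set of settings:

```bash
# sketch of a hypothetical .env.inference.gpt2-small.res-jb
# (check the real .env.inference.* files for the exact variable names and any extra settings)
MODEL_ID=gpt2-small     # transformerlens model id
SAE_SETS='["res-jb"]'   # the part of each source ID after "[layer]-"
```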
using models not officially supported by transformerlens
look at `.env.inference.deepseek-r1-distill-llama-8b.llamascope-slimpj-res-32k` to see an example of how to load a model not officially supported by transformerlens. this is mostly for swapping in weights of a distilled/fine-tuned model.
loading non-saelens sources/SAEs
- TODO #2 document how to load SAEs/sources that are not in saelens pretrained yaml
- schema-driven development: to add new endpoints or change existing endpoints, you will need to start by updating the openapi schemas, then generating clients from that, then finally updating the actual inference and webapp code. for details on how to do this, see the openapi readme: making changes to the inference server
- no auto-reload: when you change any files in the `apps/inference` subdirectory, the inference server will NOT automatically reload, because server reloads are slow: they reload the model and all sources/SAEs. if you want to enable autoreload, append `AUTORELOAD=1` to the `make inference-localhost-dev` call, like so:

  ```bash
  make inference-localhost-dev \
    MODEL_SOURCESET=gpt2-small.res-jb \
    AUTORELOAD=1
  ```
this section is under construction.
- check out the autointerp readme
- TODO instructions for setting up autointerp server locally
- TODO - look at the `autointerp` service in docker-compose.yaml
- schema-driven development: openapi readme: making changes to the autointerp server
this section is under construction.
- use EleutherAI's Delphi library
- for OpenAI's autointerp, use `utils/neuronpedia_utils/batch-autointerp.py`
this section is under construction.
TODO: simplify generation + upload of data to neuronpedia
TODO: neuronpedia-utils should use poetry
in this example, we will generate dashboards/data for an SAELens-compatible SAE, and upload it to our own Neuronpedia instance.
- ensure you have Poetry installed
- upload your SAELens-compatible source/SAE to HuggingFace. Example ➡️ https://huggingface.co/chanind/gemma-2-2b-batch-topk-matryoshka-saes-w-32k-l0-40
- clone SAELens locally: `git clone https://github.com/jbloomAus/SAELens.git`
- open your cloned SAELens and edit the file `sae_lens/pretrained_saes.yaml`. add a new entry at the bottom, based on the template below (see comments for how to fill it out). Example ➡️ https://github.com/jbloomAus/SAELens/pull/455/files

  ```yaml
  gemma-2-2b-res-matryoshka-dc: # a unique ID for your set of SAEs
    conversion_func: null # null if your SAE config is already compatible with SAELens
    links: # optional links
      model: https://huggingface.co/google/gemma-2-2b
    model: gemma-2-2b # transformerlens model id - https://transformerlensorg.github.io/TransformerLens/generated/model_properties_table.html
    repo_id: chanind/gemma-2-2b-batch-topk-matryoshka-saes-w-32k-l0-40 # the huggingface repo path
    saes:
      - id: blocks.0.hook_resid_post # an id for this SAE
        path: standard/blocks.0.hook_resid_post # the path in the repo_id to the SAE
        l0: 40.0
        neuronpedia: gemma-2-2b/0-matryoshka-res-dc # what you expect the neuronpedia URI to be - neuronpedia.org/[this_slug]. should be [model_id]/[layer]-[identical_slug_for_this_sae_set]
      - id: blocks.1.hook_resid_post # more SAEs in this SAE set
        path: standard/blocks.1.hook_resid_post
        l0: 40.0
        neuronpedia: gemma-2-2b/1-matryoshka-res-dc # note that this is identical to the entry above, except 1 instead of 0 for the layer
      - [...]
  ```
- clone SAEDashboard locally: `git clone https://github.com/jbloomAus/SAEDashboard.git`
- configure your cloned `SAEDashboard` to use your cloned, modified `SAELens`, instead of the one in production:

  ```bash
  cd SAEDashboard                    # set directory
  poetry lock && poetry install      # install dependencies
  poetry remove sae-lens             # remove production dependency
  poetry add PATH/TO/CLONED/SAELENS  # set local dependency
  ```
- generate dashboards for the SAE. this will take from 30 minutes to a few hours, depending on your hardware and the size of the model.

  ```bash
  cd SAEDashboard            # set directory
  rm -rf cached_activations  # clear old cached data

  # start the generation. details for each argument (full details: https://github.com/jbloomAus/SAEDashboard/blob/main/sae_dashboard/neuronpedia/neuronpedia_runner_config.py)
  # - sae-set = should match the unique ID for the set from pretrained_saes.yaml
  # - sae-path = should match the id for the sae from pretrained_saes.yaml
  # - np-set-name = should match the [identical_slug_for_this_sae_set] in the sae's neuronpedia entry from pretrained_saes.yaml
  # - dataset-path = the huggingface dataset to use for generating activations. usually you want to use the same dataset the model was trained on.
  # - output-dir = the output directory of the dashboard data
  # - n-prompts = number of activation texts to test from the dataset
  # - n-tokens-in-prompt, n-features-per-batch, n-prompts-in-forward-pass = keep these at 128
  poetry run neuronpedia-runner \
    --sae-set="gemma-2-2b-res-matryoshka-dc" \
    --sae-path="blocks.12.hook_resid_post" \
    --np-set-name="matryoshka-res-dc" \
    --dataset-path="monology/pile-uncopyrighted" \
    --output-dir="neuronpedia_outputs/" \
    --sae_dtype="float32" \
    --model_dtype="bfloat16" \
    --sparsity-threshold=1 \
    --n-prompts=24576 \
    --n-tokens-in-prompt=128 \
    --n-features-per-batch=128 \
    --n-prompts-in-forward-pass=128
  ```
- convert these dashboards for import into neuronpedia:

  ```bash
  cd neuronpedia/utils/neuronpedia-utils         # get into this current repository's util directory
  python convert-saedashboard-to-neuronpedia.py  # start guided conversion script. follow the steps.
  ```
- once dashboard files are generated for neuronpedia, upload these to the global Neuronpedia S3 bucket - currently you need to contact us to do this.
- from a localhost instance, import your data
here's how the services/scripts connect in neuronpedia. it's easiest to read this diagram by starting at the image of the laptop ("User").
you can run neuronpedia on any cloud and on any modern OS. neuronpedia is designed to avoid vendor lock-in. these instructions were written for and tested on macos 15 (sequoia), so you may need to repurpose commands for windows/ubuntu/etc. at least 16GB ram is recommended.
| name | description | powered by |
|---|---|---|
| webapp | serves the neuronpedia.org frontend and the api | next.js / react |
| database | stores features, activations, explanations, users, lists, etc | postgres |
| inference | [support server] steering, activation testing, search via inference, topk, etc. a separate instance is required for each model you want to run inference on. | python / torch |
| autointerp | [support server] auto-interp explanations and scoring, using eleutherAI's delphi (formerly `sae-auto-interp`) | python |
by design, each service can be run independently as a standalone app. this is to enable extensibility and forkability.
for example, if you like the neuronpedia webapp frontend but want to use a different API for inference, you can do that! just ensure your alternative inference server supports the `schemas/openapi/inference-server.yaml` spec, and/or that you modify the neuronpedia calls to inference under `apps/webapp/lib/utils`.
there are draft `README`s for each specific app/service under `apps/[service]`, but they are heavily WIP. you can also check out the `Dockerfile` under the same directory to build your own images.
for services to communicate with each other in a typed and consistent way, we use openapi schemas. there are some exceptions - for example, streaming is not officially supported by the openapi spec. however, even in that case, we still try our best to define a schema and use it.
especially for inference and autointerp server development, it is critical to understand and use the instructions under the openapi readme.
openapi schemas are located under `/schemas`. we use openapi generators to generate clients in both typescript and python.
- `apps` - the three neuronpedia services: webapp, inference, and autointerp. most of the code is here.
- `schemas` - the openapi schemas. to make changes to inference and autointerp endpoints, first make changes to their schemas - see details in the openapi readme.
- `packages` - clients generated from the schemas using generator tools. you will mostly not need to manually modify these files.
- `utils` - various utilities that help do offline processing, like high volume autointerp, generating dashboards, or exporting data.
please report vulnerabilities to [email protected].
we don't currently have an official bounty program, but we'll try our best to give compensation based on the severity of the vulnerability - though it's likely we will not be able to offer awards for low-severity vulnerabilities.
- slack: join #neuronpedia
- email: [email protected]
- issues: github issues
See CONTRIBUTING.md.
you can view all available `make` commands and brief descriptions of them by running `make help`
if you set up your own database, it will start out empty - no features, explanations, activations, etc. to load this data, there's a built-in admin panel where you can download this data for SAEs (or "sources") of your choosing.

⚠️ warning: the admin panel is finicky and does not currently support resuming imports. if an import is interrupted, you must manually click `re-sync`. the admin panel currently does not check if your download is complete or missing parts - it is up to you to check if the data is complete, and if not, to click `re-sync` to re-download the entire dataset.
ℹ️ recommendation: when importing data, start with just one source (like `gpt2-small` @ `10-res-jb`) instead of downloading everything at once. This makes it easier to verify the data imported correctly and lets you start using neuronpedia faster.

the instructions below demonstrate how to download the `gpt2-small` @ `10-res-jb` SAE data.
- navigate to localhost:3000/admin.
- scroll down to `gpt2-small`, and expand `res-jb` with the `▶`.
- click `Download` next to `10-res-jb`.
- wait patiently - this can be a LOT of data, and depending on your connection/cpu speed it can take up to 30 minutes or an hour.
- once it's done, click `Browse` or use the navbar to try it out: `Jump To` / `Search` / `Steer`.
- repeat for other SAE/source data you wish to download.
in the webapp, the `search explanations` feature requires you to set an `OPENAI_API_KEY`; otherwise you will get no search results.

this is because the `search explanations` functionality searches for features by semantic similarity: if you search `cat`, it will also return `feline`, `tabby`, `animal`, etc. to do this, it needs to calculate the embedding for your input `cat`. we use openai's embedding api (specifically, `text-embedding-3-large` with `dimension: 256`) to calculate the embeddings.
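for illustration, the underlying request is roughly the one below - a sketch of openai's embeddings endpoint with the parameters named above (the actual call is made from the webapp's code, not by you directly):

```bash
# sketch: what an embedding request for the query "cat" looks like
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-large", "input": "cat", "dimensions": 256}'
```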