James Golden (@James_R_Golden) · Arcadia Science · Joined June 2025
78 followers · 376 following · 10 media · 38 statuses
How to transform UMAP from a black box into a glass box: using a special type of deep network, we can now compute exact linear equivalents that reveal which features drive each point's position in the embedding. @ArcadiaScience [1/8]
Limitations: this is "pointwise" linear, not piecewise. The Jacobian is valid only at that exact input; move slightly in embedding space and it changes completely. Working on an efficient Lanczos method for long inputs. Code:
github.com
Equivalent Linear Mappings of Large Language Models - jamesgolden1/equivalent-linear-LLMs
Practical application: these Jacobian matrices work as steering operators. The detached Jacobian from the prompt "The Golden Gate" can be used to steer unrelated prompts toward that concept; the output is "Golden Gate Bridge" insertions into random text.
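The thread doesn't spell out the steering mechanics, so here is only a loose stand-in for the idea: take the top left singular vector of a concept prompt's detached Jacobian and use it as an activation-steering direction before the LM head. Everything below (`d_model`, `concept_jacobian`, `lm_head`) is a placeholder, not the paper's operator.

```python
# Loose sketch: activation steering with a direction derived from a concept
# prompt's detached Jacobian. All tensors here are random stand-ins.
import torch

torch.manual_seed(0)
d_model = 256
concept_jacobian = torch.randn(d_model, 8 * d_model)  # stand-in for J("The Golden Gate")

# The top left singular vector lives in output-embedding space and (per the
# thread) decodes to concept tokens like "Golden" and "Bridge".
U, S, Vh = torch.linalg.svd(concept_jacobian, full_matrices=False)
steer_dir = U[:, 0]

def steer(hidden_state, alpha=4.0):
    """Nudge a final hidden state toward the concept before the LM head."""
    return hidden_state + alpha * steer_dir

# usage (hypothetical): logits = lm_head(steer(final_hidden_state_of_other_prompt))
```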
The linear operators are extremely low rank, as their singular value spectra show. The right singular vectors map to input tokens, and the left singular vectors decode back to output tokens. For "The bridge out of Marin is the", the left SVs decode to "bridge", "Golden", "highway", and "exit".
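As a concrete picture of that low-rank structure, here is a hedged sketch with stand-in tensors: take the SVD of a detached Jacobian, decode the left singular vectors through an unembedding matrix (logit-lens style), and reshape the right singular vectors back to token positions. `jacobian` and `unembed` are placeholders, not the repo's variables.

```python
# Hedged sketch with stand-in tensors: inspect the low-rank structure of a
# detached Jacobian and decode its singular vectors.
import torch

torch.manual_seed(0)
d_in, d_out, seq_len, vocab = 128, 128, 6, 1000
jacobian = torch.randn(d_out, seq_len * d_in)     # stand-in detached Jacobian
unembed = torch.randn(vocab, d_out)               # stand-in LM-head weights

U, S, Vh = torch.linalg.svd(jacobian, full_matrices=False)
# For a real detached Jacobian this count is small: the spectrum decays fast.
print("effective rank:", int((S > 1e-3 * S[0]).sum()))

# Left singular vectors live in output-embedding space: decode them with the
# unembedding matrix to see which output tokens they point toward.
top_tokens_per_sv = (unembed @ U[:, :4]).topk(5, dim=0).indices   # (5, 4) token ids

# Right singular vectors live in flattened input-embedding space: reshape to
# (seq_len, d_in) to see which input positions each component weights.
right_sv_by_position = Vh[:4].reshape(4, seq_len, d_in).norm(dim=-1)
print(right_sv_by_position.shape)                 # (4, seq_len)
```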
This is similar to the approach used for attention-only models in Elhage, Nanda, Olsson, et al., "A Mathematical Framework for Transformer Circuits", extended to MLP and normalization blocks by using gradient detachment and the autograd Jacobian to compute the equivalent linear system.
Every operation in transformer decoders (attention, gated activations, normalization) can be written as A(x)·x, where A(x) is input-dependent. By "detaching" the gradient at the right places during inference, you freeze A(x) and get a pure linear system.
arxiv.org
Despite significant progress in transformer interpretability, an understanding of the computational mechanisms of large language models (LLMs) remains a fundamental challenge. Many approaches...
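A minimal sketch of the detachment trick on a single operation, assuming the standard RMSNorm definition (this is an illustration, not the repo's code): write the norm as A(x)·x with A(x) = diag(gain)/rms(x), detach rms(x), and the Jacobian at that input is exactly the operation.

```python
# Minimal sketch of the detachment trick on a single RMSNorm (standard RMSNorm
# definition assumed; not the repo's code).
import torch

def rmsnorm_detached(x, gain, eps=1e-6):
    # RMSNorm written as A(x) @ x with A(x) = diag(gain) / rms(x).
    # Detaching rms(x) freezes A at this input, so autograd sees a linear map.
    rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return gain * x / rms.detach()

torch.manual_seed(0)
x = torch.randn(16, dtype=torch.float64)
gain = torch.randn(16, dtype=torch.float64)

y = rmsnorm_detached(x, gain)
J = torch.autograd.functional.jacobian(lambda v: rmsnorm_detached(v, gain), x)

# Because A(x) was frozen, the Jacobian *is* the operation: J @ x reproduces y.
print((J @ x - y).abs().max())    # ~0 (machine precision)
```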
Paper in TMLR and poster at the NeurIPS Mechanistic Interpretability workshop: "Equivalent Linear Mappings of Large Language Models". LLMs like Gemma 3 12B can be mapped to an equivalent, interpretable linear system for any given input, with output embedding reconstruction error of ~10^-13.
🧵Really excited to share a set of our recent pubs @ArcadiaScience where we make black box BioML models transparent. [1/7] https://t.co/IIo2ZePAvd
research.arcadiascience.com
Deep networks make accurate predictions, but their nonlinearity makes them a black box, hiding what they have learned. Here, we look inside the black box and analyze the exact relationships they...
Check out our latest work at @ArcadiaScience on decomposing a neural network trained on genotype-phenotype data into interpretable and familiar quantitative genetics parameters.
research.arcadiascience.com
We tested equivalent linear mapping (ELM) on a neural network trained to predict phenotypes from genotypes in simulated data. We show that ELM successfully recapitulates additive and epistatic...
@ArcadiaScience Note on the GIF: this is a slightly different encoder-decoder network whose loss function has terms for graph BCE, reconstruction, and encoder-decoder-encoder cyclic reconstruction. The lattice is generated from the fully trained decoder and then encoded with each saved checkpoint for the GIF.
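For concreteness, a hedged sketch of what a loss with those three terms might look like; the UMAP-style a/b values and the equal term weights are assumptions, not the pub's settings.

```python
# Hedged sketch of a three-term loss of this shape: graph BCE + input
# reconstruction + enc-dec-enc cyclic reconstruction.
import torch
import torch.nn.functional as F

def combined_loss(encoder, decoder, x, fuzzy_graph, a=1.577, b=0.895):
    z = encoder(x)                                   # low-dimensional embedding
    # Graph BCE: edge probabilities from embedding distances vs. the
    # precomputed high-dimensional fuzzy graph (parametric-UMAP style).
    d2 = torch.cdist(z, z).pow(2)
    q = 1.0 / (1.0 + a * d2.pow(b))
    graph_bce = F.binary_cross_entropy(q.clamp(1e-6, 1 - 1e-6), fuzzy_graph)
    recon = F.mse_loss(decoder(z), x)                # reconstruct the input
    cyclic = F.mse_loss(encoder(decoder(z)), z)      # enc-dec-enc cyclic recon
    return graph_bce + recon + cyclic

# usage (hypothetical): loss = combined_loss(enc, dec, x_batch, graph_batch)
```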
@ArcadiaScience This approach works beyond genomics too: anywhere you use UMAP (images, protein embeddings, etc.), you can now extract exact feature attributions. Check out the notebook pub to try it. [8/8]
@ArcadiaScience Comparing glass-box UMAP features to differential expression shows they're complementary, not identical. Many differentially expressed genes aren't the ones UMAP actually uses to separate clusters; the glass-box network reveals what UMAP truly learned. [7/8]
@ArcadiaScience Coloring cells by their top gene contributor reveals structure that cell type labels alone don't show. Some top gene contributors extend across traditional cell type boundaries, showing how the embedding space is composed. [6/8]
@ArcadiaScience For the Luecken et al. human bone marrow gene expression dataset, we can now see exactly which genes contribute most to each cell's position. Some cell types have sub-regions driven by different genes — like Normoblasts, where HBD dominates one region and HBB another. [5/8]
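A hedged sketch of how those per-gene contributions fall out of the local linearity, with a stand-in bias-free encoder and a random expression vector rather than the pub's trained model and data:

```python
# Hedged sketch: exact per-gene contributions to one cell's 2-D position.
import torch

torch.manual_seed(0)
n_genes = 200
encoder = torch.nn.Sequential(                       # stand-in parametric-UMAP encoder
    torch.nn.Linear(n_genes, 64, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 2, bias=False),
)

expression = torch.rand(n_genes)                      # one cell's expression vector
J = torch.autograd.functional.jacobian(encoder, expression)   # (2, n_genes)

# Local linearity means the embedding decomposes exactly into per-gene terms:
# position = sum_i J[:, i] * expression[i].
contrib = J * expression                              # per-gene 2-D contribution vectors
top_genes = contrib.norm(dim=0).topk(5).indices       # genes that move this cell most
print(top_genes, (contrib.sum(dim=1) - encoder(expression)).abs().max())
```

Taking the argmax over genes instead of the top five gives the "top gene contributor" coloring described in the [6/8] tweet above.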
@ArcadiaScience We validate that the Jacobian reconstructs the embedding exactly: the reconstruction error approaches machine precision (~3e-14). This isn't an approximation like SHAP or LIME; it's the exact feature contribution. [4/8]
@ArcadiaScience With parametric UMAP, we can use certain deep networks (with zero-bias linear layers and ReLU activations) that are locally linear at each point. This means that for every data point we can compute a Jacobian, a set of linear weights that exactly reconstructs the output. [3/8]
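A minimal sketch of why this works, using a small stand-in bias-free ReLU network rather than the pub's trained encoder: with no biases, every layer can be written as a matrix times its input, so the Jacobian at x reproduces the output to machine precision (the ~3e-14 error quoted in [4/8]).

```python
# Minimal sketch: a zero-bias ReLU network is exactly locally linear.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(100, 64, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(64, 32, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(32, 2, bias=False),
).double()

x = torch.rand(100, dtype=torch.float64)
y = net(x)
J = torch.autograd.functional.jacobian(net, x)        # (2, 100): the linear weights at x

# With no biases, ReLU(z) = diag(z > 0) @ z, so the whole network is A(x) @ x
# and the Jacobian reconstructs the output to machine precision.
print((J @ x - y).abs().max())                        # on the order of 1e-16 in float64
```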
@ArcadiaScience UMAP is everywhere because it's great at creating visually distinct clusters from high-dimensional data like gene expression. But there's a catch: the nonlinear mapping makes it hard to interpret which features are responsible for those clusters. https://t.co/IE1pSB6enH [2/8]
arcadia-science.github.io
Equivalent Linear Mappings of Large Language Models. James Robert Golden. Action editor: Shay Cohen. https://t.co/wgS9QtmhGT
#decoders #representations #transform
openreview.net
Despite significant progress in transformer interpretability, an understanding of the computational mechanisms of large language models (LLMs) remains a fundamental challenge. Many approaches...
Since this also works for Gemma 2, I plan to compare its equivalent linear representations to Gemma Scope SAE latents. My hope is that this could complement widely used approaches like SAEs and linear probes. I will have a poster on this at the NeurIPS mech interp workshop.
The tradeoffs: computing the full Jacobian takes about 20 seconds per sequence for Qwen 14B, and most results are on short sequences (<10 tokens). I have a JAX Lanczos implementation for computing top-k singular vectors on Gemma 3 4B up to 400 tokens, which is more practical.
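The long-input implementation mentioned above is in JAX; below is a framework-agnostic sketch of the same matrix-free idea in PyTorch and SciPy, where the Jacobian is exposed only through JVPs and VJPs and handed to a Lanczos-style sparse SVD, so the full matrix is never materialized. The toy model and sizes are stand-ins.

```python
# Hedged sketch: top-k singular vectors of the Jacobian without materializing it.
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, svds

torch.manual_seed(0)
f = torch.nn.Sequential(                               # stand-in for a detached model
    torch.nn.Linear(512, 256, bias=False), torch.nn.ReLU(),
    torch.nn.Linear(256, 128, bias=False),
).double()
x0 = torch.rand(512, dtype=torch.float64)

def matvec(v):                                         # J @ v via a forward-mode JVP
    v_t = torch.as_tensor(np.asarray(v).ravel(), dtype=torch.float64)
    return torch.func.jvp(f, (x0,), (v_t,))[1].detach().numpy()

def rmatvec(u):                                        # J.T @ u via a reverse-mode VJP
    u_t = torch.as_tensor(np.asarray(u).ravel(), dtype=torch.float64)
    return torch.func.vjp(f, x0)[1](u_t)[0].detach().numpy()

J_op = LinearOperator(shape=(128, 512), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
U, S, Vt = svds(J_op, k=8)                             # top-8 singular triplets
print(S[::-1])                                         # singular values, largest first
```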