Tom Dupré la Tour

@tomdlt10

Followers
551
Following
286
Media
16
Statuses
131

LLM interpretability @openai, previously neuron and fMRI interpretability @gallantlab, neurophysiology @agramfort, machine-learning for @scikit_learn

San Francisco, CA
Joined December 2014
@tomdlt10
Tom Dupré la Tour
18 days
RT @MilesKWang: We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We f…
0
424
0
@tomdlt10
Tom Dupré la Tour
1 year
RT @nabla_theta: Excited to share what I've been working on as part of the former Superalignment team! We introduce a SOTA training stack…
0
85
0
@tomdlt10
Tom Dupré la Tour
1 year
RT @cathychen23: Do brain representations of language depend on whether the inputs are pixels or sounds? Our @CommsBio paper studies this…
0
39
0
@tomdlt10
Tom Dupré la Tour
1 year
RT @janleike: Today we're releasing a tool we've been using internally to analyze transformer internals - the Transformer Debugger! It com…
0
189
0
@tomdlt10
Tom Dupré la Tour
2 years
RT @davederiso: Made an ultra-fast DTW solver with Stephen Boyd @StanfordEng 🚀 Contributions: 1. Linear time. 2. Continuous-time formulation. 3…
0
24
0
@tomdlt10
Tom Dupré la Tour
3 years
RT @tomamoral: 📢NEW PAPER ALERT📢 Benchopt: Reproducible, efficient and collaborative optimization benchmarks 🎉 I…
0
28
0
@tomdlt10
Tom Dupré la Tour
3 years
And for a practical introduction to voxelwise encoding models, check out our tutorials! 9/9.
0
0
1
@tomdlt10
Tom Dupré la Tour
3 years
We also derive efficient methods for solving banded ridge regression on a large number of voxels, and leverage GPU computation for an additional speed boost. Check out our Python package "himalaya"! 8/9
1
0
0
@tomdlt10
Tom Dupré la Tour
3 years
Moreover, there is no reason that selecting only one layer is optimal. Instead, we fit a joint model on all layers simultaneously, using banded ridge regression to automatically select the optimal combination of layers. This leads to a smoother gradient over the cortical surface. 7/9
Tweet media one
2
0
2
@tomdlt10
Tom Dupré la Tour
3 years
For example, let's consider feature spaces extracted from intermediate CNN layers. Selecting the best layer per voxel leads to a gradient over the cortical surface. But because all layers have similar predictive power, the best-layer selection is not robust. 6/9
Tweet media one
2
0
1
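The non-robustness of best-layer selection can be illustrated with a small numpy simulation (the layer scores here are synthetic, not from the paper): when all layers have similar cross-validated scores, re-drawing the noise flips the argmax for many voxels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-validated R^2 scores: 8 layers x 1000 voxels,
# all layers similarly predictive (scores drawn from a narrow range).
scores = rng.uniform(0.4, 0.5, size=(8, 1000))
best_layer = scores.argmax(axis=0)

# Perturb the scores slightly, mimicking a different cross-validation
# split; the selected layer changes for a large fraction of voxels.
scores_perturbed = scores + rng.normal(scale=0.05, size=scores.shape)
changed = (scores_perturbed.argmax(axis=0) != best_layer).mean()
```

The point is that argmax selection is brittle whenever score differences between layers are small relative to estimation noise.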
@tomdlt10
Tom Dupré la Tour
3 years
Interestingly, banded ridge regression contains an implicit feature-space selection mechanism that effectively ignores non-predictive or redundant feature spaces. This mechanism automatically selects the relevant feature spaces in each voxel. 5/9
Tweet media one
1
0
1
@tomdlt10
Tom Dupré la Tour
3 years
Encoding models are usually fit with ridge regression. In a joint model with multiple feature spaces, ridge regression naturally extends to "banded ridge regression", which optimizes a separate regularization hyperparameter per feature space. 4/9
Tweet media one
1
0
0
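The per-feature-space regularization described above can be sketched in numpy (hypothetical data and illustrative hyperparameter values, not the paper's solver): banded ridge simply replaces the isotropic penalty alpha * I with a block-diagonal penalty, one strength per feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
n_times, n_voxels = 300, 5

# Two hypothetical feature spaces ("bands") of different sizes.
X1 = rng.standard_normal((n_times, 20))   # e.g. object features
X2 = rng.standard_normal((n_times, 30))   # e.g. motion features
X = np.hstack([X1, X2])
# Brain activity depends only on the first feature space here.
Y = X1 @ rng.standard_normal((20, n_voxels)) \
    + 0.1 * rng.standard_normal((n_times, n_voxels))

# Banded ridge: one regularization strength per feature space,
# implemented as a block-diagonal penalty instead of alpha * I.
# (Values are illustrative; in practice they are tuned by CV.)
alpha1, alpha2 = 1.0, 100.0
penalty = np.diag(np.concatenate([np.full(20, alpha1),
                                  np.full(30, alpha2)]))
coef = np.linalg.solve(X.T @ X + penalty, X.T @ Y)
```

With a large penalty on the uninformative band, its weights are shrunk toward zero, which is the selection mechanism discussed later in the thread.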
@tomdlt10
Tom Dupré la Tour
3 years
To account for potential complementarity between feature spaces, a joint model can be fit on multiple feature spaces simultaneously. A variance decomposition method is then used to split the explained variance into separate contributions from each feature space. 3/9
Tweet media one
1
0
0
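One simple way to decompose the explained variance of a joint model is sketched below in numpy (hypothetical data; the paper may use a different decomposition, e.g. set-theoretic variance partitioning): fit jointly, split the prediction per feature space, and attribute to each space the covariance of its sub-prediction with the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two hypothetical feature spaces; the second contributes less.
X1 = rng.standard_normal((n, 10))
X2 = rng.standard_normal((n, 10))
X = np.hstack([X1, X2])
y = X1 @ rng.standard_normal(10) + 0.3 * X2 @ rng.standard_normal(10)

# Joint ridge fit, then split the prediction per feature space.
coef = np.linalg.solve(X.T @ X + np.eye(20), X.T @ y)
pred1 = X1 @ coef[:10]
pred2 = X2 @ coef[10:]

# Each space's share of explained variance: covariance of its
# sub-prediction with the data, normalized by the total variance.
total = np.var(y)
share1 = np.cov(y, pred1)[0, 1] / total
share2 = np.cov(y, pred2)[0, 1] / total
```

The two shares sum approximately to the joint model's R^2, so they read as separate contributions from each feature space.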
@tomdlt10
Tom Dupré la Tour
3 years
Each feature space corresponds to a hypothesis about the information encoded in brain activity. For example, when watching a movie, some brain areas can be predicted from the objects present in the scene, and other areas from the amount of motion in each part of the screen. 2/9
Tweet media one
Tweet media two
1
0
0
@tomdlt10
Tom Dupré la Tour
3 years
Encoding models provide a powerful framework to identify the information represented in brain activity. In this framework, a stimulus (or task) representation is expressed as a feature space and is used in a regularized linear regression to predict brain activity. 1/9
Tweet media one
1
0
1
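The encoding-model framework described above can be sketched in a few lines of numpy, with hypothetical data (random stimulus features and voxel responses) and plain ridge regression standing in for the regularized linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n_times, n_features, n_voxels = 200, 50, 10

# Hypothetical stimulus feature space (time x features) and
# brain activity (time x voxels) generated from a linear model.
X = rng.standard_normal((n_times, n_features))
true_weights = rng.standard_normal((n_features, n_voxels))
Y = X @ true_weights + 0.1 * rng.standard_normal((n_times, n_voxels))

# Ridge regression: closed-form solution with regularization alpha.
alpha = 1.0
coef = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

# Predicted brain activity and explained variance (R^2) per voxel.
Y_pred = X @ coef
r2 = 1 - ((Y - Y_pred) ** 2).sum(0) / ((Y - Y.mean(0)) ** 2).sum(0)
```

A voxel's R^2 quantifies how much of its activity the feature space explains, which is the basis for the comparisons in the rest of the thread.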
@tomdlt10
Tom Dupré la Tour
3 years
New preprint (with @meickenberg and @gallantlab): Feature-space selection with banded ridge regression. Choose your preferred thread depending on your interest and technical background:
- Neuroimaging 🧵 below
- ML 🧵 at
1
5
12
@tomdlt10
Tom Dupré la Tour
3 years
Check out our "himalaya" package with CPU and GPU support! 8/8
0
1
0
@tomdlt10
Tom Dupré la Tour
3 years
But how do we solve banded ridge regression efficiently? Using either random search over a Dirichlet distribution, or hyperparameter gradient descent through implicit differentiation. 7/8
Tweet media one
1
0
1
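The random-search strategy can be sketched in numpy (hypothetical data; a simplified stand-in for the paper's solver): each Dirichlet sample assigns a relative weight to each feature space, and scaling a band by the square root of its weight under a unit ridge penalty is equivalent to giving that band the penalty 1/weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_voxels = 200, 4

# Two hypothetical feature spaces; only the first is predictive.
X1 = rng.standard_normal((n, 15))
X2 = rng.standard_normal((n, 15))
Y = X1 @ rng.standard_normal((15, n_voxels)) \
    + 0.1 * rng.standard_normal((n, n_voxels))

train, val = slice(0, 150), slice(150, None)
best_score = np.full(n_voxels, -np.inf)

# Random search over feature-space weights gamma ~ Dirichlet:
# scale each band by sqrt(gamma), fit ridge, score on held-out data,
# and keep the best validation R^2 per voxel.
for gamma in rng.dirichlet([1.0, 1.0], size=20):
    Xs = np.hstack([np.sqrt(gamma[0]) * X1, np.sqrt(gamma[1]) * X2])
    coef = np.linalg.solve(Xs[train].T @ Xs[train] + np.eye(30),
                           Xs[train].T @ Y[train])
    resid = Y[val] - Xs[val] @ coef
    score = 1 - (resid ** 2).sum(0) \
        / ((Y[val] - Y[val].mean(0)) ** 2).sum(0)
    best_score = np.maximum(best_score, score)
```

Because the selection is per voxel, different voxels can end up with different feature-space weightings from the same random draws.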
@tomdlt10
Tom Dupré la Tour
3 years
Interestingly, multiple-kernel ridge regression has been shown to be equivalent to the (squared) group lasso, a model well known for inducing group sparsity (see …). So banded ridge regression is (almost) equivalent to the (squared) group lasso. 6/8
1
0
0
@tomdlt10
Tom Dupré la Tour
3 years
Banded ridge regression can also be formulated with kernels, using a separate linear kernel per group of features. This formulation is called "multiple-kernel ridge regression". 5/8
Tweet media one
1
0
0
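The kernel formulation in the tweet above can be sketched in numpy (hypothetical data; the kernel weights are fixed here, whereas multiple-kernel ridge learns them): one linear kernel per feature space, combined with weights gamma, then solved in the dual.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two hypothetical feature spaces of different dimensionality.
X1 = rng.standard_normal((n, 40))
X2 = rng.standard_normal((n, 60))
y = X1 @ rng.standard_normal(40) + 0.1 * rng.standard_normal(n)

# One linear kernel per feature space, combined with weights gamma.
# (Weights are illustrative; learning them is the hard part.)
K1, K2 = X1 @ X1.T, X2 @ X2.T
gamma = np.array([0.8, 0.2])
K = gamma[0] * K1 + gamma[1] * K2

# Kernel ridge regression: dual coefficients from the combined kernel.
alpha = 1.0
dual_coef = np.linalg.solve(K + alpha * np.eye(n), y)
y_pred = K @ dual_coef
```

This weighted-kernel model matches banded ridge with per-space penalty alpha / gamma_i, which is why the two formulations are interchangeable.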