Tom Dupré la Tour

@tomdlt10

Followers
551
Following
286
Media
16
Statuses
131

LLM interpretability @openai, previously neuron and fMRI interpretability @gallantlab, neurophysiology @agramfort, machine-learning for @scikit_learn

San Francisco, CA
Joined December 2014
@tomdlt10
Tom Dupré la Tour
18 days
RT @MilesKWang: We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We f…
0
424
0
@tomdlt10
Tom Dupré la Tour
1 year
RT @nabla_theta: Excited to share what I've been working on as part of the former Superalignment team! We introduce a SOTA training stack…
0
85
0
@tomdlt10
Tom Dupré la Tour
1 year
RT @cathychen23: Do brain representations of language depend on whether the inputs are pixels or sounds? Our @CommsBio paper studies this…
0
39
0
@tomdlt10
Tom Dupré la Tour
1 year
RT @janleike: Today we're releasing a tool we've been using internally to analyze transformer internals - the Transformer Debugger! It com…
0
189
0
@tomdlt10
Tom Dupré la Tour
2 years
RT @davederiso: Made an ultra-fast DTW solver with Stephen Boyd @StanfordEng 🚀 Contributions: 1. Linear time. 2. Continuous-time formulation. 3…
0
24
0
@tomdlt10
Tom Dupré la Tour
3 years
RT @tomamoral: 📢NEW PAPER ALERT📢 Benchopt: Reproducible, efficient and collaborative optimization benchmarks 🎉 I…
0
28
0
@tomdlt10
Tom Dupré la Tour
3 years
And for a practical introduction to voxelwise encoding models, check out our tutorials! 9/9.
0
0
1
@tomdlt10
Tom Dupré la Tour
3 years
We also derive efficient methods for solving banded ridge regression on a large number of voxels, and leverage GPU computation for an additional speed boost. Check out our Python package "himalaya"! 8/9
1
0
0
@tomdlt10
Tom Dupré la Tour
3 years
Moreover, there is no reason that selecting only one layer is optimal. Instead, we fit a joint model on all layers simultaneously, using banded ridge regression to automatically select the optimal combination of layers. This leads to a smoother gradient over the cortical surface. 7/9
Tweet media one
2
0
2
@tomdlt10
Tom Dupré la Tour
3 years
For example, let's consider feature spaces extracted from intermediate CNN layers. Selecting the best layer per voxel leads to a gradient over the cortical surface. But because all layers have similar predictive power, the best-layer selection is not robust. 6/9
Tweet media one
2
0
1
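The non-robustness of best-layer selection can be illustrated with a small numpy simulation (the layer scores here are synthetic, not from the paper): when all layers have similar cross-validated scores, re-drawing the noise flips the argmax for many voxels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-validated R^2 scores: 8 layers x 1000 voxels,
# all layers similarly predictive (scores drawn from a narrow range).
scores = rng.uniform(0.4, 0.5, size=(8, 1000))
best_layer = scores.argmax(axis=0)

# Perturb the scores slightly, mimicking a different cross-validation
# split; the selected layer changes for a large fraction of voxels.
scores_perturbed = scores + rng.normal(scale=0.05, size=scores.shape)
changed = (scores_perturbed.argmax(axis=0) != best_layer).mean()
```

The point is that argmax selection is brittle whenever score differences between layers are small relative to estimation noise.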
@tomdlt10
Tom Dupré la Tour
3 years
Interestingly, banded ridge regression contains an implicit feature-space selection mechanism that effectively ignores non-predictive or redundant feature spaces. This mechanism automatically selects the relevant feature spaces in each voxel. 5/9
Tweet media one
1
0
1
@tomdlt10
Tom Dupré la Tour
3 years
Encoding models are usually fit with ridge regression. In a joint model with multiple feature spaces, ridge regression naturally extends to "banded ridge regression", which optimizes a separate regularization hyperparameter per feature space. 4/9
Tweet media one
1
0
0
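The per-feature-space regularization described above can be sketched in numpy (hypothetical data and illustrative hyperparameter values, not the paper's solver): banded ridge simply replaces the isotropic penalty alpha * I with a block-diagonal penalty, one strength per feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
n_times, n_voxels = 300, 5

# Two hypothetical feature spaces ("bands") of different sizes.
X1 = rng.standard_normal((n_times, 20))   # e.g. object features
X2 = rng.standard_normal((n_times, 30))   # e.g. motion features
X = np.hstack([X1, X2])
# Brain activity depends only on the first feature space here.
Y = X1 @ rng.standard_normal((20, n_voxels)) \
    + 0.1 * rng.standard_normal((n_times, n_voxels))

# Banded ridge: one regularization strength per feature space,
# implemented as a block-diagonal penalty instead of alpha * I.
# (Values are illustrative; in practice they are tuned by CV.)
alpha1, alpha2 = 1.0, 100.0
penalty = np.diag(np.concatenate([np.full(20, alpha1),
                                  np.full(30, alpha2)]))
coef = np.linalg.solve(X.T @ X + penalty, X.T @ Y)
```

With a large penalty on the uninformative band, its weights are shrunk toward zero, which is the selection mechanism discussed later in the thread.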
@tomdlt10
Tom Dupré la Tour
3 years
To account for potential complementarity between feature spaces, a joint model can be fit on multiple feature spaces simultaneously. A variance decomposition method is then used to split the explained variance into separate contributions from each feature space. 3/9
Tweet media one
1
0
0
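One simple way to decompose the explained variance of a joint model is sketched below in numpy (hypothetical data; the paper may use a different decomposition, e.g. set-theoretic variance partitioning): fit jointly, split the prediction per feature space, and attribute to each space the covariance of its sub-prediction with the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two hypothetical feature spaces; the second contributes less.
X1 = rng.standard_normal((n, 10))
X2 = rng.standard_normal((n, 10))
X = np.hstack([X1, X2])
y = X1 @ rng.standard_normal(10) + 0.3 * X2 @ rng.standard_normal(10)

# Joint ridge fit, then split the prediction per feature space.
coef = np.linalg.solve(X.T @ X + np.eye(20), X.T @ y)
pred1 = X1 @ coef[:10]
pred2 = X2 @ coef[10:]

# Each space's share of explained variance: covariance of its
# sub-prediction with the data, normalized by the total variance.
total = np.var(y)
share1 = np.cov(y, pred1)[0, 1] / total
share2 = np.cov(y, pred2)[0, 1] / total
```

The two shares sum approximately to the joint model's R^2, so they read as separate contributions from each feature space.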
@tomdlt10
Tom Dupré la Tour
3 years
Each feature space corresponds to a hypothesis about the information encoded in brain activity. For example, when watching a movie, some brain areas can be predicted from the objects present in the scene, and other areas from the amount of motion in each part of the screen. 2/9
Tweet media one
Tweet media two
1
0
0
@tomdlt10
Tom Dupré la Tour
3 years
Encoding models provide a powerful framework to identify the information represented in brain activity. In this framework, a stimulus (or task) representation is expressed as a feature space and is used in a regularized linear regression to predict brain activity. 1/9
Tweet media one
1
0
1
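The encoding-model framework described above can be sketched in a few lines of numpy, with hypothetical data (random stimulus features and voxel responses) and plain ridge regression standing in for the regularized linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)
n_times, n_features, n_voxels = 200, 50, 10

# Hypothetical stimulus feature space (time x features) and
# brain activity (time x voxels) generated from a linear model.
X = rng.standard_normal((n_times, n_features))
true_weights = rng.standard_normal((n_features, n_voxels))
Y = X @ true_weights + 0.1 * rng.standard_normal((n_times, n_voxels))

# Ridge regression: closed-form solution with regularization alpha.
alpha = 1.0
coef = np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

# Predicted brain activity and explained variance (R^2) per voxel.
Y_pred = X @ coef
r2 = 1 - ((Y - Y_pred) ** 2).sum(0) / ((Y - Y.mean(0)) ** 2).sum(0)
```

A voxel's R^2 quantifies how much of its activity the feature space explains, which is the basis for the comparisons in the rest of the thread.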
@tomdlt10
Tom Dupré la Tour
3 years
New preprint (with @meickenberg and @gallantlab): Feature-space selection with banded ridge regression. Choose your preferred thread depending on your interest and technical background:
- Neuroimaging 🧵 below
- ML 🧵 at
1
5
12
@tomdlt10
Tom Dupré la Tour
3 years
Check out our "himalaya" package with CPU and GPU support! 8/8
0
1
0
@tomdlt10
Tom Dupré la Tour
3 years
But how do we solve banded ridge regression efficiently? Using either random search over a Dirichlet distribution, or hyperparameter gradient descent through implicit differentiation. 7/8
Tweet media one
1
0
1
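The random-search strategy can be sketched in numpy (hypothetical data; a simplified stand-in for the paper's solver): each Dirichlet sample assigns a relative weight to each feature space, and scaling a band by the square root of its weight under a unit ridge penalty is equivalent to giving that band the penalty 1/weight.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_voxels = 200, 4

# Two hypothetical feature spaces; only the first is predictive.
X1 = rng.standard_normal((n, 15))
X2 = rng.standard_normal((n, 15))
Y = X1 @ rng.standard_normal((15, n_voxels)) \
    + 0.1 * rng.standard_normal((n, n_voxels))

train, val = slice(0, 150), slice(150, None)
best_score = np.full(n_voxels, -np.inf)

# Random search over feature-space weights gamma ~ Dirichlet:
# scale each band by sqrt(gamma), fit ridge, score on held-out data,
# and keep the best validation R^2 per voxel.
for gamma in rng.dirichlet([1.0, 1.0], size=20):
    Xs = np.hstack([np.sqrt(gamma[0]) * X1, np.sqrt(gamma[1]) * X2])
    coef = np.linalg.solve(Xs[train].T @ Xs[train] + np.eye(30),
                           Xs[train].T @ Y[train])
    resid = Y[val] - Xs[val] @ coef
    score = 1 - (resid ** 2).sum(0) \
        / ((Y[val] - Y[val].mean(0)) ** 2).sum(0)
    best_score = np.maximum(best_score, score)
```

Because the selection is per voxel, different voxels can end up with different feature-space weightings from the same random draws.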
@tomdlt10
Tom Dupré la Tour
3 years
Interestingly, multiple-kernel ridge regression has been shown to be equivalent to the (squared) group lasso, a model well known for inducing group sparsity (see …). So banded ridge regression is (almost) equivalent to the (squared) group lasso. 6/8
1
0
0
@tomdlt10
Tom Dupré la Tour
3 years
Banded ridge regression can also be formulated with kernels, using a separate linear kernel per group of features. This formulation is called "multiple-kernel ridge regression". 5/8
Tweet media one
1
0
0
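The kernel formulation in the tweet above can be sketched in numpy (hypothetical data; the kernel weights are fixed here, whereas multiple-kernel ridge learns them): one linear kernel per feature space, combined with weights gamma, then solved in the dual.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two hypothetical feature spaces of different dimensionality.
X1 = rng.standard_normal((n, 40))
X2 = rng.standard_normal((n, 60))
y = X1 @ rng.standard_normal(40) + 0.1 * rng.standard_normal(n)

# One linear kernel per feature space, combined with weights gamma.
# (Weights are illustrative; learning them is the hard part.)
K1, K2 = X1 @ X1.T, X2 @ X2.T
gamma = np.array([0.8, 0.2])
K = gamma[0] * K1 + gamma[1] * K2

# Kernel ridge regression: dual coefficients from the combined kernel.
alpha = 1.0
dual_coef = np.linalg.solve(K + alpha * np.eye(n), y)
y_pred = K @ dual_coef
```

This weighted-kernel model matches banded ridge with per-space penalty alpha / gamma_i, which is why the two formulations are interchangeable.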