Isaac Reid
@isaac_o_reid
Followers: 109 · Following: 70 · Media: 3 · Statuses: 26
Machine learning PhD @CambridgeMLG 🤖 || Recovering theoretical physicist @UniOfOxford 🔭|| Keeping it analytic @TrinCollCam
Joined November 2021
1/9 In practice, the Shampoo optimizer crucially relies on several heuristics. In our NeurIPS 2025 spotlight paper, we investigate the role of learning rate grafting and infrequent preconditioner updates in Shampoo by decomposing its preconditioner. https://t.co/TfI1gwMrFs
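For readers less familiar with the moving parts, here is a minimal NumPy sketch of a single Shampoo step for a matrix parameter with SGD-style learning-rate grafting – a generic illustration of the heuristics under discussion, not the paper's implementation (the function name, hyperparameters and the every-step root computation are simplifications of my own).

```python
# Minimal NumPy sketch (an illustration, not the paper's code): one Shampoo
# step for a matrix parameter W, with SGD-style learning-rate grafting.
import numpy as np

def shampoo_step(W, G, L, R, lr=1e-2, eps=1e-8):
    """W, G: (m, n) weights and gradient; L, R: accumulated Kronecker factors."""
    L = L + G @ G.T                    # left statistics,  shape (m, m)
    R = R + G.T @ G                    # right statistics, shape (n, n)

    def inv_quarter_root(M):
        # Inverse fourth root via eigendecomposition. Recomputed every step here
        # for clarity; practical implementations refresh it only infrequently.
        w, V = np.linalg.eigh(M)
        return V @ np.diag((w + eps) ** -0.25) @ V.T

    shampoo_dir = inv_quarter_root(L) @ G @ inv_quarter_root(R)

    # Grafting: keep Shampoo's preconditioned *direction* but borrow the step
    # *norm* from a simpler first-order update (plain SGD here).
    scale = np.linalg.norm(G) / (np.linalg.norm(shampoo_dir) + eps)
    return W - lr * scale * shampoo_dir, L, R
```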
9/9 Credit as always to machine learning weapon and first coauthor @kayembruno, as well as crack research team @sp_monte_carlo, @RogerGrosse, @MuratAErdogdu, @DavidSKrueger and Rich Turner. This was a very fun project and I'm proud of the outcome!
8/9 Want to hear more? Check out our poster session at 11am on Friday! They were very nice and gave us a spotlight.
7/9 In other words, influence functions are ✨secretly distributional✨. We think this helps explain how IFs work in deep learning, where loss functions unfortunately aren’t parabolas.
6/9 Surprisingly, IFs exactly *solve* some d-TDA problems in certain asymptotic limits – without any convexity assumptions.
5/9 After a lot of head scratching, maths, and experiments, we formalise our thoughts this year at #NeurIPS2025 with *distributional* training data attribution (d-TDA).
4/9 Frankly, it’s about time TDA methods stopped treating training as deterministic. If the outcome of training is a random variable, we ask: how does its *distribution* depend upon the dataset?
3/9 Spoiler alert: the convexity assumption is bs for deep learning. Real-life training is messy and noisy. So how do IFs work so well in deep learning in practice?
2/9 A popular tool to answer this question is provided by *influence functions* (IFs) – an old(ish) tool from robust statistics, which assumes the loss function is convex 🥣 and approximates its sensitivity to each datapoint.
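For background, the textbook influence-function approximation (the standard robust-statistics formulation, quoted here for context rather than taken from the paper) is

$$
\hat\theta = \arg\min_\theta \frac{1}{n}\sum_{i=1}^n \ell(z_i,\theta),
\qquad
\left.\frac{d\hat\theta_\epsilon}{d\epsilon}\right|_{\epsilon=0}
= -\,H_{\hat\theta}^{-1}\,\nabla_\theta \ell(z,\hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^n \nabla^2_\theta \ell(z_i,\hat\theta),
$$

where $\hat\theta_\epsilon$ is the minimiser after upweighting the training point $z$ by $\epsilon$; strict convexity is what makes the Hessian $H_{\hat\theta}$ invertible.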
Suppose your model outputs something weird, and you want to understand where the heck in the training data it came from. This is called ‘training data attribution’ 🕵️♂️ 1/9
Excited about this one! Variance reduction in Monte Carlo is really a multi-marginal optimal transport problem, and treating it as such gives us tools to sample more efficiently in Euclidean and discrete spaces https://t.co/DwX01XIjZr
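A toy illustration of the coupling viewpoint (my example, not the paper's construction): antithetic pairing keeps each marginal exactly N(0, 1) but correlates the samples, which already reduces the variance of a plain Monte Carlo estimate.

```python
# Toy illustration (not the paper's construction): an antithetic coupling keeps
# each marginal N(0, 1) but lowers the variance of estimating E[f(X)].
import numpy as np

rng = np.random.default_rng(0)
f = np.exp                          # any monotone test function

n = 100_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)

iid_pairs = 0.5 * (f(x) + f(y))     # independent coupling
anti_pairs = 0.5 * (f(x) + f(-x))   # antithetic coupling, same marginals

print(iid_pairs.var(), anti_pairs.var())   # antithetic estimator has lower variance
```

Choosing the best such joint coupling across many samples at once is, as the tweet puts it, a multi-marginal optimal transport problem.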
Come and find out 1) why your random walks shouldn't be i.i.d., and 2) how you can use them to construct random features for graph node kernels! @kchorolab and I will be at poster sessions 2 and 5 😎
#ICLR2024 Highlights from the CBL! THIS WEEK, members of the Cambridge Machine Learning Group will be showcasing their work at ICLR 2024! A 🧵 on what to expect from our team...
Really enjoyed writing this piece with @torfjelde and @vdutor 🙌 Thanks @msalbergo @ValentinDeBort1 @JamesTThorn for your insightful feedback 👌
Along with @MathieuEmile and @vdutor we've cooked up a gentle introduction to flow matching, a recent method for efficiently training continuous normalizing flows (CNFs)! Hope you find it interesting! https://t.co/eemzkPaDtF 1/2
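For the impatient, one common instantiation of flow matching (linear interpolation paths with an independent coupling – a generic form for background, not necessarily the exact presentation in the post) trains a vector field $v_\theta$ with

$$
x_t = (1-t)\,x_0 + t\,x_1,
\qquad
\mathcal{L}_{\mathrm{CFM}}(\theta)
= \mathbb{E}_{t\sim\mathcal{U}[0,1],\; x_0\sim p_0,\; x_1\sim p_1}
\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2,
$$

after which the learned field is integrated as an ODE, $\dot{x} = v_\theta(x, t)$, transporting noise $p_0$ to data $p_1$.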
Quasi-Monte Carlo Graph Random Features was a v fun project investigating how correlations between random walkers can improve the efficiency of sampling on graphs. I’m not in New Orleans this time round but the excellent @kchorolab will be at PS 6!
arxiv.org
We present a novel mechanism to improve the accuracy of the recently-introduced class of graph random features (GRFs). Our method induces negative correlations between the lengths of the...
We're at #NeurIPS2023 and excited to share our work! Our group and collaborators will present 22 papers at the main conference, including an oral and 5 spotlights. Check out our @NeurIPSConf schedule below – we'd love to chat at the poster sessions!
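On the QMC graph random features tweet above: the key idea – negatively correlating the walkers' lengths while preserving each length's marginal distribution – can be illustrated with a toy antithetic coupling (a sketch of the flavour only, not the paper's mechanism; the helper function is hypothetical).

```python
# Toy sketch (the flavour, not the paper's exact mechanism): pair up walkers so
# their termination randomness is antithetic. Each walk length keeps its
# Geometric(p_halt) marginal, but paired lengths are negatively correlated.
import numpy as np

rng = np.random.default_rng(0)
p_halt = 0.1                                     # per-step termination probability

def walk_length(u, p):
    # Inverse-CDF sample of a Geometric(p) number of steps from a uniform u.
    return int(np.log1p(-u) // np.log1p(-p))

u = rng.uniform(1e-12, 1.0 - 1e-12, size=10_000)
lengths_a = np.array([walk_length(v, p_halt) for v in u])
lengths_b = np.array([walk_length(1.0 - v, p_halt) for v in u])   # antithetic partners

print(np.corrcoef(lengths_a, lengths_b)[0, 1])   # strongly negative correlation
```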
Super stoked to be in Honolulu at #ICML2023 presenting our paper Simplex Random Features! Catch our poster (session 6) and our oral presentation (C1) on Thursday. Come to chat about quasi-Monte Carlo, scalable Transformers and the waves at Waikiki!
(Teaser: this is for a kernel defined on Euclidean space, but what is the analogue on a graph – the natural space to describe e.g. molecular structure or social networks?) 9/9
Big thanks to all my awesome coauthors and collaborators, especially @kchorolab at Google/Columbia and @adrian_weller here in Cambridge 🧑🎓8/9
We show that these theoretical guarantees translate to better performance in downstream tasks, e.g. simplex-ViTs enjoy a 0.5% improvement on ImageNet at essentially no extra cost 🎉7/9
We derive the best possible choice for these correlations among a broad class, finding that the random vectors should point to the vertices of a (d-1)-dimensional simplex embedded in d-dimensional space (a fancy way of referring to the high-d generalisation of a triangle) 6/9
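To make the geometry concrete, here is a small NumPy sketch of just the directions (the full simplex random feature mechanism also involves random rotations and norms, which are omitted here): d unit vectors in R^d whose pairwise dot products are all -1/(d-1), i.e. the vertices of a regular (d-1)-dimensional simplex.

```python
# Geometry only (not the full simplex random feature mechanism): d unit vectors
# in R^d pointing to the vertices of a regular (d-1)-dimensional simplex.
import numpy as np

def simplex_directions(d):
    V = np.eye(d) - np.ones((d, d)) / d          # push each basis vector off the all-ones direction
    return V / np.linalg.norm(V, axis=1, keepdims=True)

V = simplex_directions(8)
print(np.round(V @ V.T, 3))                      # 1.0 on the diagonal, -1/7 ≈ -0.143 off it
```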