Isaac Reid
@isaac_o_reid
Followers: 109 · Following: 70 · Media: 3 · Statuses: 26
Machine learning PhD @CambridgeMLG 🤖 || Recovering theoretical physicist @UniOfOxford 🔭|| Keeping it analytic @TrinCollCam
Joined November 2021
1/9 In practice, the Shampoo optimizer crucially relies on several heuristics. In our NeurIPS 2025 spotlight paper, we investigate the role of learning rate grafting and infrequent preconditioner updates in Shampoo by decomposing its preconditioner. https://t.co/TfI1gwMrFs
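For readers less familiar with the moving parts, here is a minimal NumPy sketch of a single Shampoo step for a matrix parameter with SGD-style learning-rate grafting – a generic illustration of the heuristics under discussion, not the paper's implementation (the function name, hyperparameters and the every-step root computation are simplifications of my own).

```python
# Minimal NumPy sketch (an illustration, not the paper's code): one Shampoo
# step for a matrix parameter W, with SGD-style learning-rate grafting.
import numpy as np

def shampoo_step(W, G, L, R, lr=1e-2, eps=1e-8):
    """W, G: (m, n) weights and gradient; L, R: accumulated Kronecker factors."""
    L = L + G @ G.T                    # left statistics,  shape (m, m)
    R = R + G.T @ G                    # right statistics, shape (n, n)

    def inv_quarter_root(M):
        # Inverse fourth root via eigendecomposition. Recomputed every step here
        # for clarity; practical implementations refresh it only infrequently.
        w, V = np.linalg.eigh(M)
        return V @ np.diag((w + eps) ** -0.25) @ V.T

    shampoo_dir = inv_quarter_root(L) @ G @ inv_quarter_root(R)

    # Grafting: keep Shampoo's preconditioned *direction* but borrow the step
    # *norm* from a simpler first-order update (plain SGD here).
    scale = np.linalg.norm(G) / (np.linalg.norm(shampoo_dir) + eps)
    return W - lr * scale * shampoo_dir, L, R
```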
9/9 Credit as always to machine learning weapon and first coauthor @kayembruno, as well as crack research team @sp_monte_carlo, @RogerGrosse, @MuratAErdogdu, @DavidSKrueger and Rich Turner. This was a very fun project and I'm proud of the outcome!
8/9 Want to hear more? Check out our poster session at 11am on Friday! They were very nice and gave us a spotlight.
7/9 In other words, influence functions are ✨secretly distributional✨. We think this helps explain how IFs work in deep learning, where loss functions unfortunately aren’t parabolas.
6/9 Surprisingly, IFs exactly *solve* some d-TDA problems in certain asymptotic limits – without any convexity assumptions.
5/9 After a lot of head scratching, maths, and experiments, we formalise our thoughts this year at #NeurIPS2025 with *distributional* training data attribution (d-TDA).
4/9 Frankly, it’s about time TDA methods stopped treating training as deterministic. If the outcome of training is a random variable, we ask: how does its *distribution* depend upon the dataset?
3/9 Spoiler alert: the convexity assumption is bs for deep learning. Real-life training is messy and noisy. So how do IFs work so well in deep learning in practice?
2/9 A popular tool to answer this question is provided by *influence functions* (IFs) – an old(ish) tool from robust statistics, which assumes the loss function is convex 🥣 and approximates its sensitivity to each datapoint.
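For background, the textbook influence-function approximation (the standard robust-statistics formulation, quoted here for context rather than taken from the paper) is

$$
\hat\theta = \arg\min_\theta \frac{1}{n}\sum_{i=1}^n \ell(z_i,\theta),
\qquad
\left.\frac{d\hat\theta_\epsilon}{d\epsilon}\right|_{\epsilon=0}
= -\,H_{\hat\theta}^{-1}\,\nabla_\theta \ell(z,\hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^n \nabla^2_\theta \ell(z_i,\hat\theta),
$$

where $\hat\theta_\epsilon$ is the minimiser after upweighting the training point $z$ by $\epsilon$; strict convexity is what makes the Hessian $H_{\hat\theta}$ invertible.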
Suppose your model outputs something weird, and you want to understand where the heck in the training data it came from. This is called ‘training data attribution’ 🕵️♂️ 1/9
Excited about this one! Variance reduction in Monte Carlo is really a multi-marginal optimal transport problem, and treating it as such gives us tools to sample more efficiently in Euclidean and discrete spaces https://t.co/DwX01XIjZr
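A toy illustration of the coupling viewpoint (my example, not the paper's construction): antithetic pairing keeps each marginal exactly N(0, 1) but correlates the samples, which already reduces the variance of a plain Monte Carlo estimate.

```python
# Toy illustration (not the paper's construction): an antithetic coupling keeps
# each marginal N(0, 1) but lowers the variance of estimating E[f(X)].
import numpy as np

rng = np.random.default_rng(0)
f = np.exp                          # any monotone test function

n = 100_000
x = rng.standard_normal(n)
y = rng.standard_normal(n)

iid_pairs = 0.5 * (f(x) + f(y))     # independent coupling
anti_pairs = 0.5 * (f(x) + f(-x))   # antithetic coupling, same marginals

print(iid_pairs.var(), anti_pairs.var())   # antithetic estimator has lower variance
```

Choosing the best such joint coupling across many samples at once is, as the tweet puts it, a multi-marginal optimal transport problem.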
Come and find out 1) why your random walks shouldn't be i.i.d., and 2) how you can use them to construct random features for graph node kernels! @kchorolab and I will be at poster sessions 2 and 5 😎
#ICLR2024 Highlights from the CBL! THIS WEEK, members of the Cambridge Machine Learning Group will be showcasing their work at ICLR 2024! A 🧵 on what to expect from our team...
Really enjoyed writing this piece with @torfjelde and @vdutor 🙌 Thanks @msalbergo @ValentinDeBort1 @JamesTThorn for your insightful feedback 👌
Along with @MathieuEmile and @vdutor we've cooked up a gentle introduction to flow matching, a recent method for efficiently training continuous normalizing flows (CNFs)! Hope you find it interesting! https://t.co/eemzkPaDtF 1/2
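For the impatient, one common instantiation of flow matching (linear interpolation paths with an independent coupling – a generic form for background, not necessarily the exact presentation in the post) trains a vector field $v_\theta$ with

$$
x_t = (1-t)\,x_0 + t\,x_1,
\qquad
\mathcal{L}_{\mathrm{CFM}}(\theta)
= \mathbb{E}_{t\sim\mathcal{U}[0,1],\; x_0\sim p_0,\; x_1\sim p_1}
\big\| v_\theta(x_t, t) - (x_1 - x_0) \big\|^2,
$$

after which the learned field is integrated as an ODE, $\dot{x} = v_\theta(x, t)$, transporting noise $p_0$ to data $p_1$.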
Quasi-Monte Carlo Graph Random Features was a v fun project investigating how correlations between random walkers can improve the efficiency of sampling on graphs. I’m not in New Orleans this time round but the excellent @kchorolab will be at PS 6!
arxiv.org
We present a novel mechanism to improve the accuracy of the recently-introduced class of graph random features (GRFs). Our method induces negative correlations between the lengths of the...
We're at #NeurIPS2023 and excited to share our work! Our group and collaborators will present 22 papers at the main conference, including an oral and 5 spotlights. Check out our @NeurIPSConf schedule below – we'd love to chat at the poster sessions!
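On the QMC graph random features tweet above: the key idea – negatively correlating the walkers' lengths while preserving each length's marginal distribution – can be illustrated with a toy antithetic coupling (a sketch of the flavour only, not the paper's mechanism; the helper function is hypothetical).

```python
# Toy sketch (the flavour, not the paper's exact mechanism): pair up walkers so
# their termination randomness is antithetic. Each walk length keeps its
# Geometric(p_halt) marginal, but paired lengths are negatively correlated.
import numpy as np

rng = np.random.default_rng(0)
p_halt = 0.1                                     # per-step termination probability

def walk_length(u, p):
    # Inverse-CDF sample of a Geometric(p) number of steps from a uniform u.
    return int(np.log1p(-u) // np.log1p(-p))

u = rng.uniform(1e-12, 1.0 - 1e-12, size=10_000)
lengths_a = np.array([walk_length(v, p_halt) for v in u])
lengths_b = np.array([walk_length(1.0 - v, p_halt) for v in u])   # antithetic partners

print(np.corrcoef(lengths_a, lengths_b)[0, 1])   # strongly negative correlation
```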
Super stoked to be in Honolulu at #ICML2023 presenting our paper Simplex Random Features! Catch our poster (session 6) and our oral presentation (C1) on Thursday. Come to chat about quasi-Monte Carlo, scalable Transformers and the waves at Waikiki!
(Teaser: this is for a kernel defined on Euclidean space, but what is the analogue on a graph – the natural space to describe e.g. molecular structure or social networks?) 9/9
Big thanks to all my awesome coauthors and collaborators, especially @kchorolab at Google/Columbia and @adrian_weller here in Cambridge 🧑🎓8/9
We show that these theoretical guarantees translate to better performance in downstream tasks, e.g. simplex-ViTs enjoy a 0.5% improvement on ImageNet at essentially no extra cost 🎉7/9
We derive the best possible choice for these correlations among a broad class, finding that the random vectors should point to the vertices of a (d-1)-dimensional simplex embedded in d-dimensional space (a fancy way of referring to the high-d generalisation of a triangle) 6/9
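To make the geometry concrete, here is a small NumPy sketch of just the directions (the full simplex random feature mechanism also involves random rotations and norms, which are omitted here): d unit vectors in R^d whose pairwise dot products are all -1/(d-1), i.e. the vertices of a regular (d-1)-dimensional simplex.

```python
# Geometry only (not the full simplex random feature mechanism): d unit vectors
# in R^d pointing to the vertices of a regular (d-1)-dimensional simplex.
import numpy as np

def simplex_directions(d):
    V = np.eye(d) - np.ones((d, d)) / d          # push each basis vector off the all-ones direction
    return V / np.linalg.norm(V, axis=1, keepdims=True)

V = simplex_directions(8)
print(np.round(V @ V.T, 3))                      # 1.0 on the diagonal, -1/7 ≈ -0.143 off it
```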