Runa Eschenhagen Profile
Runa Eschenhagen

@runame_

Followers: 525 · Following: 522 · Media: 8 · Statuses: 198

PhD student in machine learning @CambridgeMLG and external research collaborator @AIatMeta (MSL).

Joined October 2021
@runame_
Runa Eschenhagen
10 months
1/7 Still using Adam? If anyone wants to try a distributed PyTorch implementation of SOAP/eigenvalue-corrected Shampoo with support for low precision data types instead, here you go. https://t.co/I17zvWhcsb
github.com
For optimization algorithm research and development. - facebookresearch/optimizers
5 · 87 · 772
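
To give a flavour of what eigenvalue-corrected Shampoo / SOAP does under the hood, here is a minimal single-matrix PyTorch sketch. It is my own illustration, not the API of the facebookresearch/optimizers repo linked above: the class name, arguments, and defaults are hypothetical, and the distributed sharding and low-precision support mentioned in the tweet are omitted.

```python
import torch


class SoapLikeSketch:
    """Eigenvalue-corrected Shampoo / SOAP-style step for a single 2-D parameter.

    Illustrative only: hypothetical names/defaults, no distributed sharding,
    no low-precision handling, and no bias correction or re-projection of the
    Adam moments when the eigenbasis is refreshed.
    """

    def __init__(self, param, lr=3e-4, betas=(0.9, 0.999), shampoo_beta=0.95,
                 eps=1e-8, precondition_frequency=10):
        assert param.ndim == 2, "sketch only covers matrix-shaped parameters"
        m, n = param.shape
        self.param, self.lr, self.betas, self.eps = param, lr, betas, eps
        self.shampoo_beta, self.freq, self.t = shampoo_beta, precondition_frequency, 0
        self.L, self.R = torch.zeros(m, m), torch.zeros(n, n)  # Kronecker factors
        self.QL, self.QR = torch.eye(m), torch.eye(n)          # their eigenbases
        self.exp_avg = torch.zeros(m, n)                        # Adam moments, kept
        self.exp_avg_sq = torch.zeros(m, n)                     # in the eigenbasis

    @torch.no_grad()
    def step(self):
        g = self.param.grad
        self.t += 1
        b1, b2 = self.betas

        # Shampoo statistics: exponential averages of G G^T and G^T G.
        self.L.mul_(self.shampoo_beta).add_(g @ g.T, alpha=1 - self.shampoo_beta)
        self.R.mul_(self.shampoo_beta).add_(g.T @ g, alpha=1 - self.shampoo_beta)

        # Periodically refresh the eigenbases (the expensive part).
        if self.t % self.freq == 1:
            self.QL = torch.linalg.eigh(self.L).eigenvectors
            self.QR = torch.linalg.eigh(self.R).eigenvectors

        # Rotate the gradient into the eigenbasis and run an Adam-style update
        # there; the diagonal second moment is the "eigenvalue correction".
        g_rot = self.QL.T @ g @ self.QR
        self.exp_avg.mul_(b1).add_(g_rot, alpha=1 - b1)
        self.exp_avg_sq.mul_(b2).addcmul_(g_rot, g_rot, value=1 - b2)
        update_rot = self.exp_avg / (self.exp_avg_sq.sqrt() + self.eps)

        # Rotate back into parameter space and apply.
        self.param.add_(self.QL @ update_rot @ self.QR.T, alpha=-self.lr)
```

The expensive eigendecompositions are amortised by refreshing the eigenbasis only every few steps, which is the main knob trading preconditioner freshness against compute.
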
@jxbz
Jeremy Bernstein
5 days
Excited about our new research blog!
@thinkymachines
Thinking Machines
5 days
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”. We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to …
7 · 12 · 296
@algoperf
AlgoPerf
7 days
We just released AlgoPerf v0.6! 🎉 ✅ Rolling leaderboard ✅ Lower compute costs ✅ JAX jit migration ✅ Bug fixes & flexible API Coming soon: More contemporary baselines + an LM workload… https://t.co/QBOqGvqNWG
github.com
MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvements in both training algorithms and models. - mlcommons/algori...
0 · 9 · 43
@MatharyCharles
Zachary Charles
11 days
I would love to see this done, but with the kind of evaluation methodology from the Algo Perf work. Though I guess one of the key takeaways from these two new works is that doing this in a "scaling aware" way requires new methodology.
@AndreiSemenov17
Andrei Semenov
12 days
Amazing "competing" work from @wen_kaiyue @tengyuma @percyliang There are some good stories about optimizers to tell this week 😃 https://t.co/z0K0kG90mW https://t.co/KziMZlzwGj
1 · 2 · 15
@AndreiSemenov17
Andrei Semenov
12 days
Amazing "competing" work from @wen_kaiyue @tengyuma @percyliang There are some good stories about optimizers to tell this week 😃 https://t.co/z0K0kG90mW https://t.co/KziMZlzwGj
4 · 31 · 208
@wen_kaiyue
Kaiyue Wen
11 days
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baseline or limited scale! E.g. Muon: ~40% speedups <0.5B & only 10% at 1.2B (8× Chinchilla)!
12 · 88 · 418
@tmpethick
Thomas Pethick
27 days
I've noticed there is some confusion about Dion since it mathematically looks so different from Muon and Spectral descent, so I wrote a small note expressing Dion in terms of the SVD and how it differs from PowerSGD 👇
1 · 2 · 4
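
For readers who have not seen the SVD framing: below is a small illustration of my own (not code from Pethick's note) of the two extremes it connects. Spectral descent / Muon keep a gradient matrix's singular vectors and set its singular values to one, while PowerSGD- and Dion-style methods work with a rank-r approximation, recovered cheaply via power iteration and patched up with error feedback, both of which this sketch omits.

```python
import torch


def orthogonalized_update(g: torch.Tensor) -> torch.Tensor:
    """Spectral-descent / Muon-style direction: keep the singular vectors of g,
    replace all singular values by 1. (Muon approximates this with a
    Newton-Schulz iteration rather than an explicit SVD.)"""
    U, _, Vh = torch.linalg.svd(g, full_matrices=False)
    return U @ Vh


def rank_r_update(g: torch.Tensor, r: int) -> torch.Tensor:
    """Low-rank counterpart: only the top-r singular directions, still with unit
    singular values. Dion/PowerSGD-style methods approximate something like this
    with power iteration and carry the truncated remainder forward as error
    feedback (omitted here)."""
    U, _, Vh = torch.linalg.svd(g, full_matrices=False)
    return U[:, :r] @ Vh[:r, :]


# The full update has orthonormal columns (g is tall here, so W^T W = I).
g = torch.randn(64, 32)
W = orthogonalized_update(g)
print(torch.dist(W.T @ W, torch.eye(32)))  # ≈ 0
```
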
@f_dangel
Felix Dangel
1 month
KFAC is everywhere—from optimization to influence functions. While the intuition is simple, implementation is tricky. We (@BalintMucsanyi, @2bys2, @runame_) wrote a ground-up intro with code to help you get it right. 📖 https://t.co/sIQfB1bmsE 💻
github.com
KFAC from scratch (KFS)---Paper & Code. Contribute to f-dangel/kfac-tutorial development by creating an account on GitHub.
0 · 9 · 39
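
The tutorial above is the place to go for the tricky parts; as a bare-bones reminder of the core approximation, here is a single-linear-layer sketch of my own (not code from the kfac-tutorial repo). KFAC approximates the Fisher/GGN block of a weight matrix by a Kronecker product of the input-activation and output-gradient second moments, so preconditioning reduces to two small solves.

```python
import torch


def kfac_preconditioned_grad(acts, grad_outs, grad_W, damping=1e-3):
    """KFAC preconditioning for one linear layer y = W a (bias omitted).

    acts:      (N, d_in)  layer inputs a_n over a batch
    grad_outs: (N, d_out) per-example gradients w.r.t. the layer outputs
    grad_W:    (d_out, d_in) gradient of the loss w.r.t. W

    The Fisher block is approximated as a Kronecker product of
    A = E[a a^T] and G = E[g g^T], so applying its damped inverse to grad_W
    becomes (G + λI)^{-1} grad_W (A + λI)^{-1}.
    """
    N = acts.shape[0]
    A = acts.T @ acts / N                 # (d_in, d_in) activation second moment
    G = grad_outs.T @ grad_outs / N       # (d_out, d_out) output-gradient second moment
    A_damped = A + damping * torch.eye(A.shape[0])
    G_damped = G + damping * torch.eye(G.shape[0])
    # Solve linear systems rather than forming explicit inverses.
    left = torch.linalg.solve(G_damped, grad_W)    # (G + λI)^{-1} grad_W
    return torch.linalg.solve(A_damped, left.T).T  # ... (A + λI)^{-1}
```

In practice KFAC maintains running estimates of A and G, and the output gradients are taken with respect to the model's predictive distribution rather than the training labels; those details are exactly what the tutorial covers.
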
@HessianFree
HessianFree
1 month
Wait what how did I miss this one @aaron_defazio at it again
1 · 13 · 137
@_arohan_
rohan anil
1 month
Optimal regularization with optimizers.
2 · 4 · 34
@tychovdo
Tycho van der Ouderaa
2 months
This past spring, I spent time with the @exolabs team to work on a new DL optimizer and wiring up clusters of Macs for distributed TRAINING on Apple Silicon. If you’re at ICML, be sure to come by the @ESFoMo workshop (posters 1-2:30pm) this Saturday. I’ll be there to share some
@MattBeton
Matt Beton
2 months
I’m going to be in Vancouver next week for ICML! Would love to meet anyone involved with distributed training, infrastructure, inference engines, open source AI. I'll be presenting two papers: - EXO Gym - an open source framework for simulating distributed training algorithms
4 · 13 · 115
@frankstefansch1
Frank Schneider
2 months
At #ICML2025 and don't know which workshop to join? Why not come and celebrate/rant about open source ML with us? We got amazing speakers (@tri_dao is just one example)! Come by West Meeting Room 211-214 👋
0 · 1 · 6
@JihaoAndreasLin
Jihao Andreas Lin
2 months
Excited to share our ICML 2025 paper: "Scalable Gaussian Processes with Latent Kronecker Structure" We unlock efficient linear algebra for your kernel matrix which *almost* has Kronecker product structure. Check out our paper here: https://t.co/wqq89CTrAb
arxiv.org
Applying Gaussian processes (GPs) to very large datasets remains a challenge due to limited computational scalability. Matrix structures, such as the Kronecker product, can accelerate operations...
1 · 9 · 22
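
Some context on why Kronecker structure buys scalability (a generic identity, not the paper's latent construction, which additionally handles kernel matrices that only partially factor): a matrix-vector product with A ⊗ B never requires materialising the Kronecker product, since it can be done with two small matrix multiplications.

```python
import torch


def kron_matvec(A: torch.Tensor, B: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Compute (A ⊗ B) x without forming the Kronecker product.

    With row-major flattening (PyTorch's default), (A ⊗ B) vec(X) = vec(A X B^T)
    for X of shape (A.shape[1], B.shape[1]); two small matmuls replace one huge one.
    """
    X = x.reshape(A.shape[1], B.shape[1])
    return (A @ X @ B.T).reshape(-1)


# Check against the explicit Kronecker product on a small example.
A, B = torch.randn(5, 4), torch.randn(3, 2)
x = torch.randn(4 * 2)
print(torch.allclose(torch.kron(A, B) @ x, kron_matvec(A, B, x), atol=1e-5))  # True
```
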
@ThomasTCKZhang
Thomas Zhang
2 months
I’ll be presenting our paper “On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning” at ICML during the Tuesday 11am poster session! DL opt is seeing a renaissance 🦾; what can we say from a NN feature learning perspective? 1/8
2 · 9 · 64
@kayembruno
Bruno Mlodozeniec
2 months
You don't need bespoke tools for causal inference. Probabilistic modelling is enough. I'll be making this case (and dodging pitchforks) at our ICML oral presentation tomorrow.
1 · 4 · 15
@tmpethick
Thomas Pethick
2 months
When comparing optimization methods, we often change *multiple things at once*—geometry, normalization, etc.—possibly without realizing it. Let's disentangle these changes. 👇
1 · 4 · 6
@akristiadi7
Agustinus Kristiadi
2 months
📢 [Openings] I'm now an Assistant Prof @WesternU CS dept. Funded PhD & MSc positions available! Topics: large probabilistic models, decision-making under uncertainty, and apps in AI4Science. More on https://t.co/h8R8VpDN83
1 · 11 · 27
@joost_v_amersf
Joost van Amersfoort
3 months
Never will be.
@eliebakouch
elie
3 months
Pre-training is not dead
5 · 13 · 139
@MarkSchmidtUBC
Mark Schmidt
3 months
My former PhD student Fred Kunstner has been awarded the @c_a_i_a_c Best Doctoral Dissertation Award: https://t.co/R6Wdl0FtIu His thesis on machine learning algorithms includes an EM proof "from the book", why Adam works, and the first provably-faster hyper-gradient method.
3 · 23 · 238
@aaron_defazio
Aaron Defazio
3 months
Why do gradients increase near the end of training? Read the paper to find out! We also propose a simple fix to AdamW that keeps gradient norms better behaved throughout training. https://t.co/t5gxzV9CrZ
13 · 75 · 548
@Ar_Douillard
Arthur Douillard
3 months
duality of humanity
2 · 1 · 18