Runa Eschenhagen Profile
Runa Eschenhagen

@runame_

Followers: 517
Following: 485
Media: 8
Statuses: 186

PhD student in machine learning @CambridgeMLG and research scientist intern @AIatMeta.

Joined October 2021
@runame_
Runa Eschenhagen
8 months
1/7 Still using Adam? If anyone wants to try a distributed PyTorch implementation of SOAP/eigenvalue-corrected Shampoo with support for low-precision data types instead, here you go.
5
85
771
@runame_
Runa Eschenhagen
2 days
RT @ThomasTCKZhang: I’ll be presenting our paper “On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning” a….
0
8
0
@runame_
Runa Eschenhagen
2 days
RT @kayembruno: You don't need bespoke tools for causal inference. Probabilistic modelling is enough. I'll be making this case (and dodgin….
0
4
0
@runame_
Runa Eschenhagen
10 days
RT @tmpethick: When comparing optimization methods, we often change *multiple things at once*—geometry, normalization, etc.—possibly withou….
0
2
0
@runame_
Runa Eschenhagen
11 days
RT @akristiadi7: 📢 [Openings] I'm now an Assistant Prof @WesternU CS dept. Funded PhD & MSc positions available! Topics: large probabilisti….
0
11
0
@runame_
Runa Eschenhagen
29 days
RT @joost_v_amersf: Never will be.
0
13
0
@runame_
Runa Eschenhagen
1 month
RT @MarkSchmidtUBC: My former PhD student Fred Kunstner has been awarded the @c_a_i_a_c Best Doctoral Dissertation Award:..
0
23
0
@runame_
Runa Eschenhagen
1 month
RT @aaron_defazio: Why do gradients increase near the end of training? Read the paper to find out! We also propose a simple fix to AdamW t….
0
74
0
@runame_
Runa Eschenhagen
1 month
RT @Ar_Douillard: duality of humanity
0
1
0
@runame_
Runa Eschenhagen
2 months
RT @orvieto_antonio: Adam is similar to many algorithms, but cannot be effectively replaced by any simpler variant in LMs. The community is….
0
44
0
@runame_
Runa Eschenhagen
2 months
RT @_katieeverett: 1. We often observe power laws between loss and compute: loss = a * flops ^ b + c. 2. Models are rapidly becoming more ef….
0
92
0
@runame_
Runa Eschenhagen
2 months
RT @aaron_defazio: Write the paper you would want to read.
0
5
0
@runame_
Runa Eschenhagen
2 months
RT @roydanroy: This is a huge development. I want to highlight the theoreticians behind the scene, because this paper represents the reali….
0
52
0
@runame_
Runa Eschenhagen
2 months
RT @kayembruno: Great to be back from Singapore from #ICLR2025, and super excited to have given my first oral presentation on influence fun….
0
3
0
@runame_
Runa Eschenhagen
3 months
RT @JonathanWenger5: We have a fantastic lineup of speakers who have made deep contributions to open-source in ML, e.g. @sarahookr, @ChrisR….
0
5
0
@runame_
Runa Eschenhagen
3 months
RT @zhiyuanli_: Why does Adam outperform SGD in LLMs training? Adaptive step sizes alone don't fully explain this, as Adam also surpasses a….
0
35
0
@runame_
Runa Eschenhagen
3 months
RT @GuilleAngeris: @typedfemale literally is all you need.
0
2
0
@runame_
Runa Eschenhagen
3 months
RT @CambridgeMLG: All the MLG papers at #ICLR2025 main conference!
0
6
0
@runame_
Runa Eschenhagen
3 months
RT @wormaniec: Ever wondered how the loss landscape of Transformers differs from that of other architectures? Or which Transformer componen….
0
8
0
@runame_
Runa Eschenhagen
3 months
RT @frankstefansch1: Tired of your open-source ML work not getting the academic recognition it deserves? 🤔 Submit to the first-ever CodeML….
0
2
0
@runame_
Runa Eschenhagen
3 months
RT @JonathanWenger5: Built a new ML library? Maintain a crucial project? Improved OSS practices? Your work deserves recognition! Submit you….
0
4
0