Lorenzo Noci Profile
Lorenzo Noci

@lorenzo_noci

Followers: 436 · Following: 479 · Media: 13 · Statuses: 74

PhD in Machine Learning at @ETH working on deep learning theory and principled large-scale AI models.

Zurich, Switzerland
Joined December 2013
@lorenzo_noci
Lorenzo Noci
3 months
Pretraining large-depth transformers just got easier! 🚀 HP transfer across model scale ⚡ Compute-efficient pretraining. Super cool collab with @DeyNolan @BCZhang_ @mufan_li @CPehlevan @ShaneBergsma @BorisHanin Joel Hestness @CerebrasSystems.
@DeyNolan
Nolan Dey
3 months
(1/7) @CerebrasSystems Paper drop! TL;DR: We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer (left), FLOP savings when training deep models (middle), and a larger range of compute-efficient width/depth ratios (right). 🧵👇
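To make the depth-transfer idea concrete, here is a minimal sketch; the 1/L residual multiplier and the idea of reusing one base learning rate across depths are illustrative assumptions, not necessarily CompleteP's exact parameterization.

```python
# Minimal sketch of depth-wise residual scaling (assumption: each residual
# branch is multiplied by 1/L so activation scales stay depth-stable; this
# is not claimed to be the paper's exact recipe).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, width: int, depth: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(width, width), nn.GELU(), nn.Linear(width, width)
        )
        self.branch_scale = 1.0 / depth  # assumed depth-wise multiplier

    def forward(self, x):
        return x + self.branch_scale * self.mlp(x)

def make_stack(width: int = 256, depth: int = 24) -> nn.Module:
    return nn.Sequential(*[ResidualBlock(width, depth) for _ in range(depth)])

x = torch.randn(8, 256)
with torch.no_grad():
    for depth in (4, 16, 64):
        y = make_stack(256, depth)(x)
        print(depth, y.std().item())  # output scale stays roughly depth-stable
```

With a depth-stable forward pass like this, a learning rate tuned on a shallow proxy has a chance of transferring to the deep model, which is the HP-transfer idea the thread advertises.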
@lorenzo_noci
Lorenzo Noci
18 days
Stop by if you want to learn about scaling up your model under distribution shifts of the training data. Takeaway: muP needs to be tuned to the amount of feature learning that optimizes the forgetting/plasticity trade-off.
@JGraldi
Jacopo Graldi
18 days
🚨 Excited to present our new paper at 🇨🇦 #ICML2025! 🚨 "The Importance of Being Lazy: Scaling Limits of Continual Learning". Great collab with @alebreccia99, @glanzillo11, Thomas Hofmann, @lorenzo_noci. 🧵 1/6
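A toy version of the lazy-vs-rich knob the takeaway refers to: a Chizat & Bach-style output multiplier alpha that controls how much features move during training. The model shape, alpha values, and the lr/alpha**2 rule are my assumptions for illustration, not the paper's setup.

```python
# Toy lazy<->rich interpolation via alpha-scaling (illustrative sketch).
import torch
import torch.nn as nn

def make_model(width: int = 512) -> nn.Module:
    return nn.Sequential(nn.Linear(10, width), nn.Tanh(),
                         nn.Linear(width, 1, bias=False))

def train_step(model: nn.Module, x, y, alpha: float, lr: float) -> float:
    # Predictions scaled by alpha, learning rate by 1/alpha**2: large alpha
    # approaches the lazy (kernel) regime, small alpha forces rich feature
    # learning. The tweet's point, as I read it: in continual learning the
    # alpha balancing forgetting vs. plasticity is neither extreme.
    pred = alpha * model(x)
    loss = ((pred - y) ** 2).mean()
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= (lr / alpha ** 2) * p.grad
    return loss.item()

x, y = torch.randn(32, 10), torch.randn(32, 1)
for alpha in (0.1, 1.0, 10.0):  # rich -> lazy sweep
    print(alpha, train_step(make_model(), x, y, alpha, lr=0.1))
```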
@lorenzo_noci
Lorenzo Noci
3 months
RT @AurelienLucchi: Our research group in the department of Mathematics and CS at the University of Basel (Switzerland) is looking for seve…
@lorenzo_noci
Lorenzo Noci
3 months
RT @albertobietti: Come hear about how transformers perform factual recall using associative memories, and how this emerges in phases durin…
Linked (arxiv.org): Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate...
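A minimal toy of the mechanism the thread studies: an outer-product associative memory that stores key→value pairs in a single weight matrix. The dimensions and random keys here are assumptions for illustration; this is not the paper's model.

```python
# Outer-product associative memory: facts are superposed as W = sum_i v_i k_i^T
# and retrieved by a matrix-vector product; with random high-dimensional keys
# the crosstalk between stored facts is small.
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 1024, 50
keys = rng.standard_normal((n_facts, d)) / np.sqrt(d)   # ~unit-norm, ~orthogonal
values = rng.standard_normal((n_facts, d))

W = values.T @ keys          # store all facts in one matrix

recalled = W @ keys[7]       # query with key 7
cos = recalled @ values[7] / (np.linalg.norm(recalled) * np.linalg.norm(values[7]))
print(f"cosine(recalled, stored) = {cos:.3f}")  # close to 1 => successful recall
```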
@lorenzo_noci
Lorenzo Noci
3 months
RT @elvisnavah: Come build with us and @OpenAI!!
@lorenzo_noci
Lorenzo Noci
4 months
RT @LenaicChizat: Announcing: The 2nd International Summer School on Mathematical Aspects of Data Science. EPFL, Sept 1–5, 2025. Speakers:…
@lorenzo_noci
Lorenzo Noci
8 months
RT @blake__bordelon: Come by at NeurIPS to hear Hamza present about interesting properties of various feature learning infinite parameter l…
@lorenzo_noci
Lorenzo Noci
8 months
RT @bobby_he: Come by poster #2402, East Hall, at NeurIPS from 11am-2pm Friday to chat about why outlier features emerge during training and…
@lorenzo_noci
Lorenzo Noci
8 months
Excited to share a cool finding about neural network landscapes! We discovered that their curvature is scale-invariant throughout training, a property we call Super Consistency. This is linked to learning rate transfer! Check out our NeurIPS poster #2403: Wed 11 Dec, 4:30 p.m. 📍
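A sketch of how one could measure the quantity behind the claim: the largest Hessian eigenvalue ("sharpness") of the loss, estimated via Hessian-vector products and power iteration. This is my reconstruction of a standard measurement, not the poster's code; `sharpness` and its arguments are hypothetical helpers.

```python
# Estimate the top Hessian eigenvalue of a loss with autograd HVPs + power
# iteration; tracking this across widths during training is one way to test
# whether curvature dynamics are scale-invariant.
import torch

def sharpness(loss: torch.Tensor, params, iters: int = 20) -> float:
    params = [p for p in params if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    lam = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]                                # normalize v
        gv = sum((g * u).sum() for g, u in zip(grads, v))        # g . v
        hv = torch.autograd.grad(gv, params, retain_graph=True)  # H v
        lam = sum((h * u).sum() for h, u in zip(hv, v)).item()   # v . H v
        v = [h.detach() for h in hv]
    return lam

model = torch.nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = ((model(x) - y) ** 2).mean()
print(sharpness(loss, model.parameters()))
```

Logged once per step for each model width, the resulting sharpness-vs-step curves can be overlaid; the poster's Super Consistency claim is that they collapse.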
@lorenzo_noci
Lorenzo Noci
8 months
Systematic empirical analysis of the role of feature learning in continual learning using scaling limits theory. Meet Jacopo in Vancouver :)
@JGraldi
Jacopo Graldi
8 months
🎉 Excited to be in #Vancouver next week for #NeurIPS to present results from my Master's Thesis at the Scalable Continual Learning Workshop on December 14th! 🚀 Our work investigates the role of scale and training regimes in Continual Learning. What did we find? 👇 1/3
@lorenzo_noci
Lorenzo Noci
8 months
Indeed very useful :)
@CPehlevan
Cengiz Pehlevan
8 months
We collected lecture notes and blog posts by group members about recent topics in deep learning theory here. Hope it is useful!
@lorenzo_noci
Lorenzo Noci
9 months
RT @bobby_he: Updated camera ready. New results include: - non-diagonal preconditioners (SOAP/Shampoo) minimise OF…
@lorenzo_noci
Lorenzo Noci
1 year
RT @cjmaddison: I'm also recruiting PhD/MSc students this coming cycle, with an eye towards applications in drug discovery.
@lorenzo_noci
Lorenzo Noci
1 year
RT @AurelienLucchi: My group has multiple openings both for PhD and Post-doc positions to work in the area of optimization for ML, and deep…
@lorenzo_noci
Lorenzo Noci
1 year
RT @bobby_he: Outlier Features (OFs) aka "neurons with big features" emerge in standard transformer training & prevent benefits of quantisa…
@lorenzo_noci
Lorenzo Noci
1 year
RT @GregorBachmann1: From stochastic parrot 🦜 to Clever Hans 🐴? In our work with @_vaishnavh we carefully analyse the debate surrounding ne…
@lorenzo_noci
Lorenzo Noci
1 year
For more details, including results on Transformers, ablations on loss functions and batch size, and the depth extension of our results, please check out our paper:
@lorenzo_noci
Lorenzo Noci
1 year
Our work is connected to previous work on the Edge of Stability (@deepcohen et al.), in the sense that muP networks converge to the EoS value of 2/lr, and they do so width-independently (dashed lines in the sharpness plots).
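For context, the 2/lr threshold follows from a standard stability calculation (my addition, not from the thread): gradient descent on the quadratic loss 0.5·lam·x² updates x ← (1 − lr·lam)·x, which shrinks |x| exactly when lam ≤ 2/lr. A quick numerical check:

```python
# Stability of GD on a quadratic with curvature lam: the iterates contract
# when lam < 2/lr and blow up when lam > 2/lr (the EoS value cited above).
lr = 0.1
for lam in (1.0, 2.0 / lr - 0.1, 2.0 / lr + 0.1):
    x = 1.0
    for _ in range(500):
        x -= lr * lam * x          # GD step on loss 0.5 * lam * x**2
    print(f"lam={lam:6.2f}  |x_500|={abs(x):.2e}")
# just below 2/lr the iterates still shrink; just above, they diverge
```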