
Daniel Kunin
@KuninDaniel
Followers: 744 · Following: 169 · Media: 29 · Statuses: 70
PhD student @ICMEStanford · Creator @SeeingTheory
Stanford University
Joined December 2020
🌟Announcing our NeurIPS spotlight paper on the transition from lazy to rich🔦 We reveal, through exact gradient flow dynamics, how unbalanced initializations promote rapid feature learning. Co-led with @AllanRaventos and @ClementineDomi6, with @FCHEN_AI, @klindt_david, @SaxeLab, and @SuryaGanguli.
🚀 Exciting news! Our paper "From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks" has been accepted at ICLR 2025! https://t.co/6B7T1ROrc2 A thread on how relative weight initialization shapes learning dynamics in deep networks. 🧵 (1/9)
Wednesday, April 9th at 11AM: TTIC's Young Researcher Seminar Series presents Daniel Kunin (@KuninDaniel) of @StanfordEng with a talk titled "Learning Mechanics of Neural Networks: Conservation Laws, Implicit Biases, and Feature Learning." Please join us in Room 530, 5th floor.
1/ Our new paper: “Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning” on how to change training to better exploit test-time compute! Co-led by @AllanRaventos, w/ Nan Cheng, @SuryaGanguli & @ShaulDr.
https://t.co/xM49OB6sk7
Excited to finally share this work w/ @SuryaGanguli. Tl;dr: we find the first closed-form analytical theory that replicates the outputs of the very simplest diffusion models, with median pixel-wise r^2 values of 90%+. https://t.co/SYkAAh6k4C
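For readers curious what a median pixel-wise r^2 refers to here, below is a minimal sketch of one way such a metric could be computed; the array names, shapes, synthetic data, and the per-pixel 1 - SS_res/SS_tot definition are illustrative assumptions, not the paper's evaluation code.

```python
# Hypothetical arrays: rows are sampled outputs, columns are pixels.
import numpy as np

def median_pixelwise_r2(outputs, predictions):
    """Median over pixels of r^2 = 1 - SS_res / SS_tot, computed across samples."""
    ss_res = np.sum((outputs - predictions) ** 2, axis=0)
    ss_tot = np.sum((outputs - outputs.mean(axis=0)) ** 2, axis=0)
    return np.median(1.0 - ss_res / ss_tot)

# Toy usage with synthetic stand-ins for model outputs and analytic predictions.
rng = np.random.default_rng(0)
outputs = rng.standard_normal((500, 32 * 32))
predictions = outputs + 0.2 * rng.standard_normal(outputs.shape)  # imperfect prediction
print(f"median pixel-wise r^2 = {median_pixelwise_r2(outputs, predictions):.3f}")
```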
Come check out our #NeurIPS2024 spotlight poster on feature learning tomorrow! 📍East Exhibit Hall A-C #2102 📅Thu 12 Dec 4:30 p.m. — 7:30 p.m. PST
Great job, it was an honor being part of this amazing project! Congrats to the team 💪
Also, big shoutout to @yasamanbb, @CPehlevan, and @HSompolinsky for coordinating last year's 'Deep Learning from Physics and Neuroscience' program @KITP_UCSB. Our amazing team met there, and this project is a direct result of the conversations we had!
Check out our work for more details: https://t.co/VK0Wa0tdCa ...and stay tuned for follow-ups!
arxiv.org
While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich...
We provide empirical evidence that an unbalanced rich regime drives feature learning in deep networks, promotes interpretability of early layers in CNNs, reduces the sample complexity of learning hierarchical data, and decreases the time to grokking in modular arithmetic.
Applying our function space analysis to shallow ReLU networks, we find that rapid feature learning arises from unbalanced initializations that promote faster learning in early layers, driving a large change in activation patterns but only a small change in parameter space.
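As a concrete handle on the two diagnostics mentioned above (change in activation patterns vs. change in parameter space), here is a minimal NumPy sketch on a toy regression task; the architecture, data, the `rel_scale` convention for making the initialization unbalanced, and the hyperparameters are my own illustrative assumptions, not the paper's experiments.

```python
# Train a tiny two-layer ReLU network with full-batch gradient descent and report
# (i) the fraction of hidden-unit activation patterns that flip during training and
# (ii) the parameter movement relative to the initial parameter norm.
import numpy as np

def make_data(seed=0, n=256, d=5):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = np.sin(X[:, 0])                     # arbitrary nonlinear scalar target
    return X, y

def run(rel_scale, seed=1, h=64, lr=1e-2, steps=5_000):
    # rel_scale > 1 shrinks the first layer and grows the second by the same factor,
    # keeping their product fixed (one possible convention for "relative scale").
    X, y = make_data()
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((h, d)) / np.sqrt(d) / rel_scale    # first layer
    a = rng.standard_normal(h) / np.sqrt(h) * rel_scale         # second layer
    W0, a0 = W.copy(), a.copy()
    pattern0 = (X @ W.T > 0)                                    # activation pattern at init

    for _ in range(steps):
        Z = X @ W.T                       # preactivations, shape (n, h)
        H = np.maximum(Z, 0)              # ReLU activations
        r = H @ a - y                     # residuals; loss = 0.5 * mean(r**2)
        grad_a = H.T @ r / n
        grad_W = ((r[:, None] * a) * (Z > 0)).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W

    flipped = ((X @ W.T > 0) != pattern0).mean()
    moved = np.sqrt(np.sum((W - W0) ** 2) + np.sum((a - a0) ** 2))
    norm0 = np.sqrt(np.sum(W0 ** 2) + np.sum(a0 ** 2))
    return flipped, moved / norm0

for rel_scale in (1.0, 4.0):
    flipped, rel_move = run(rel_scale)
    print(f"rel_scale={rel_scale}: flipped activations = {flipped:.1%}, "
          f"relative parameter change = {rel_move:.2f}")
```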
We find three regimes in function space: (1) lazy, akin to linear regression; (2) rich, akin to silent alignment (Atanasov et al. 2021); (3) delayed-rich, initially lazy followed by rich. We extend this analysis (with mirror flows and implicit biases) to wide & deep linear networks.
We derive exact gradient flow solutions for a minimal two-layer linear model displaying lazy and rich learning, which reveal that the relative scale between layers influences feature learning through conserved quantities that constrain the geometry of learning trajectories.
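To make the conserved quantity concrete, here is a minimal sketch for the scalar version of such a two-layer linear model, f(x) = a*b*x: under gradient flow on the squared loss the layer imbalance a^2 - b^2 is conserved, so the initial relative scale pins the trajectory to a hyperbola in (a, b). The specific model, data, and step size are my own illustrative choices, not the paper's code.

```python
# Discretized gradient flow on L = 0.5 * mean((a*b*x - y)^2); the small residual
# drift in a^2 - b^2 comes only from the finite step size.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = 1.5 * x                                  # target linear map, slope 1.5

a, b = 2.0, 0.1                              # unbalanced initialization
delta0 = a**2 - b**2
lr, steps = 1e-3, 20_000

for _ in range(steps):
    r = a * b * x - y                        # residuals
    grad_a = np.mean(r * b * x)              # dL/da
    grad_b = np.mean(r * a * x)              # dL/db
    a, b = a - lr * grad_a, b - lr * grad_b

print(f"learned slope a*b = {a*b:.3f} (target 1.5)")
print(f"a^2 - b^2 at init = {delta0:.4f}, after training = {a**2 - b**2:.4f}")
```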
Reproducing Fig. 1 in Chizat et al. 2019, we find that even at small overall scale, the relative scale between layers can transition the network between rich and lazy learning, and that the best generalization occurs at small overall scale and large relative scale!
Really cool theory project on feature learning. If you are at the HiLD workshop @icmlconf check it out!
Interested in exactly solvable models of learning dynamics and implicit bias? Come check out our "Get Rich Quick" poster at the HiLD Workshop @icmlconf at 10am! With @KuninDaniel, myself, @ClementineDomi6, @FCHEN_AI, @klindt_david, @SaxeLab, and @SuryaGanguli.
Reminder! Happening Tomorrow! @ELLISforEurope
We are delighted to announce that our next speakers for the @ELLISforEurope RG are from @Stanford: the authors of the NeurIPS2023 paper (https://t.co/FNikSjwN97), @FCHEN_AI, @KuninDaniel, @atsushi_y1230 & @SuryaGanguli, on 13th Feb '24 @ 5pm CET on Zoom. Save the Date! Link to join the RG👇
To get the Zoom link and get notified about other interesting talks, check out
🚨@FCHEN_AI, @atsushi_y1230, and I will be presenting our NeurIPS2023 paper https://t.co/6NuqR0nRJA to the ELLIS Mathematics of Deep Learning reading group tomorrow, Feb 13, at 5pm CET. Join to learn more about stochastic collapse!
arxiv.org
In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of...
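As a small illustration of the "simpler subnetworks" idea in the linked abstract, here is a minimal check (my own sketch, not the paper's code) that a collapsed hidden unit, one whose incoming and outgoing weights are all zero, receives exactly zero gradient in a two-layer ReLU network, so gradient-based training can never revive it; the paper's claim is that SGD noise tends to drive networks toward such simpler configurations.

```python
# Zero out one hidden unit of a two-layer ReLU network and verify that its
# gradients vanish, i.e. the "collapsed" subnetwork is invariant under training.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 128, 4, 8
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = rng.standard_normal((h, d))      # first layer
a = rng.standard_normal(h)           # second layer
W[0] = 0.0                           # collapse unit 0: zero incoming weights ...
a[0] = 0.0                           # ... and zero outgoing weight

Z = X @ W.T                          # preactivations
H = np.maximum(Z, 0)                 # ReLU activations
r = H @ a - y                        # residuals; loss = 0.5 * mean(r**2)
grad_a = H.T @ r / n
grad_W = ((r[:, None] * a) * (Z > 0)).T @ X / n

print("grad for collapsed unit:", grad_a[0], grad_W[0])   # exactly zero
```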
I am delighted to be a chair for an @ELLISforEurope Reading Group on Mathematics of Deep Learning along with @LinaraAdylova and Sidak @unregularized. The link to join the group is here: https://t.co/lkq0IlUDbA. Looking forward to meeting new people! @CompSciOxford @oxengsci