Blake Bordelon ☕️🧪👨‍💻

@blake__bordelon

1K Followers · 1K Following · 88 Media · 313 Statuses

ML/Neuroscience PhD student at @Harvard

Cambridge, MA
Joined July 2019
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
Very fun stress-testing depth scalings in LLMs with the very talented team at @CerebrasSystems!
@DeyNolan
Nolan Dey
3 months
(1/7) @CerebrasSystems Paper drop: TLDR: We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer (Left), FLOP savings when training deep models (Middle), and a larger range of compute-efficient width/depth ratios (Right). 🧵 👇
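For readers wondering what "depth scalings" means concretely, here is a minimal sketch of one common depth-wise parameterization choice, a 1/n_layers multiplier on each residual branch, which keeps activations and updates controlled as depth grows so that hyperparameters tuned on shallow models have a chance to transfer to deeper ones. This is only an illustration of the general idea; the precise CompleteP rules (including how learning rates, biases, and normalization layers are treated) are specified in the paper, and the module names below are hypothetical.

import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    def __init__(self, width: int, n_layers: int):
        super().__init__()
        self.norm = nn.LayerNorm(width)
        self.fc1 = nn.Linear(width, 4 * width)
        self.fc2 = nn.Linear(4 * width, width)
        self.act = nn.GELU()
        self.branch_scale = 1.0 / n_layers  # depth-dependent residual multiplier

    def forward(self, x):
        # The residual branch is down-weighted by 1/n_layers so that the summed
        # contribution across all blocks stays O(1) as depth grows.
        return x + self.branch_scale * self.fc2(self.act(self.fc1(self.norm(x))))

class DeepResidualMLP(nn.Module):
    def __init__(self, width: int = 256, n_layers: int = 32):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ResidualMLPBlock(width, n_layers) for _ in range(n_layers)]
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x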
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
5 days
ICML this week! Come by: T PM, @LauditiClarissa's work on muP BNNs; W AM, a model of place field adaptation with @mgkumar138 and Jacob ZV; W PM, a model of LR transfer in linear NNs. All from senior author @CPehlevan!
arxiv.org
We theoretically characterize gradient descent dynamics in deep linear networks trained at large width from random initialization and on large quantities of random data. Our theory captures the...
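For context on the linked abstract, here is a hedged toy version of the setup it describes: a deep linear network trained by full-batch gradient descent from random initialization on random data. The paper's theory concerns the large-width limit of dynamics like these; the dimensions, learning rate, and step count below are arbitrary illustrative choices.

# Toy deep linear network trained with gradient descent on random data.
# The linked paper characterizes the large-width limit of dynamics of this kind;
# all sizes and the learning rate here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
D, N, P, L = 32, 256, 512, 3          # input dim, hidden width, samples, depth
lr, steps = 0.5, 500

X = rng.standard_normal((P, D))                      # random inputs
y = X @ rng.standard_normal(D) / np.sqrt(D)          # random linear teacher

# 1/sqrt(fan_in) random initialization
Ws = [rng.standard_normal((D, N)) / np.sqrt(D)]
Ws += [rng.standard_normal((N, N)) / np.sqrt(N) for _ in range(L - 2)]
Ws += [rng.standard_normal((N, 1)) / np.sqrt(N)]

for step in range(steps):
    # Forward pass, caching activations for the backward pass
    hs = [X]
    for W in Ws:
        hs.append(hs[-1] @ W)
    err = hs[-1][:, 0] - y
    loss = 0.5 * np.mean(err ** 2)

    # Backward pass for the mean-squared error, then gradient descent update
    grad_h = err[:, None] / P
    for i in reversed(range(L)):
        grad_W = hs[i].T @ grad_h
        grad_h = grad_h @ Ws[i].T
        Ws[i] -= lr * grad_W

print(f"final train loss: {loss:.2e}")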
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
2 months
RT @DimaKrotov: Nice article! I appreciate that it mentions my work and the work of my students. I want to add to it. It is true that the….
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
RT @ABAtanasov: 1/n I’m very excited to present this Spotlight. It was one of the more creative projects of my PhD, and also the last one w….
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
RT @SuryaGanguli: Academia and tech need to stand together. Visa revocations and green card denials of our best and brightest in both spher….
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
7 months
Come by at NeurIPS to hear Hamza present on interesting properties of various feature-learning infinite-parameter limits of transformer models! Poster in Hall A-C #4804 at 11 AM PST Friday. Paper . Work with @hamzatchaudhry and @CPehlevan.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
9 months
Very well deserved! Congrats @jzavatoneveth on your continued success! Any chance you are hiring a postdoc? 😉
@NIH_CommonFund
NIH Common Fund
10 months
Early Independence Awardee Jacob Zavatone-Veth of @Harvard's Society of Fellows is researching how neural networks model large-scale #NeuralData to advance our understanding of #DeepLearning. Read more:
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Congrats @MattSFarrell and @CPehlevan on this great paper!
@MattSFarrell
Matthew Farrell
1 year
My paper with @CPehlevan is out now in PNAS! Sequences are a core part of an animal's behavioral repertoire, and Hebbian learning allows neural circuits to store memories of sequences for later recall.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Seconded! This was a wonderful workshop. I learned a lot from all of the in-depth talks. Thanks again to organizers Francesca Mastrogiuseppe, @APalmigiano, @ai_ngrosso, and @sebastiangoldt!
@LLogiaco
Laureline Logiaco
1 year
Back from this workshop, wonderfully organized by F. Mastrogiuseppe, @APalmigiano, @ai_ngrosso & @sebastiangoldt, thank you! Long 90-minute (chalk) talks powered some of the most meaningful scientific exchanges I've ever had. I'm hoping to further contribute to this community later!
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
I want to highlight the terrific coauthors who were involved in some of the projects presented here. Large-width consistency: @vyasnikhil96, @depen_morwani, Sabarish Sainathan. Large-depth limits: @lorenzo_noci, @mufan_li, @BorisHanin.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
A quick summary of recent work from our group on limits of neural network training. Once you control the scale of feature learning, wider and deeper tends to be better, as noisy finite NNs approach their deterministic limits.
@KempnerInst
Kempner Institute at Harvard University
1 year
NEW! Check out recent findings on width and depth limits in part 1 of a #KempnerInstitute two-part series from @ABAtanasov, @blake__bordelon & @CPehlevan. Read on: #neuralnetworks #AI
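As a hedged illustration of what "controlling the scale of feature learning" can mean in practice: in a mean-field / muP-style parameterization the readout is scaled by 1/width rather than the standard 1/sqrt(width), which keeps the amount of feature movement comparable as the width N grows, so different widths behave like noisy finite-size versions of one deterministic limit. The snippet below only shows the readout scaling; the full parameterizations discussed in the series also prescribe how learning rates and other layers scale, and the class name is made up.

import torch
import torch.nn as nn

class MeanFieldMLP(nn.Module):
    """Two-layer net with a mean-field (1/N) readout; name and sizes are illustrative."""
    def __init__(self, d_in: int, width: int):
        super().__init__()
        self.hidden = nn.Linear(d_in, width)
        self.readout = nn.Linear(width, 1, bias=False)
        self.width = width

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        # Dividing by width (rather than sqrt(width)) keeps the scale of feature
        # learning comparable across widths, at the cost of a smaller output at init.
        return self.readout(h) / self.width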
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Excited to visit Princeton tomorrow and give a talk at the Alg-ML seminar. If you are in the area and would like to meet for a chat, please reach out!
princeton-alg-ml.github.io
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
While this theory is predictive of networks in the lazy training regime, the predicted exponents are too pessimistic for networks in the feature learning regime. We hope to develop better theories of compute scaling laws in the rich regime in future work.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
If data is limited, the theory also predicts diminishing returns to increasing model size, since at sufficiently large width the data bottleneck is eventually reached.
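As a hedged illustration (a generic additive bottleneck ansatz, not the paper's exact expressions), the diminishing returns can be seen by sending the model-size term to zero at fixed data:

\[
\lim_{N \to \infty} L(t, N, P) \;\approx\; L_\infty + a_t\, t^{-\alpha_t} + a_P\, P^{-\alpha_P},
\]

so once $a_N N^{-\alpha_N}$ is small compared to $a_P P^{-\alpha_P}$, further increases in width barely move the loss: the data term sets the floor.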
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
For finite data, the theory predicts how the gap between test and train loss accumulates over training time. An exact relation for this gap can be expressed in terms of our order parameters.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
One can improve the performance of finite-width NNs by averaging the outputs of many NNs. This method reduces the variance due to random initialization. However, we show in this toy model that ensembling is rarely compute optimal, since increasing width decreases both bias and variance.
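A hedged toy version of the compute accounting behind this claim, assuming a simple additive bias-variance model in which the finite-width variance decays like 1/N; the bias exponent below is made up for illustration.

# Toy comparison: ensemble K networks of width N vs. one wide network, at fixed
# compute budget proportional to K * N. Assumed model (illustrative only):
#   loss of a single width-N net = bias(N) + var(N),
#   bias(N) = N**-0.5 (made-up exponent),  var(N) = 1/N (finite-width variance),
# and averaging K independent nets divides only the variance term by K.

def bias(N):
    return N ** -0.5

def var(N):
    return 1.0 / N

compute = 4096  # budget ~ K * N

for K in (1, 2, 4, 8):
    N = compute // K
    loss = bias(N) + var(N) / K
    print(f"K={K:2d} nets of width {N:5d}: loss = {loss:.5f}")

# Because var(N)/K = 1/(K*N) depends only on the total budget, ensembling never
# beats widening on the variance term, while it makes the bias term worse; here
# K=1 (all compute spent on width) comes out compute optimal.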
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
From these bottleneck scalings, we can extract the compute-optimal scaling exponent, which is task- and data-distribution-dependent. Below is one example.
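A hedged, simplified version of how such an exponent falls out (two-resource case, constants and the data term ignored; the paper's actual expressions are derived from its order parameters): assume

\[
L(t, N) \;\approx\; a\, t^{-\alpha_t} + b\, N^{-\alpha_N}, \qquad C \propto N t .
\]

Minimizing over how $C$ is split between $N$ and $t$ balances the two terms, giving

\[
N^* \propto C^{\frac{\alpha_t}{\alpha_t + \alpha_N}}, \qquad
t^* \propto C^{\frac{\alpha_N}{\alpha_t + \alpha_N}}, \qquad
L^*(C) \propto C^{-\frac{\alpha_t \alpha_N}{\alpha_t + \alpha_N}} .
\]

Since $\alpha_t$ and $\alpha_N$ depend on the task and data distribution, so does the compute-optimal exponent.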
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Our theory predicts that the loss is generally bottlenecked by one of three computational resources: training time t, model size N, and data P. For features with power-law covariance structure, we find that these losses scale as power laws in the bottleneck resource.
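An illustrative way to write this kind of bottleneck structure (a generic ansatz in the spirit of the thread, not the paper's exact result):

\[
L(t, N, P) \;\approx\; L_\infty + a_t\, t^{-\alpha_t} + a_N\, N^{-\alpha_N} + a_P\, P^{-\alpha_P},
\]

where whichever of training time $t$, model size $N$, or dataset size $P$ is effectively smallest dominates the loss, and for features with a power-law covariance spectrum the exponents $\alpha_t, \alpha_N, \alpha_P$ are set by how quickly that spectrum decays.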