Blake Bordelon ☕️🧪👨‍💻

@blake__bordelon

1K Followers · 1K Following · 88 Media · 313 Statuses

ML/Neuroscience PhD student at @Harvard

Cambridge, MA
Joined July 2019
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
Very fun stress-testing depth scalings in LLMs with the very talented team at @CerebrasSystems!
@DeyNolan
Nolan Dey
3 months
(1/7) @CerebrasSystems Paper drop: TLDR: We introduce CompleteP, which offers depth-wise hyperparameter (HP) transfer (Left), FLOP savings when training deep models (Middle), and a larger range of compute-efficient width/depth ratios (Right). 🧵 👇
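For readers wondering what "depth scalings" means concretely, here is a minimal sketch of one common depth-wise parameterization choice, a 1/n_layers multiplier on each residual branch, which keeps activations and updates controlled as depth grows so that hyperparameters tuned on shallow models have a chance to transfer to deeper ones. This is only an illustration of the general idea; the precise CompleteP rules (including how learning rates, biases, and normalization layers are treated) are specified in the paper, and the module names below are hypothetical.

import torch.nn as nn

class ResidualMLPBlock(nn.Module):
    def __init__(self, width: int, n_layers: int):
        super().__init__()
        self.norm = nn.LayerNorm(width)
        self.fc1 = nn.Linear(width, 4 * width)
        self.fc2 = nn.Linear(4 * width, width)
        self.act = nn.GELU()
        self.branch_scale = 1.0 / n_layers  # depth-dependent residual multiplier

    def forward(self, x):
        # The residual branch is down-weighted by 1/n_layers so that the summed
        # contribution across all blocks stays O(1) as depth grows.
        return x + self.branch_scale * self.fc2(self.act(self.fc1(self.norm(x))))

class DeepResidualMLP(nn.Module):
    def __init__(self, width: int = 256, n_layers: int = 32):
        super().__init__()
        self.blocks = nn.ModuleList(
            [ResidualMLPBlock(width, n_layers) for _ in range(n_layers)]
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x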
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
5 days
ICML this week! Come by: T PM, @LauditiClarissa's work on muP BNNs; W AM, a model of place field adaptation with @mgkumar138 and Jacob ZV; W PM, a model of LR transfer in linear NNs. All from senior author @CPehlevan!
arxiv.org
We theoretically characterize gradient descent dynamics in deep linear networks trained at large width from random initialization and on large quantities of random data. Our theory captures the...
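For context on the linked abstract, here is a hedged toy version of the setup it describes: a deep linear network trained by full-batch gradient descent from random initialization on random data. The paper's theory concerns the large-width limit of dynamics like these; the dimensions, learning rate, and step count below are arbitrary illustrative choices.

# Toy deep linear network trained with gradient descent on random data.
# The linked paper characterizes the large-width limit of dynamics of this kind;
# all sizes and the learning rate here are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
D, N, P, L = 32, 256, 512, 3          # input dim, hidden width, samples, depth
lr, steps = 0.5, 500

X = rng.standard_normal((P, D))                      # random inputs
y = X @ rng.standard_normal(D) / np.sqrt(D)          # random linear teacher

# 1/sqrt(fan_in) random initialization
Ws = [rng.standard_normal((D, N)) / np.sqrt(D)]
Ws += [rng.standard_normal((N, N)) / np.sqrt(N) for _ in range(L - 2)]
Ws += [rng.standard_normal((N, 1)) / np.sqrt(N)]

for step in range(steps):
    # Forward pass, caching activations for the backward pass
    hs = [X]
    for W in Ws:
        hs.append(hs[-1] @ W)
    err = hs[-1][:, 0] - y
    loss = 0.5 * np.mean(err ** 2)

    # Backward pass for the mean-squared error, then gradient descent update
    grad_h = err[:, None] / P
    for i in reversed(range(L)):
        grad_W = hs[i].T @ grad_h
        grad_h = grad_h @ Ws[i].T
        Ws[i] -= lr * grad_W

print(f"final train loss: {loss:.2e}")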
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
2 months
RT @DimaKrotov: Nice article! I appreciate that it mentions my work and the work of my students. I want to add to it. It is true that the….
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
RT @ABAtanasov: 1/n I’m very excited to present this Spotlight. It was one of the more creative projects of my PhD, and also the last one w….
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
3 months
RT @SuryaGanguli: Academia and tech need to stand together. Visa revocations and green card denials of our best and brightest in both spher….
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
7 months
Come by at NeurIPS to hear Hamza present on interesting properties of various feature-learning infinite-parameter limits of transformer models! Poster in Hall A-C #4804 at 11 AM PST Friday. Paper . Work with @hamzatchaudhry and @CPehlevan.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
9 months
Very well deserved! Congrats @jzavatoneveth on your continued success! Any chance you are hiring a postdoc? 😉
@NIH_CommonFund
NIH Common Fund
10 months
Early Independence Awardee Jacob Zavatone-Veth of @Harvard's Society of Fellows is researching how neural networks model large-scale #NeuralData to advance our understanding of #DeepLearning. Read more:
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Congrats @MattSFarrell and @CPehlevan on this great paper!
@MattSFarrell
Matthew Farrell
1 year
My paper with @CPehlevan is out now in PNAS! Sequences are a core part of an animal's behavioral repertoire, and Hebbian learning allows neural circuits to store memories of sequences for later recall.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Seconded! This was a wonderful workshop. I learned a lot from all of the in-depth talks. Thanks again to organizers Francesca Mastrogiuseppe, @APalmigiano, @ai_ngrosso, and @sebastiangoldt!
@LLogiaco
Laureline Logiaco
1 year
Back from this workshop, wonderfully organized by F. Mastrogiuseppe, @APalmigiano, @ai_ngrosso & @sebastiangoldt, thank you! Long 90-minute (chalk) talks powered some of the most meaningful scientific exchanges I've ever had. I'm hoping to further contribute to this community later!
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
I want to highlight the terrific coauthors who were involved in some of the projects presented here. Large-width consistency: @vyasnikhil96, @depen_morwani, Sabarish Sainathan. Large-depth limits: @lorenzo_noci, @mufan_li, @BorisHanin.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
A quick summary of recent work from our group on limits of neural network training. Once you control the scale of feature learning, wider and deeper tends to be better, as noisy finite NNs approach their deterministic limits.
@KempnerInst
Kempner Institute at Harvard University
1 year
NEW! Check out recent findings on width and depth limits in part 1 of a #KempnerInstitute two-part series from @ABAtanasov, @blake__bordelon & @CPehlevan. Read on: #neuralnetworks #AI
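As a hedged illustration of what "controlling the scale of feature learning" can mean in practice: in a mean-field / muP-style parameterization the readout is scaled by 1/width rather than the standard 1/sqrt(width), which keeps the amount of feature movement comparable as the width N grows, so different widths behave like noisy finite-size versions of one deterministic limit. The snippet below only shows the readout scaling; the full parameterizations discussed in the series also prescribe how learning rates and other layers scale, and the class name is made up.

import torch
import torch.nn as nn

class MeanFieldMLP(nn.Module):
    """Two-layer net with a mean-field (1/N) readout; name and sizes are illustrative."""
    def __init__(self, d_in: int, width: int):
        super().__init__()
        self.hidden = nn.Linear(d_in, width)
        self.readout = nn.Linear(width, 1, bias=False)
        self.width = width

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        # Dividing by width (rather than sqrt(width)) keeps the scale of feature
        # learning comparable across widths, at the cost of a smaller output at init.
        return self.readout(h) / self.width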
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Excited to visit Princeton tomorrow and give a talk at the Alg-ML seminar. If you are in the area and would like to meet for a chat, please reach out!
princeton-alg-ml.github.io
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
While this theory is predictive of networks in the lazy training regime, the predicted exponents are too pessimistic for networks in the feature learning regime. We hope to develop better theories of compute scaling laws in the rich regime in future work.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
If data is limited, the theory also predicts diminishing returns to increasing model size, since at sufficiently large width the data bottleneck is eventually reached.
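As a hedged illustration (a generic additive bottleneck ansatz, not the paper's exact expressions), the diminishing returns can be seen by sending the model-size term to zero at fixed data:

\[
\lim_{N \to \infty} L(t, N, P) \;\approx\; L_\infty + a_t\, t^{-\alpha_t} + a_P\, P^{-\alpha_P},
\]

so once $a_N N^{-\alpha_N}$ is small compared to $a_P P^{-\alpha_P}$, further increases in width barely move the loss: the data term sets the floor.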
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
For finite data, the theory predicts how the gap between test and train loss accumulates over training time. An exact relation for this gap can be expressed in terms of our order parameters.
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
One can improve the performance of finite-width NNs by averaging the outputs of many NNs. This method reduces the variance due to random initialization. However, we show in this toy model that ensembling is rarely compute optimal, since increasing width decreases both bias and variance.
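A hedged toy version of the compute accounting behind this claim, assuming a simple additive bias-variance model in which the finite-width variance decays like 1/N; the bias exponent below is made up for illustration.

# Toy comparison: ensemble K networks of width N vs. one wide network, at fixed
# compute budget proportional to K * N. Assumed model (illustrative only):
#   loss of a single width-N net = bias(N) + var(N),
#   bias(N) = N**-0.5 (made-up exponent),  var(N) = 1/N (finite-width variance),
# and averaging K independent nets divides only the variance term by K.

def bias(N):
    return N ** -0.5

def var(N):
    return 1.0 / N

compute = 4096  # budget ~ K * N

for K in (1, 2, 4, 8):
    N = compute // K
    loss = bias(N) + var(N) / K
    print(f"K={K:2d} nets of width {N:5d}: loss = {loss:.5f}")

# Because var(N)/K = 1/(K*N) depends only on the total budget, ensembling never
# beats widening on the variance term, while it makes the bias term worse; here
# K=1 (all compute spent on width) comes out compute optimal.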
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
From these bottleneck scalings, we can extract the compute-optimal scaling exponent, which is task- and data-distribution-dependent. Below is one example.
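A hedged, simplified version of how such an exponent falls out (two-resource case, constants and the data term ignored; the paper's actual expressions are derived from its order parameters): assume

\[
L(t, N) \;\approx\; a\, t^{-\alpha_t} + b\, N^{-\alpha_N}, \qquad C \propto N t .
\]

Minimizing over how $C$ is split between $N$ and $t$ balances the two terms, giving

\[
N^* \propto C^{\frac{\alpha_t}{\alpha_t + \alpha_N}}, \qquad
t^* \propto C^{\frac{\alpha_N}{\alpha_t + \alpha_N}}, \qquad
L^*(C) \propto C^{-\frac{\alpha_t \alpha_N}{\alpha_t + \alpha_N}} .
\]

Since $\alpha_t$ and $\alpha_N$ depend on the task and data distribution, so does the compute-optimal exponent.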
@blake__bordelon
Blake Bordelon ☕️🧪👨‍💻
1 year
Our theory predicts that the loss is generally bottlenecked by one of three computational resources: training time t, model size N, and data P. For features with power-law covariance structure, we find that these losses scale as power laws in the bottleneck resource.
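An illustrative way to write this kind of bottleneck structure (a generic ansatz in the spirit of the thread, not the paper's exact result):

\[
L(t, N, P) \;\approx\; L_\infty + a_t\, t^{-\alpha_t} + a_N\, N^{-\alpha_N} + a_P\, P^{-\alpha_P},
\]

where whichever of training time $t$, model size $N$, or dataset size $P$ is effectively smallest dominates the loss, and for features with a power-law covariance spectrum the exponents $\alpha_t, \alpha_N, \alpha_P$ are set by how quickly that spectrum decays.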