
Nikhil Vyas
@vyasnikhil96
Followers: 786 · Following: 282 · Media: 3 · Statuses: 162
@OpenAI. Prev: Postdoc at Harvard, PhD @MITEECS.
Joined August 2015
RT @depen_morwani: Excited to attend #ICLR25 this week. My DMs are open, feel free to drop a message to talk about anything related to opti….
0 replies · 5 retweets · 0 likes
RT @_arohan_: Today some of my ex and new colleagues are hosting AlgoPerfy workshop. I will drop by and participate….
0 replies · 18 retweets · 0 likes
RT @AshokCutkosky: Some ideas on a new optimizer from my student Qinzi Zhang: ( Early stages, but the empirical res….
0 replies · 16 retweets · 0 likes
RT @ShamKakade6: 1/n In new work, we draw connections between accelerated SGD and various recent optimizers including AdEMAMix, Schedule-Fr….
0 replies · 14 retweets · 0 likes
RT @ShamKakade6: (1/n) 💡How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping poin….
0 replies · 32 retweets · 0 likes
RT @brandfonbrener: How does test loss change as we change the training data? And how does this interact with scaling laws?. We propose a m….
0 replies · 16 retweets · 0 likes
@kellerjordan0 I should add that OrthogonalNesterov is a new optimizer; it is possible that some modification will close the gap.
0 replies · 0 retweets · 8 likes
SOAP on @kellerjordan0's nanogpt benchmark. SOAP seems to take at least ~10% fewer steps than the OrthogonalNesterov optimizer, but it also has more overhead.
3 replies · 8 retweets · 66 likes
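For context, a minimal sketch of the kind of measurement behind a claim like this: train the same model with each optimizer and record both steps to a target loss and wall-clock time, since a heavier preconditioner can win on steps while losing on per-step cost. The function name `steps_to_target`, the toy regression problem, and the AdamW/SGD pair below are illustrative stand-ins, not the nanogpt benchmark or the optimizers compared in the tweet.

```python
# Hypothetical sketch (not the actual nanogpt speedrun code): measure both
# steps-to-target-loss and wall-clock time for a given optimizer, so that
# "fewer steps" can be weighed against "more overhead per step".
import time
import torch

def steps_to_target(make_optimizer, target_loss=0.05, max_steps=5000, seed=0):
    torch.manual_seed(seed)
    # Placeholder problem: noisy linear regression instead of a transformer.
    w_true = torch.randn(32, 1)
    X = torch.randn(256, 32)
    y = X @ w_true + 0.01 * torch.randn(256, 1)
    model = torch.nn.Linear(32, 1)
    opt = make_optimizer(model.parameters())
    start = time.time()
    for step in range(1, max_steps + 1):
        loss = torch.nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
        if loss.item() <= target_loss:
            return step, time.time() - start
    return max_steps, time.time() - start

# Any two optimizers exposing the standard torch.optim interface can be compared.
print(steps_to_target(lambda p: torch.optim.AdamW(p, lr=1e-2)))
print(steps_to_target(lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9)))
```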
Thread on our new optimizer (SOAP), which mixes Shampoo and AdamW:
6/n See our paper ( for more details. Work with @vyasnikhil96, @depen_morwani, @rosieyzh, @IShapira1, @brandfonbrener and Lucas Janson.
0 replies · 0 retweets · 12 likes
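As a rough illustration of the idea in the thread above (SOAP as a mix of Shampoo and AdamW): maintain Shampoo's factored second-moment statistics, periodically compute their eigenbases, and run an Adam-style update on the gradient rotated into that eigenbasis. The sketch below is a simplified, hypothetical version for a single 2D weight matrix; the class name, hyperparameter defaults, and `precond_freq` knob are illustrative choices, and it omits bias correction, 1D parameters, and other details of the paper's implementation.

```python
# Hypothetical, simplified sketch of the SOAP idea for one 2D weight matrix:
# Adam-style moments maintained in the eigenbasis of Shampoo's factored
# second-moment statistics. Not the authors' implementation.
import torch

class SoapSketch:
    def __init__(self, W, lr=3e-4, betas=(0.95, 0.95, 0.99), eps=1e-8, precond_freq=10):
        self.W, self.lr, self.eps = W, lr, eps
        self.b1, self.b2, self.b_shampoo = betas
        self.precond_freq = precond_freq                # refresh eigenbases every k steps
        m, n = W.shape
        self.L = torch.zeros(m, m)                      # EMA of G @ G.T (left Shampoo factor)
        self.R = torch.zeros(n, n)                      # EMA of G.T @ G (right Shampoo factor)
        self.QL, self.QR = torch.eye(m), torch.eye(n)   # current eigenbases
        self.M = torch.zeros(m, n)                      # Adam first moment, rotated space
        self.V = torch.zeros(m, n)                      # Adam second moment, rotated space
        self.t = 0

    @torch.no_grad()
    def step(self, G):
        self.t += 1
        # Shampoo-style factored statistics of the raw gradient.
        self.L.mul_(self.b_shampoo).add_(G @ G.T, alpha=1 - self.b_shampoo)
        self.R.mul_(self.b_shampoo).add_(G.T @ G, alpha=1 - self.b_shampoo)
        # Periodically recompute the eigenbases (the expensive, amortized part).
        if self.t % self.precond_freq == 1:
            self.QL = torch.linalg.eigh(self.L).eigenvectors
            self.QR = torch.linalg.eigh(self.R).eigenvectors
        # Rotate the gradient into the preconditioner's eigenbasis ...
        G_rot = self.QL.T @ G @ self.QR
        # ... run a plain Adam update there ...
        self.M.mul_(self.b1).add_(G_rot, alpha=1 - self.b1)
        self.V.mul_(self.b2).add_(G_rot * G_rot, alpha=1 - self.b2)
        update_rot = self.M / (self.V.sqrt() + self.eps)
        # ... and rotate the update back before applying it.
        self.W -= self.lr * (self.QL @ update_rot @ self.QR.T)

# Usage sketch:
# W = torch.nn.Parameter(0.02 * torch.randn(64, 64))
# opt = SoapSketch(W)
# ... compute gradient G for W ...; opt.step(G)
```

In this sketch, `precond_freq` is what trades the extra eigendecomposition cost against preconditioner freshness; amortizing that cost over many steps is how the per-step overhead stays manageable.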
RT @rosieyzh: In our new work on evaluating optimizers for LLM training, we perform a series of experiments to investigate the role of adap….
0 replies · 30 retweets · 0 likes
RT @ShamKakade6: Which optimizer is opt? Our new work compares SGD, Adam, Adafactor (+ momentum), Lion, and, simply, SignSGD on LLM trainin….
0 replies · 36 retweets · 0 likes
Tagging some people who might be interested: @_arohan_, @borisdayma, @dvsaisurya, @Devvrit_Khatri, @runame_.
0 replies · 0 retweets · 4 likes