Nikhil Vyas Profile
Nikhil Vyas

@vyasnikhil96

Followers: 786
Following: 282
Media: 3
Statuses: 162

@OpenAI. Prev: Postdoc at Harvard, PhD @MITEECS.

Joined August 2015
@vyasnikhil96
Nikhil Vyas
3 months
RT @depen_morwani: Excited to attend #ICLR25 this week. My DMs are open, feel free to drop a message to talk about anything related to opti….
0
5
0
@vyasnikhil96
Nikhil Vyas
5 months
RT @_arohan_: Today some of my ex and new colleagues are hosting AlgoPerfy workshop I will drop by and participate….
0
18
0
@vyasnikhil96
Nikhil Vyas
5 months
RT @AshokCutkosky: Some ideas on a new optimizer from my student Qinzi Zhang: ( Early stages, but the empirical res….
0
16
0
@vyasnikhil96
Nikhil Vyas
5 months
Combining SOAP and Muon: and some rough thoughts on interesting future directions.
3
23
185
@vyasnikhil96
Nikhil Vyas
5 months
RT @ShamKakade6: 1/n In new work, we draw connections between accelerated SGD and various recent optimizers including AdEMAMix, Schedule-Fr….
0
14
0
@vyasnikhil96
Nikhil Vyas
8 months
RT @ShamKakade6: (1/n) 💡How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping poin….
0
32
0
@vyasnikhil96
Nikhil Vyas
8 months
RT @brandfonbrener: How does test loss change as we change the training data? And how does this interact with scaling laws?. We propose a m….
0
16
0
@vyasnikhil96
Nikhil Vyas
9 months
@kellerjordan0 I should add that OrthogonalNesterov is a new optimizer; it is possible that some modification will close the gap.
0
0
8
@vyasnikhil96
Nikhil Vyas
9 months
SOAP on @kellerjordan0's nanogpt benchmark. SOAP seems to take at least ~10% fewer steps than the OrthogonalNesterov optimizer, but it also has more overhead.
3
8
66
@vyasnikhil96
Nikhil Vyas
10 months
Thread on our new optimizer (SOAP), which mixes Shampoo and AdamW:
@ShamKakade6
Sham Kakade
10 months
6/n See our paper ( for more details. Work with @vyasnikhil96, @depen_morwani, @rosieyzh, @IShapira1, @brandfonbrener and Lucas Janson.
0
0
12
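Since the tweets above only gesture at the idea, here is a minimal numpy sketch of one way to read "Shampoo mixed with AdamW": run an Adam-style update in the eigenbasis of Shampoo's L/R gradient statistics, refreshing that basis occasionally. The EMA constants, the refresh interval, the omission of weight decay and bias correction, and the (incorrect) reuse of moments across basis refreshes are all simplifications on my part; this is not the released SOAP implementation.

```python
# Rough sketch of the SOAP idea for a single weight matrix: keep Shampoo's
# L = EMA(G G^T) and R = EMA(G^T G) statistics, and run an Adam-style update
# in their eigenbasis. Hyperparameters and details are placeholder assumptions.
import numpy as np

def soap_sketch(grad_fn, W, lr=3e-4, beta1=0.9, beta2=0.95,
                shampoo_beta=0.95, eps=1e-8, refresh_every=10, steps=100):
    m, n = W.shape
    L = np.zeros((m, m))           # EMA of G @ G.T (Shampoo's left statistic)
    R = np.zeros((n, n))           # EMA of G.T @ G (Shampoo's right statistic)
    QL, QR = np.eye(m), np.eye(n)  # current eigenbases
    M = np.zeros((m, n))           # Adam first moment, kept in the rotated basis
    V = np.zeros((m, n))           # Adam second moment, kept in the rotated basis
    for t in range(1, steps + 1):
        G = grad_fn(W)
        # Update Shampoo statistics and occasionally refresh their eigenbases.
        L = shampoo_beta * L + (1 - shampoo_beta) * G @ G.T
        R = shampoo_beta * R + (1 - shampoo_beta) * G.T @ G
        if t % refresh_every == 1:
            _, QL = np.linalg.eigh(L)
            _, QR = np.linalg.eigh(R)
            # A faithful implementation would also re-project M and V into the
            # new basis here; skipped for brevity.
        # Rotate the gradient into the eigenbasis and take a plain Adam step there.
        Gr = QL.T @ G @ QR
        M = beta1 * M + (1 - beta1) * Gr
        V = beta2 * V + (1 - beta2) * Gr**2
        update = M / (np.sqrt(V) + eps)
        # Rotate the update back to the original basis and apply it.
        W -= lr * QL @ update @ QR.T
    return W

# Toy usage: drive W toward a random target under a quadratic loss.
rng = np.random.default_rng(0)
W_target = rng.standard_normal((8, 4))
W_fit = soap_sketch(lambda W: 2.0 * (W - W_target), np.zeros((8, 4)),
                    lr=0.05, steps=300)
print("final error:", np.abs(W_fit - W_target).max())
```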
@vyasnikhil96
Nikhil Vyas
1 year
Also forgot the tag: #ICML2024.
0
0
0
@vyasnikhil96
Nikhil Vyas
1 year
Forgot to mention earlier: I will be applying to industry positions in Fall.
1
0
1
@vyasnikhil96
Nikhil Vyas
1 year
Comparing optimizers for LLM training:
@ShamKakade6
Sham Kakade
1 year
Which optimizer is opt? Our new work compares SGD, Adam, Adafactor (+ momentum), Lion, and, simply, SignSGD on LLM training wrt performance _and_ hyperparameter stability. tldr: Use anything but SGD, the rest are nearly identical:
1
0
0
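For readers who haven't seen "simply, SignSGD" before, here is a tiny numpy illustration of the update rules at the two ends of that comparison. These are textbook forms, not the exact variants or hyperparameters used in the paper; the learning rates and momentum constant are placeholders.

```python
# Plain SGD vs. sign-based SGD (with a simple momentum smoothing), in numpy.
import numpy as np

def sgd_step(w, g, lr=1e-2):
    # Plain SGD: step against the raw gradient.
    return w - lr * g

def signsgd_step(w, g, m, lr=1e-3, beta=0.9):
    # Sign-based step: smooth the gradient, then move every coordinate by the
    # same magnitude lr in the direction of its sign.
    m = beta * m + (1 - beta) * g
    return w - lr * np.sign(m), m

# One step on a toy gradient:
w, m = np.zeros(3), np.zeros(3)
g = np.array([0.4, -2.0, 0.01])
w_sgd = sgd_step(w, g)
w_sign, m = signsgd_step(w, g, m)
```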
@vyasnikhil96
Nikhil Vyas
1 year
On Shampoo:
@vyasnikhil96
Nikhil Vyas
1 year
1/n A technical thread on our results connecting the Shampoo optimizer and the optimal Kronecker product approximation of the Adagrad (or Hessian) preconditioner.
1
0
0
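For context, here is the setup the thread refers to, written out from the standard definitions; this is a paraphrase of the objects involved, not a restatement of the paper's theorem.

```latex
% Full-matrix Adagrad accumulates outer products of flattened gradients:
A_T \;=\; \sum_{t=1}^{T} \operatorname{vec}(G_t)\,\operatorname{vec}(G_t)^{\top}.
% The optimal Kronecker-factored approximation solves
\min_{L,\,R}\ \bigl\lVert A_T - L \otimes R \bigr\rVert_{F},
% while Shampoo builds its preconditioner (up to matrix powers, and up to the
% factor ordering fixed by the vec convention) from the per-side statistics
L_T \;=\; \sum_{t=1}^{T} G_t G_t^{\top},
\qquad
R_T \;=\; \sum_{t=1}^{T} G_t^{\top} G_t .
```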
@vyasnikhil96
Nikhil Vyas
1 year
I will be at ICML from Monday onwards, presenting. I have recently been thinking about optimization (linking some recent works below), but I have pretty broad interests and would love to chat with people.
1
0
8
@vyasnikhil96
Nikhil Vyas
1 year
RT @rosieyzh: In our new work on evaluating optimizers for LLM training, we perform a series of experiments to investigate the role of adap….
0
30
0
@vyasnikhil96
Nikhil Vyas
1 year
RT @ShamKakade6: Which optimizer is opt? Our new work compares SGD, Adam, Adafactor (+ momentum), Lion, and, simply, SignSGD on LLM trainin….
0
36
0
@vyasnikhil96
Nikhil Vyas
1 year
Tagging some people who might be interested: @_arohan_, @borisdayma, @dvsaisurya, @Devvrit_Khatri, @runame_.
0
0
4