
Nikhil Vyas
@vyasnikhil96
Followers: 786 · Following: 282 · Media: 3 · Statuses: 162
@OpenAI. Prev: Postdoc at Harvard, PhD @MITEECS.
Joined August 2015
RT @depen_morwani: Excited to attend #ICLR25 this week. My DMs are open, feel free to drop a message to talk about anything related to opti….
0 replies · 5 retweets · 0 likes
RT @_arohan_: Today some of my ex and new colleagues are hosting AlgoPerfy workshop. I will drop by and participate….
0 replies · 18 retweets · 0 likes
RT @AshokCutkosky: Some ideas on a new optimizer from my student Qinzi Zhang: ( Early stages, but the empirical res….
0 replies · 16 retweets · 0 likes
RT @ShamKakade6: 1/n In new work, we draw connections between accelerated SGD and various recent optimizers including AdEMAMix, Schedule-Fr….
0 replies · 14 retweets · 0 likes
RT @ShamKakade6: (1/n) 💡How can we speed up the serial runtime of long pre-training runs? Enter Critical Batch Size (CBS): the tipping poin….
0 replies · 32 retweets · 0 likes
RT @brandfonbrener: How does test loss change as we change the training data? And how does this interact with scaling laws?. We propose a m….
0 replies · 16 retweets · 0 likes
@kellerjordan0 I should add that OrthogonalNesterov is a new optimizer; it is possible that some modification will close the gap.
0 replies · 0 retweets · 8 likes
SOAP on @kellerjordan0's nanogpt benchmark. SOAP seems to take at least ~10% fewer steps than the OrthogonalNesterov optimizer, but it also has more overhead.
3 replies · 8 retweets · 66 likes
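For context, a minimal sketch of the kind of measurement behind a claim like this: train the same model with each optimizer and record both steps to a target loss and wall-clock time, since a heavier preconditioner can win on steps while losing on per-step cost. The function name `steps_to_target`, the toy regression problem, and the AdamW/SGD pair below are illustrative stand-ins, not the nanogpt benchmark or the optimizers compared in the tweet.

```python
# Hypothetical sketch (not the actual nanogpt speedrun code): measure both
# steps-to-target-loss and wall-clock time for a given optimizer, so that
# "fewer steps" can be weighed against "more overhead per step".
import time
import torch

def steps_to_target(make_optimizer, target_loss=0.05, max_steps=5000, seed=0):
    torch.manual_seed(seed)
    # Placeholder problem: noisy linear regression instead of a transformer.
    w_true = torch.randn(32, 1)
    X = torch.randn(256, 32)
    y = X @ w_true + 0.01 * torch.randn(256, 1)
    model = torch.nn.Linear(32, 1)
    opt = make_optimizer(model.parameters())
    start = time.time()
    for step in range(1, max_steps + 1):
        loss = torch.nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
        opt.zero_grad(set_to_none=True)
        if loss.item() <= target_loss:
            return step, time.time() - start
    return max_steps, time.time() - start

# Any two optimizers exposing the standard torch.optim interface can be compared.
print(steps_to_target(lambda p: torch.optim.AdamW(p, lr=1e-2)))
print(steps_to_target(lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9)))
```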
Thread on our new optimizer (SOAP), which mixes Shampoo and AdamW:
6/n See our paper ( for more details. Work with @vyasnikhil96, @depen_morwani, @rosieyzh, @IShapira1, @brandfonbrener and Lucas Janson.
0 replies · 0 retweets · 12 likes
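As a rough illustration of the idea in the thread above (SOAP as a mix of Shampoo and AdamW): maintain Shampoo's factored second-moment statistics, periodically compute their eigenbases, and run an Adam-style update on the gradient rotated into that eigenbasis. The sketch below is a simplified, hypothetical version for a single 2D weight matrix; the class name, hyperparameter defaults, and `precond_freq` knob are illustrative choices, and it omits bias correction, 1D parameters, and other details of the paper's implementation.

```python
# Hypothetical, simplified sketch of the SOAP idea for one 2D weight matrix:
# Adam-style moments maintained in the eigenbasis of Shampoo's factored
# second-moment statistics. Not the authors' implementation.
import torch

class SoapSketch:
    def __init__(self, W, lr=3e-4, betas=(0.95, 0.95, 0.99), eps=1e-8, precond_freq=10):
        self.W, self.lr, self.eps = W, lr, eps
        self.b1, self.b2, self.b_shampoo = betas
        self.precond_freq = precond_freq                # refresh eigenbases every k steps
        m, n = W.shape
        self.L = torch.zeros(m, m)                      # EMA of G @ G.T (left Shampoo factor)
        self.R = torch.zeros(n, n)                      # EMA of G.T @ G (right Shampoo factor)
        self.QL, self.QR = torch.eye(m), torch.eye(n)   # current eigenbases
        self.M = torch.zeros(m, n)                      # Adam first moment, rotated space
        self.V = torch.zeros(m, n)                      # Adam second moment, rotated space
        self.t = 0

    @torch.no_grad()
    def step(self, G):
        self.t += 1
        # Shampoo-style factored statistics of the raw gradient.
        self.L.mul_(self.b_shampoo).add_(G @ G.T, alpha=1 - self.b_shampoo)
        self.R.mul_(self.b_shampoo).add_(G.T @ G, alpha=1 - self.b_shampoo)
        # Periodically recompute the eigenbases (the expensive, amortized part).
        if self.t % self.precond_freq == 1:
            self.QL = torch.linalg.eigh(self.L).eigenvectors
            self.QR = torch.linalg.eigh(self.R).eigenvectors
        # Rotate the gradient into the preconditioner's eigenbasis ...
        G_rot = self.QL.T @ G @ self.QR
        # ... run a plain Adam update there ...
        self.M.mul_(self.b1).add_(G_rot, alpha=1 - self.b1)
        self.V.mul_(self.b2).add_(G_rot * G_rot, alpha=1 - self.b2)
        update_rot = self.M / (self.V.sqrt() + self.eps)
        # ... and rotate the update back before applying it.
        self.W -= self.lr * (self.QL @ update_rot @ self.QR.T)

# Usage sketch:
# W = torch.nn.Parameter(0.02 * torch.randn(64, 64))
# opt = SoapSketch(W)
# ... compute gradient G for W ...; opt.step(G)
```

In this sketch, `precond_freq` is what trades the extra eigendecomposition cost against preconditioner freshness; amortizing that cost over many steps is how the per-step overhead stays manageable.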
RT @rosieyzh: In our new work on evaluating optimizers for LLM training, we perform a series of experiments to investigate the role of adap….
0 replies · 30 retweets · 0 likes
RT @ShamKakade6: Which optimizer is opt? Our new work compares SGD, Adam, Adafactor (+ momentum), Lion, and, simply, SignSGD on LLM trainin….
0 replies · 36 retweets · 0 likes
Tagging some people who might be interested: @_arohan_, @borisdayma, @dvsaisurya, @Devvrit_Khatri, @runame_.
0 replies · 0 retweets · 4 likes