Shohei Taniguchi

@ishohei220

Followers: 1K · Following: 3K · Media: 51 · Statuses: 1K

Researcher at the University of Tokyo @Matsuo_Lab. Deep generative models, stochastic optimization.

Joined March 2015
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
23
193
1K
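For readers curious what the "one line" swap looks like in practice, here is a minimal sketch. It assumes the package from the author's repo (iShohei220/adopt, distributed on PyPI as `torch-adopt`) exposes an `ADOPT` class with an Adam-like constructor; the exact import path and defaults are assumptions, so check the repo README.

```python
import torch
from adopt import ADOPT  # assumed import path; the PyPI package is torch-adopt

model = torch.nn.Linear(10, 1)

# Before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = ADOPT(model.parameters(), lr=1e-3)  # the advertised one-line change

# Tiny smoke test of one training step
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```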
@ishohei220
Shohei Taniguchi
8 months
Our paper with Minegishi-san, a first-year PhD student, was accepted to ICML. It shows that a very peculiar phenomenon appears inside the model as a Transformer acquires its in-context learning ability. I personally find it a very interesting result, so please take a look if you're interested.
@GoukiMinegishi
Gouki Minegishi
8 months
Our paper "Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence" has accepted at #ICML2025🎉 We found training Transformers in a few-shot setting leads to the emergence of 3 circuits. Joint work with @frt03_ @ishohei220 @yusuke_iwasawa_ @ymatsuo
0
11
82
@ishohei220
Shohei Taniguchi
8 months
Our work on mechanistic interpretability of Transformer models is accepted at #ICML2025. We find a very interesting phenomenon in the model's circuits while the Transformer is acquiring in-context learning ability during training.
@GoukiMinegishi
Gouki Minegishi
8 months
Our paper "Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence" has accepted at #ICML2025🎉 We found training Transformers in a few-shot setting leads to the emergence of 3 circuits. Joint work with @frt03_ @ishohei220 @yusuke_iwasawa_ @ymatsuo
0
1
7
@ishohei220
Shohei Taniguchi
1 year
I’ll present our ADOPT paper at #NeurIPS2024 from 4:30 PM to 7:30 PM today at West Ballroom A-D #6201. Feel free to come by for a discussion!
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
0
2
15
@ishohei220
Shohei Taniguchi
1 year
The ADOPT package is now distributed on PyPI. Going forward, you can install it easily with `pip install torch-adopt`.
@ishohei220
Shohei Taniguchi
1 year
The Python package for ADOPT has been published on PyPI. https://t.co/4UMKzhAULo It is now available via `pip install torch-adopt`.
1
7
39
@ishohei220
Shohei Taniguchi
1 year
The Python package for ADOPT has been published on PyPI. https://t.co/4UMKzhAULo It is now available via `pip install torch-adopt`.
pypi.org
ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
0
3
21
@ishohei220
Shohei Taniguchi
1 year
We received feedback that there are cases where ADOPT becomes unstable, so we made a small update to the implementation. If you are going to try it, please refer to the new implementation.
@ishohei220
Shohei Taniguchi
1 year
**Update on the ADOPT optimizer** To address several reports that ADOPT sometimes gets unstable, a minor modification has been made to the algorithm. We observe that this modification greatly improves stability in many cases. https://t.co/dHuo4Z2GMz
2
1
3
@ishohei220
Shohei Taniguchi
1 year
We have also updated the arXiv paper and provided a theoretical analysis of the clipped version of ADOPT. Importantly, we proved that the clipped version of ADOPT also converges at the optimal rate. https://t.co/6kMDGAd8QF
arxiv.org
Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless choosing a hyperparameter, i.e., $β_2$, in a...
1
1
14
@ishohei220
Shohei Taniguchi
1 year
Specifically, a clipping operation is added when updating the momentum. This prevents the momentum update from becoming too large when v is small.
1
1
11
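A rough per-parameter sketch of what this clipped update looks like, paraphrased from the tweets and paper rather than taken from the official implementation. In particular, the clip bound growing with the step count (here `step ** 0.25`) and the hyperparameter defaults are assumptions.

```python
import torch

def adopt_step_clipped(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    """Sketch of a clipped ADOPT update for a single tensor (not the official code)."""
    state["step"] = state.get("step", 0) + 1
    if "v" not in state:
        # First step only initializes the second-moment estimate; no parameter update.
        state["v"] = grad * grad
        state["m"] = torch.zeros_like(param)
        return
    v, m = state["v"], state["m"]
    # Normalize the current gradient by the *previous* second-moment estimate ...
    normed = grad / torch.clamp(v.sqrt(), min=eps)
    # ... and clip it so the momentum update cannot blow up when v is small.
    bound = state["step"] ** 0.25  # assumed growth schedule for the clip bound
    normed = normed.clamp(-bound, bound)
    m.mul_(beta1).add_(normed, alpha=1 - beta1)
    param.add_(m, alpha=-lr)
    # The second moment is updated only after the parameter update.
    v.mul_(beta2).add_(grad * grad, alpha=1 - beta2)
```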
@ishohei220
Shohei Taniguchi
1 year
**Update on the ADOPT optimizer** To address several reports that ADOPT sometimes gets unstable, a minor modification has been made to the algorithm. We observe that this modification greatly improves stability in many cases. https://t.co/dHuo4Z2GMz
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
1
11
104
@wightmanr
Ross Wightman
1 year
A new optimizer that's better than Adam. I think I've heard that before. But, I tried this one, I was doing regression testing on some recent optimizer cleanup w/ real scenarios, I threw this in the mix. It did beat Adam, every time (so far). This one appears worth a closer look.
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
14
133
1K
@ishohei220
Shohei Taniguchi
1 year
Experimentally, we confirmed that ADOPT performs better than Adam across a wide range of tasks. We also observed that it helps reduce loss spikes during GPT-2 pretraining. I'm personally curious about validation on larger models (>1B scale), so I'd love for someone to try it.
0
0
11
@ishohei220
Shohei Taniguchi
1 year
Specifically, convergence can be guaranteed with two changes: (1) remove the current gradient information from the second-moment estimate, and (2) swap the order of gradient scaling and momentum computation.
1
0
14
@ishohei220
Shohei Taniguchi
1 year
I've posted our NeurIPS paper on arXiv. It shows that, with a minor modification to Adam, convergence can always be guaranteed regardless of the hyperparameters. The proposed method, ADOPT, can be used by changing just one line of code, so please give it a try. https://t.co/dHuo4Z2GMz
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
1
115
652
@ishohei220
Shohei Taniguchi
1 year
We observe that ADOPT always converges even on examples where Adam fails, and performs better on many practical problems. Moreover, we find that ADOPT is effective in alleviating training instability (e.g., loss spikes) when pretraining GPT-2.
3
6
71
@ishohei220
Shohei Taniguchi
1 year
We find that the non-convergence can be fixed by two modifications: (1) remove the current gradient from the second-moment estimate and (2) normalize the gradient before updating the momentum. With these simple modifications, we can guarantee convergence in general cases.
2
3
59
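A side-by-side sketch of what these two modifications amount to per parameter, written as my own paraphrase of the thread (bias correction, weight decay, and the later clipping fix are omitted, and the ADOPT hyperparameter defaults are assumptions):

```python
import torch

def adam_update(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam (simplified): the current gradient enters v *before* scaling,
    # and the momentum accumulates the raw, unscaled gradient.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).add_(grad * grad, alpha=1 - beta2)
    param.sub_(lr * m / (v.sqrt() + eps))

def adopt_update(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    # ADOPT (simplified): (1) scale by the *previous* v, so the current gradient
    # is excluded from the second-moment estimate, and (2) normalize the gradient
    # *before* it enters the momentum; v is refreshed only afterwards.
    m.mul_(beta1).add_(grad / torch.clamp(v.sqrt(), min=eps), alpha=1 - beta1)
    param.sub_(lr * m)
    v.mul_(beta2).add_(grad * grad, alpha=1 - beta2)
```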
@ishohei220
Shohei Taniguchi
1 year
Adam is the dominant optimizer in deep learning, but it is known that Adam's convergence is not guaranteed in theory even in very simple settings. In this paper, we identify a fundamental cause of the non-convergence and provide a way to fix it with minimal changes.
1
0
52
@Matsuo_Lab
東京大学 松尾・岩澤研究室
1 year
[9 slots remaining] Researcher positions starting April 1, 2025; application deadline: Wednesday, Nov 6. The Matsuo-Iwasawa Lab is recruiting 20 researchers. Our recruitment page describes the goals of the lab's fundamental research, our current research areas, and the research environment. Please have a look and consider applying. https://t.co/7TXLtH6TaQ
2
18
58