Shohei Taniguchi

@ishohei220

Followers: 1K · Following: 3K · Media: 51 · Statuses: 1K

Researcher at the University of Tokyo @Matsuo_Lab. Deep generative models, stochastic optimization.

Joined March 2015
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
23
193
1K
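For readers curious what the "one line" swap looks like in practice, here is a minimal sketch. It assumes the package from the author's repo (iShohei220/adopt, distributed on PyPI as `torch-adopt`) exposes an `ADOPT` class with an Adam-like constructor; the exact import path and defaults are assumptions, so check the repo README.

```python
import torch
from adopt import ADOPT  # assumed import path; the PyPI package is torch-adopt

model = torch.nn.Linear(10, 1)

# Before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = ADOPT(model.parameters(), lr=1e-3)  # the advertised one-line change

# Tiny smoke test of one training step
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```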
@ishohei220
Shohei Taniguchi
8 months
Our paper with Minegishi-san, a first-year PhD student, was accepted to ICML. It shows that a very peculiar phenomenon appears inside the model as a Transformer acquires its in-context learning ability. I personally find it a very interesting result, so please take a look if you're interested.
@GoukiMinegishi
Gouki Minegishi
8 months
Our paper "Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence" has accepted at #ICML2025🎉 We found training Transformers in a few-shot setting leads to the emergence of 3 circuits. Joint work with @frt03_ @ishohei220 @yusuke_iwasawa_ @ymatsuo
0
11
82
@ishohei220
Shohei Taniguchi
8 months
Our work on mechanistic interpretability of Transformer models is accepted at #ICML2025. We find a very interesting phenomenon in the model's circuits while the Transformer is acquiring in-context learning ability during training.
@GoukiMinegishi
Gouki Minegishi
8 months
Our paper "Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence" has accepted at #ICML2025🎉 We found training Transformers in a few-shot setting leads to the emergence of 3 circuits. Joint work with @frt03_ @ishohei220 @yusuke_iwasawa_ @ymatsuo
0
1
7
@ishohei220
Shohei Taniguchi
1 year
I’ll present our ADOPT paper at #NeurIPS2024 from 4:30 PM to 7:30 PM today at West Ballroom A-D #6201. Feel free to come by for a discussion!
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
0
2
15
@ishohei220
Shohei Taniguchi
1 year
The ADOPT package is now distributed on PyPI. Going forward, you can install it easily with `pip install torch-adopt`.
@ishohei220
Shohei Taniguchi
1 year
The Python package for ADOPT has been published on PyPI. https://t.co/4UMKzhAULo It is now available via `pip install torch-adopt`.
1
7
39
@ishohei220
Shohei Taniguchi
1 year
The Python package for ADOPT has been published on PyPI. https://t.co/4UMKzhAULo It is now available via `pip install torch-adopt`.
pypi.org
ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
0
3
21
@ishohei220
Shohei Taniguchi
1 year
We received feedback that there are cases where ADOPT becomes unstable, so we made a small update to the implementation. If you are going to try it, please refer to the new implementation.
@ishohei220
Shohei Taniguchi
1 year
**Update on the ADOPT optimizer** To address several reports that ADOPT sometimes gets unstable, a minor modification has been made to the algorithm. We observe that this modification greatly improves stability in many cases. https://t.co/dHuo4Z2GMz
2
1
3
@ishohei220
Shohei Taniguchi
1 year
We have also updated the arXiv paper and provided a theoretical analysis of the clipped version of ADOPT. Importantly, we proved that the clipped version of ADOPT also converges at the optimal rate. https://t.co/6kMDGAd8QF
arxiv.org
Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless choosing a hyperparameter, i.e., $β_2$, in a...
1
1
14
@ishohei220
Shohei Taniguchi
1 year
Specifically, a clipping operation is added when updating the momentum. This prevents the momentum update from becoming too large when v is small.
1
1
11
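A rough per-parameter sketch of what this clipped update looks like, paraphrased from the tweets and paper rather than taken from the official implementation. In particular, the clip bound growing with the step count (here `step ** 0.25`) and the hyperparameter defaults are assumptions.

```python
import torch

def adopt_step_clipped(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    """Sketch of a clipped ADOPT update for a single tensor (not the official code)."""
    state["step"] = state.get("step", 0) + 1
    if "v" not in state:
        # First step only initializes the second-moment estimate; no parameter update.
        state["v"] = grad * grad
        state["m"] = torch.zeros_like(param)
        return
    v, m = state["v"], state["m"]
    # Normalize the current gradient by the *previous* second-moment estimate ...
    normed = grad / torch.clamp(v.sqrt(), min=eps)
    # ... and clip it so the momentum update cannot blow up when v is small.
    bound = state["step"] ** 0.25  # assumed growth schedule for the clip bound
    normed = normed.clamp(-bound, bound)
    m.mul_(beta1).add_(normed, alpha=1 - beta1)
    param.add_(m, alpha=-lr)
    # The second moment is updated only after the parameter update.
    v.mul_(beta2).add_(grad * grad, alpha=1 - beta2)
```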
@ishohei220
Shohei Taniguchi
1 year
**Update on the ADOPT optimizer** To address several reports that ADOPT sometimes gets unstable, a minor modification has been made to the algorithm. We observe that this modification greatly improves stability in many cases. https://t.co/dHuo4Z2GMz
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
1
11
104
@wightmanr
Ross Wightman
1 year
A new optimizer that's better than Adam. I think I've heard that before. But, I tried this one, I was doing regression testing on some recent optimizer cleanup w/ real scenarios, I threw this in the mix. It did beat Adam, every time (so far). This one appears worth a closer look.
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
14
133
1K
@ishohei220
Shohei Taniguchi
1 year
Experimentally, we confirmed that ADOPT performs better than Adam across a wide range of tasks. We also observed that it helps reduce loss spikes during GPT-2 pretraining. I'm personally curious about validation on larger models (>1B scale), so I'd love for someone to try it.
0
0
11
@ishohei220
Shohei Taniguchi
1 year
Specifically, convergence can be guaranteed with two changes: (1) remove the current gradient information from the second-moment estimate, and (2) swap the order of gradient scaling and momentum computation.
1
0
14
@ishohei220
Shohei Taniguchi
1 year
I've posted our NeurIPS paper on arXiv. It shows that, with a minor modification to Adam, convergence can always be guaranteed regardless of the hyperparameters. The proposed method, ADOPT, can be used by changing just one line of code, so please give it a try. https://t.co/dHuo4Z2GMz
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
@ishohei220
Shohei Taniguchi
1 year
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
1
115
652
@ishohei220
Shohei Taniguchi
1 year
We observe that ADOPT always converges even on examples where Adam fails, and performs better on many practical problems. Moreover, we find that ADOPT is effective in alleviating training instability (e.g., loss spikes) when pretraining GPT-2.
3
6
71
@ishohei220
Shohei Taniguchi
1 year
We find that the non-convergence can be fixed by two modifications: (1) remove the current gradient from the second-moment estimate and (2) normalize the gradient before updating the momentum. With these simple modifications, we can guarantee convergence in general cases.
2
3
59
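A side-by-side sketch of what these two modifications amount to per parameter, written as my own paraphrase of the thread (bias correction, weight decay, and the later clipping fix are omitted, and the ADOPT hyperparameter defaults are assumptions):

```python
import torch

def adam_update(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam (simplified): the current gradient enters v *before* scaling,
    # and the momentum accumulates the raw, unscaled gradient.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).add_(grad * grad, alpha=1 - beta2)
    param.sub_(lr * m / (v.sqrt() + eps))

def adopt_update(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.9999, eps=1e-6):
    # ADOPT (simplified): (1) scale by the *previous* v, so the current gradient
    # is excluded from the second-moment estimate, and (2) normalize the gradient
    # *before* it enters the momentum; v is refreshed only afterwards.
    m.mul_(beta1).add_(grad / torch.clamp(v.sqrt(), min=eps), alpha=1 - beta1)
    param.sub_(lr * m)
    v.mul_(beta2).add_(grad * grad, alpha=1 - beta2)
```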
@ishohei220
Shohei Taniguchi
1 year
Adam is the dominant optimizer in deep learning, but it is known that Adam's convergence is not guaranteed in theory even in very simple settings. In this paper, we identify a fundamental cause of the non-convergence and provide a way to fix it with minimal changes.
1
0
52
@Matsuo_Lab
東京大学 松尾・岩澤研究室
1 year
[9 slots remaining] Researcher positions starting April 1, 2025; application deadline: Wednesday, Nov 6. The Matsuo-Iwasawa Lab is recruiting 20 researchers. Our recruitment page describes the goals of the lab's fundamental research, our current research areas, and the research environment. Please have a look and consider applying. https://t.co/7TXLtH6TaQ
2
18
58