Shohei Taniguchi
@ishohei220
Followers: 1K · Following: 3K · Media: 51 · Statuses: 1K
Researcher at the University of Tokyo @Matsuo_Lab. Deep generative models, stochastic optimization.
Joined March 2015
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
23
193
1K
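For readers curious what the one-line change looks like in a typical PyTorch training script, here is a minimal sketch. It assumes the official implementation exposes an Adam-compatible `ADOPT` class importable as `from adopt import ADOPT`; the import path and constructor arguments are assumptions, not details taken from the tweet.

```python
# Minimal sketch of the "one line" swap, assuming an Adam-compatible ADOPT class
# (import path is an assumption; see the linked repository for the real API).
import torch
from adopt import ADOPT  # assumed import path

model = torch.nn.Linear(784, 10)

# Before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
optimizer = ADOPT(model.parameters(), lr=1e-3)  # the one-line change
```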
Our paper with Minegishi-san, a first-year PhD student, was accepted at ICML. It shows that a very distinctive phenomenon appears inside the model as a Transformer acquires the ability for in-context learning. I personally find it a very interesting story, so please take a look if you are interested.
Our paper "Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence" has accepted at #ICML2025🎉 We found training Transformers in a few-shot setting leads to the emergence of 3 circuits. Joint work with @frt03_ @ishohei220 @yusuke_iwasawa_ @ymatsuo
0
11
82
Our work on mechanistic interpretability of Transformer models has been accepted at #ICML2025. We find a very interesting phenomenon in the model's circuits while the Transformer acquires in-context learning ability during training.
Our paper "Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence" has accepted at #ICML2025🎉 We found training Transformers in a few-shot setting leads to the emergence of 3 circuits. Joint work with @frt03_ @ishohei220 @yusuke_iwasawa_ @ymatsuo
0
1
7
I’ll present our ADOPT paper at #NeurIPS2024 from 4:30 PM to 7:30 PM today at West Ballroom A-D #6201. Feel free to come by and have a discussion!
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
0
2
15
We have released the ADOPT package on PyPI. You can now use it easily via `pip install torch-adopt`.
The Python package for ADOPT has been published on PyPI. https://t.co/4UMKzhAULo It is now available via `pip install torch-adopt`.
1
7
39
The Python package for ADOPT has been published on PyPI. https://t.co/4UMKzhAULo It is now available via `pip install torch-adopt`.
pypi.org
ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
0
3
21
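A quick smoke test after installing from PyPI might look like the sketch below. The package name comes from the tweet, but the `adopt` import path and the `ADOPT` class name are assumptions.

```python
# pip install torch-adopt  (package name from the tweet above)
import torch
import torch.nn.functional as F
from adopt import ADOPT  # assumed import path exposed by the package

model = torch.nn.Linear(10, 1)
optimizer = ADOPT(model.parameters(), lr=1e-3)

# One forward/backward/step cycle on random data to check the install.
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = F.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print("loss:", loss.item())
```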
We received feedback that there are cases where ADOPT becomes unstable, so we have made a small update to the implementation. If you are going to try it, we would appreciate it if you refer to the new implementation.
**Update on the ADOPT optimizer** To address several reports that ADOPT sometimes gets unstable, a minor modification has been made to the algorithm. We observe that this modification greatly improves stability in many cases. https://t.co/dHuo4Z2GMz
2
1
3
We have also updated the arXiv paper, and provided a theoretical analysis of the clipped version of ADOPT. Importantly, we proved that the clipped version of ADOPT also converges at the optimal rate. https://t.co/6kMDGAd8QF
arxiv.org
Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless choosing a hyperparameter, i.e., $β_2$, in a...
1
1
14
Specifically, a clipping operation is added to the momentum update. This prevents the update from becoming too large when the second-moment estimate v is small.
1
1
11
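As a rough illustration of where the clipping enters, here is a sketch of a single parameter update. The growing clip bound and the hyperparameter defaults are assumptions made for illustration only; refer to the official repository for the exact algorithm.

```python
import torch

def adopt_step_clipped(p, grad, m, v, step, lr=1e-3,
                       beta1=0.9, beta2=0.9999, eps=1e-6):
    """One illustrative update of clipped ADOPT (a sketch, not the reference code)."""
    # Scale the gradient by the previous second-moment estimate v.
    scaled = grad / torch.clamp(v.sqrt(), min=eps)
    # Clip the scaled gradient so the momentum update cannot blow up
    # while v is still small and unreliable (e.g., early in training).
    clip_t = step ** 0.25  # assumed growing bound, for illustration only
    scaled = scaled.clamp(-clip_t, clip_t)
    # Momentum is updated with the clipped, scaled gradient.
    m.mul_(beta1).add_(scaled, alpha=1 - beta1)
    p.add_(m, alpha=-lr)
    # The second moment is updated with the raw gradient afterwards.
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    return p, m, v
```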
**Update on the ADOPT optimizer** To address several reports that ADOPT sometimes gets unstable, a minor modification has been made to the algorithm. We observe that this modification greatly improves stability in many cases. https://t.co/dHuo4Z2GMz
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
1
11
104
A new optimizer that's better than Adam. I think I've heard that before. But I tried this one: I was doing regression testing on some recent optimizer cleanup w/ real scenarios and threw this in the mix. It beat Adam every time (so far). This one appears worth a closer look.
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
14
133
1K
Experimentally, we also confirmed that ADOPT performs better than Adam on a wide range of tasks. We also observed that it helps mitigate loss spikes during GPT-2 pretraining. I am personally curious about validation on larger models at the >1B scale, so I would really like someone to try it.
0
0
11
Specifically, convergence can be guaranteed by making two changes: (1) removing the current gradient information from the second-moment estimate, and (2) swapping the order of the gradient scaling and the momentum computation.
1
0
14
Adam has become the de facto standard in deep learning, but prior work has shown that its convergence cannot be guaranteed in general (e.g., https://t.co/jUOXaxWrwI). In this work, we investigate the fundamental cause that prevents convergence and show that a very simple modification is enough to guarantee it.
arxiv.org
Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates...
1
0
16
Our NeurIPS paper is now on arXiv. It shows that a minor modification to Adam guarantees convergence regardless of the choice of hyperparameters. The proposed method, ADOPT, can be used right away by changing just one line of code, so please give it a try. https://t.co/dHuo4Z2GMz
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
Our NeurIPS paper is published on arXiv. In this paper, we propose a new optimizer ADOPT, which converges better than Adam in both theory and practice. You can use ADOPT by just replacing one line in your code. https://t.co/6kMDGAd8QF
1
115
652
The implementation is available at https://t.co/dHuo4Z2GMz.
github.com
Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" - iShohei220/adopt
0
9
74
We observe that ADOPT can always converge even for examples where Adam fails, and it performs better on many practical problems. Moreover, we find that ADOPT is effective in alleviating training instability (e.g., loss spikes) during GPT-2 pretraining.
3
6
71
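This is not a reproduction of the paper's experiments, but one way a reader might set up a quick side-by-side run is sketched below, using a toy stochastic objective with occasional large gradients. The objective, noise rate, learning rate, and step count are arbitrary illustrative choices, and the `adopt` import path is an assumption.

```python
import torch
from adopt import ADOPT  # assumed import path

def run(opt_cls, steps=2000, **opt_kwargs):
    torch.manual_seed(0)
    theta = torch.ones(1, requires_grad=True)  # optimum of the toy objective is 0
    opt = opt_cls([theta], **opt_kwargs)
    for _ in range(steps):
        scale = 100.0 if torch.rand(()) < 0.01 else 1.0  # rare large gradients
        loss = scale * (theta ** 2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return theta.detach().abs().item()  # final distance from the optimum

print("Adam :", run(torch.optim.Adam, lr=1e-2))
print("ADOPT:", run(ADOPT, lr=1e-2))
```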
We find that the non-convergence can be fixed by two modifications: (1) remove the current gradient from the second-moment estimate, and (2) normalize the gradient before updating the momentum. With these simple modifications, we can guarantee convergence in general cases.
2
3
59
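To make the two modifications concrete, the sketch below contrasts a simplified Adam step (bias correction omitted) with a simplified ADOPT-style step. Variable names, default hyperparameters, and the exact epsilon placement are assumptions; the official repository has the authoritative version.

```python
import torch

def adam_step_simplified(p, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam (bias correction omitted): the second moment INCLUDES the current
    # gradient g, and scaling happens AFTER the momentum update.
    m.mul_(b1).add_(g, alpha=1 - b1)
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
    p.add_(m / (v.sqrt() + eps), alpha=-lr)

def adopt_step_simplified(p, g, m, v, lr=1e-3, b1=0.9, b2=0.9999, eps=1e-6):
    # (2) Scale the gradient by the PREVIOUS second moment BEFORE the momentum update.
    m.mul_(b1).add_(g / torch.clamp(v.sqrt(), min=eps), alpha=1 - b1)
    p.add_(m, alpha=-lr)
    # (1) The second moment used for scaling never contains the current gradient,
    #     because it is updated only after the parameter step.
    v.mul_(b2).addcmul_(g, g, value=1 - b2)
```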
Adam is the dominant optimizer in deep learning, but it is known that Adam's convergence is not guaranteed in theory even for very simple settings. In this paper, we demystify a fundamental cause of the non-convergence and provide a way to fix it with minimal changes.
1
0
52
[9 slots remaining] Researcher positions starting April 1, 2025; application deadline: Wednesday, November 6. The Matsuo-Iwasawa Lab is recruiting 20 researchers. The recruiting page provides information on the goals of the lab's basic research, its current research areas, and the research environment. Please take a look and apply. https://t.co/7TXLtH6TaQ
2
18
58