Shicong Cen
@tsen9731
Followers: 16 · Following: 3 · Media: 0 · Statuses: 4
A one-layer multi-head transformer, with CoT, enables both forward and reversal reasoning. The training dynamics analysis particularly illuminates how two (heads) are better than one! See Tong's post below. Joint work with @TongYang_666 @yuhuang42 and Yingbin Liang.
Multi-step reasoning is key to solving complex problems, and Transformers with Chain-of-Thought can do it surprisingly well. But how does CoT function as a learned scratchpad that lets even shallow Transformers run sequential algorithms that would otherwise require
[Disclaimer: I haven't seen evidence, please no witch hunt] A wild story: 1. Accusation of an AC collusion ring. 2. The whistleblower is both his girlfriend AND his student. 3. The accusation was made months ago on the Chinese internet, and suddenly pops up on Reddit. I hope @NeurIPSConf @icmlconf @iclr_conf investigate
machine learning researchers learn to optimise their own best paper rate through collusion and other unregulated mechanisms https://t.co/GsZxab8jCG
Is deep learning actually performing DEEP learning? We may have given the first proof that neural networks are capable of efficient hierarchical learning, while existing theory only shows that deep learning can "simulate" non-hierarchical algorithms
How does deep learning perform DEEP learning? Microsoft and CMU researchers establish a principle called "backward feature correction" and explain how very deep neural networks can actually perform DEEP hierarchical learning efficiently: https://t.co/9EtkaThXAT
@ZeyuanAllenZhu