Shicong Cen

@tsen9731

Followers
16
Following
3
Media
0
Statuses
4

Pittsburgh
Joined June 2019
@yuejiec
Yuejie Chi
3 months
A one-layer multi-head transformer, with CoT, enables both forward and reversal reasoning. The training dynamics analysis particularly illuminates how two (heads) are better than one! See Tong’s post below. Joint work with @TongYang_666 @yuhuang42 and Yingbin Liang.
@TongYang_666
Tong Yang
3 months
🚨 🔥 Multi-step reasoning is key to solving complex problems, and Transformers with Chain-of-Thought can do it surprisingly well. 🤔 But how does CoT function as a learned scratchpad that lets even shallow Transformers run sequential algorithms that would otherwise require
0
2
17
@thegautamkamath
Gautam Kamath ✈️ NeurIPS 2025
3 years
[Disclaimer: I haven't seen evidence plz no witchhunt] A wild story: 1 Accusation of AC collusion ring 2 Whistleblower is both his gf AND student 🫀 3 Accusation was months ago on Chinese internet, suddenly pops up on Reddit I hope @NeurIPSConf @icmlconf @iclr_conf investigate
@LeonDerczynski
Leon Derczynski ✍🏻 🌞🏠🌲
3 years
machine learning researchers learn to optimise their own best paper rate through collusion and other unregulated mechanisms https://t.co/GsZxab8jCG
13
12
134
@ZeyuanAllenZhu
Zeyuan Allen-Zhu, Sc.D.
6 years
Is deep learning actually performing DEEP learning? We may have given the first proof that a neural network is capable of efficient hierarchical learning, while existing theory only shows that deep learning can "simulate" non-hierarchical algorithms
@MSFTResearch
Microsoft Research
6 years
How does deep learning perform DEEP learning? Microsoft and CMU researchers establish a principle called "backward feature correction" and explain how very deep neural networks can actually perform DEEP hierarchical learning efficiently: https://t.co/9EtkaThXAT @ZeyuanAllenZhu
2
21
178