Shicong Cen
@tsen9731
Followers: 16 · Following: 3 · Media: 0 · Statuses: 4
A one-layer multi-head transformer, with CoT, enables both forward and reversal reasoning. The training dynamics analysis particularly illuminates how two (heads) are better than one! See Tong's post below. Joint work with @TongYang_666 @yuhuang42 and Yingbin Liang.
Multi-step reasoning is key to solving complex problems, and Transformers with Chain-of-Thought can do it surprisingly well. But how does CoT function as a learned scratchpad that lets even shallow Transformers run sequential algorithms that would otherwise require
[Disclaimer: I haven't seen evidence, please no witch hunt] A wild story: 1. Accusation of an AC collusion ring. 2. The whistleblower is both his girlfriend AND his student. 3. The accusation was made months ago on the Chinese internet, and suddenly pops up on Reddit. I hope @NeurIPSConf @icmlconf @iclr_conf investigate
machine learning researchers learn to optimise their own best paper rate through collusion and other unregulated mechanisms https://t.co/GsZxab8jCG
Is deep learning actually performing DEEP learning? We may have given the first proof that neural networks are capable of efficient hierarchical learning, while existing theory only shows that deep learning can "simulate" non-hierarchical algorithms
How does deep learning perform DEEP learning? Microsoft and CMU researchers establish a principle called "backward feature correction" and explain how very deep neural networks can actually perform DEEP hierarchical learning efficiently: https://t.co/9EtkaThXAT
@ZeyuanAllenZhu