Beidi Chen
@BeidiChen
Followers
15K
Following
1K
Media
35
Statuses
540
Asst. Prof @CarnegieMellon, @amazon Scholar, Prev: Visiting Researcher @Meta, Postdoc @Stanford, Ph.D. @RiceUniversity, Large-Scale ML, a fan of Dota2.
Joined November 2011
On-Policy Distillation with Reverse KL — sounds like Self-Forcing + DMD for language modeling 😀 Maybe a bidirectional (diffusion) teacher would make it even better?
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training it for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
2
10
85
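(For readers curious what the objective looks like in practice, here is a minimal PyTorch sketch of a per-token reverse-KL distillation loss computed on the student's own rollouts; the shapes and random logits are illustrative assumptions, not code from the linked post.)

```python
# A minimal sketch of per-token reverse-KL on-policy distillation, assuming
# student and teacher logits have already been computed on the student's own
# sampled rollouts. Shapes and values are illustrative only.
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor) -> torch.Tensor:
    """KL(student || teacher), averaged over tokens.

    Both tensors are [batch, seq_len, vocab] over the student's sampled
    trajectories, so mistakes the student actually makes receive a dense,
    per-token correction signal from the teacher.
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # E_{x ~ student} [log p_student(x) - log p_teacher(x)]
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl.mean()

# Toy usage with random logits standing in for real model outputs.
student = torch.randn(2, 8, 100, requires_grad=True)
teacher = torch.randn(2, 8, 100)
loss = reverse_kl_loss(student, teacher)
loss.backward()
```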
📣 We study a threat model in which users intend to leverage an LLM agent to fix problems in their code base, but the agent could just slip vulnerabilities in while still passing all the tests. I think security will become an increasingly important problem as agents' abilities grow. So much fun
🚀If your code agent generates a patch that passes all tests, should you trust it to merge automatically? ⚠️You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal-looking issue description, whether from a benign user or not, can lead code agents
0
3
29
Happy to see the effectiveness of sparse FT in balancing new information and old knowledge. We proposed S2FT ( https://t.co/wkvTSket4h) with a similar motivation a year ago, and I believe the introduction of memory layers leads to even better continual learning!
arxiv.org
Modern language models are powerful, but typically static after deployment. A major obstacle to building models that continually learn over time is catastrophic forgetting, where updating on new...
🧠 How can we equip LLMs with memory that allows them to continually learn new things? In our new paper with @AIatMeta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge. While full
3
4
24
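(A toy PyTorch sketch of the general sparse fine-tuning idea referenced above: freeze most weights and update only a small, targeted subset to limit interference with existing knowledge. The row-selection rule and toy model are illustrative assumptions, not the actual S2FT or memory-layer method.)

```python
# Toy sketch: keep only a few rows of a weight matrix trainable and mask the
# gradients of everything else, so new data causes targeted, sparse updates.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(64, 64)
model.bias.requires_grad_(False)            # keep the sketch focused on the weight matrix
x, y = torch.randn(16, 64), torch.randn(16, 64)

# Probe gradients once to pick a sparse set of rows to keep trainable
# (an illustrative selection rule, not the one used in the paper).
F.mse_loss(model(x), y).backward()
trainable_rows = model.weight.grad.abs().sum(dim=1).topk(k=4).indices  # 4 of 64 rows
model.zero_grad()

# Mask gradients so only the selected rows ever get updated.
mask = torch.zeros_like(model.weight)
mask[trainable_rows] = 1.0
model.weight.register_hook(lambda g: g * mask)

opt = torch.optim.SGD([model.weight], lr=1e-2)
for _ in range(10):
    opt.zero_grad()
    F.mse_loss(model(x), y).backward()
    opt.step()
```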
Wow, congrats on the release! A very important step towards a self-improving kernel agent 😉
🤔 Can AI optimize the systems it runs on? 🚀 Introducing FlashInfer-Bench, a workflow that makes AI systems self-improving with agents:
- Standardized signature for LLM serving kernels
- Implement kernels with your preferred language
- Benchmark them against real-world serving
0
7
44
Congrats!!! So honored to be part of the team 🎉 Haha, my first time making a contribution in the computer architecture field — thanks for carrying me 🙏
2
0
46
📢🔥 New off-policy RL for LLMs: now training a 32B model on data that is 200+ steps stale for the first time, while still matching on-policy accuracy 💪 A big step toward scalable & decentralized agent training 😉
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
4
19
213
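(For context on why stale data is tricky: the standard workaround is importance-ratio correction with clipping, sketched below in PyTorch. This is the generic PPO-style ingredient, not the M2PO objective from the quoted paper; all tensors are random stand-ins.)

```python
# A hedged sketch of the usual fix for off-policy (stale) rollouts: reweight
# each token by the probability ratio between the current policy and the stale
# behavior policy, and clip the ratio to keep training stable.
import torch

def clipped_offpolicy_pg_loss(logp_current: torch.Tensor,
                              logp_behavior: torch.Tensor,
                              advantages: torch.Tensor,
                              clip_eps: float = 0.2) -> torch.Tensor:
    """logp_*: per-token log-probs of the sampled tokens; advantages: per-token."""
    ratio = torch.exp(logp_current - logp_behavior)           # importance weight
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage: rollouts collected many steps ago under a stale behavior policy.
logp_old = torch.randn(4, 32).clamp(max=0)
logp_new = torch.randn(4, 32, requires_grad=True)
adv = torch.randn(4, 32)
loss = clipped_offpolicy_pg_loss(logp_new, logp_old, adv)
loss.backward()
```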
🚀 Excited to share that #Multiverse has been accepted to #NeurIPS 2025! Couldn’t have done it without such incredible collaborators—thank you!!
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
1
4
22
[#TI2025 Summary] In the just-concluded The International 2025 grand finals, we lost 2-3 to a strong opponent, Falcons, and finished as the runner-up of this TI. This year has been a bumpy one for us: results fluctuated, the roster changed, and we've
241
436
4K
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% of generated tokens. It also delivers many strong
63
333
2K
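(A toy Python sketch of confidence-filtered majority voting, the rough idea behind this kind of method: score each sampled trace by a model-internal confidence, drop the least confident, and vote among the rest. The exact confidence measure and online early-stopping in DeepConf are more involved; the numbers below are made up.)

```python
# Toy confidence-filtered majority voting over sampled reasoning traces.
from collections import Counter

# (final_answer, mean_token_logprob) for several sampled traces; values invented.
traces = [("42", -0.31), ("42", -0.28), ("17", -0.95), ("42", -0.40), ("17", -1.30)]

def vote_with_confidence(traces, keep_frac=0.6):
    # Keep only the most confident fraction of traces, then majority-vote.
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_frac))]
    counts = Counter(answer for answer, _ in kept)
    return counts.most_common(1)[0][0]

print(vote_with_confidence(traces))  # -> "42"
```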
🎉 Glad to see our attention sink is widely adopted and contributing to the strong open-source models ~ please check out this post by @Guangxuan_Xiao for many insights and hypotheses. It would be interesting for folks who’ve seen artifacts / outliers in generated content and model
I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: https://t.co/0EAi2KQMMx
3
6
129
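(For a concrete picture of the mechanism: a minimal sketch of the StreamingLLM-style KV-cache policy built on attention sinks, where the first few "sink" tokens are always kept alongside a sliding window of recent tokens. Indices only; no real model or cache, and the budget numbers are illustrative.)

```python
# Minimal sketch: always retain a few sink tokens (which soak up a large share
# of attention mass) plus a sliding window of recent tokens in the KV cache.
def kept_cache_positions(seq_len: int, num_sink: int = 4, window: int = 1020) -> list[int]:
    """Token positions retained in the KV cache under a num_sink + window budget."""
    if seq_len <= num_sink + window:
        return list(range(seq_len))
    sinks = list(range(num_sink))                    # always-kept sink tokens
    recent = list(range(seq_len - window, seq_len))  # sliding window of recent tokens
    return sinks + recent

print(len(kept_cache_positions(5000)))  # 1024 of 5000 positions kept
```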
🤖 GPT-5 supports 128K output / 400K input tokens. 📜 Wiles’s Fermat proof took ~88K tokens — the final output only. 🧩 Add years of exploration, likely >880K tokens of reasoning. 🧠 Real intelligence isn’t about making it short — it’s about exploring the sparsity in the logic.
0
2
8
The release of GPT-OSS-120B & GPT-OSS-20B models today incorporates my Attention Sink work ( https://t.co/u67QTC3rzh). Exciting to see this come to life! 🎉 Looking forward to more progress in this space. 😁
Our open models are here. Both of them. https://t.co/9tFxefOXcg
18
50
735
Big Congrats @Anshumali_ 🎈🎉
Congrats to Rice CS' @Anshumali_ Shrivastava, who has been promoted to full professor. Shrivastava is well on his way to revolutionizing how LLMs & other deep learning models are trained & stored, using new algorithms to make AI scalable & more accessible. https://t.co/8VpFk371gp
1
0
27
(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing the FastWan series, a family of fast video generation models trained via a new recipe we term “sparse distillation”, speeding up video denoising time by 70X! 🖥️ Live
10
100
422
Tired of intricate system code for RL training? 🤯 We release AReaL-lite – a lightweight AReaL version for AI researchers! 🚀#opensource ✨ Algorithm-first design & APIs🎉 ✨ 80% less code w/ 90% of AReaL's full efficiency 🎉 ✨ Customizable agentic RL🎉 🔗 https://t.co/YUa03pp9LR
3
26
70
🥳
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run
8
5
145
I will be in front of the GSM-Infinite poster tomorrow 2-4:30 pm. 🫡 East Exhibition Hall E-2901. Please come and say hi. Happy to chat about LLM evals, synthetic data, and more!
🚨 Super excited to share that our paper "GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?" got accepted at ICML 2025! 🧠📏🎉 P.S. 😅 Struggling to make your long-context LLM submission stand out at NeurIPS? 🧠 Give
0
2
12
Beginner Q: Does anyone know the details of why Ray doesn’t support IPv6? I was debugging verl on a cluster and found the root cause was IPv6 with Ray … it seems to have been a known issue for a while but never got resolved?
4
0
9