Beidi Chen

@BeidiChen

Followers 15K · Following 1K · Media 35 · Statuses 530

Asst. Prof @CarnegieMellon, @amazon Scholar, Prev: Visiting Researcher @Meta, Postdoc @Stanford, Ph.D. @RiceUniversity, Large-Scale ML, a fan of Dota2.

Joined November 2011
@BeidiChen
Beidi Chen
2 days
🎉 Glad to see our attention sink is widely adopted and contributes to strong open-source models ~ please check out this post by @Guangxuan_Xiao for many insights and hypotheses. It should be interesting for folks who’ve seen artifacts / outliers in generated content and models.
@Guangxuan_Xiao
Guangxuan Xiao
3 days
I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details:
3 replies · 6 reposts · 125 likes
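For readers unfamiliar with the mechanism: an attention sink keeps a handful of initial tokens permanently in the KV cache so softmax attention has a stable place to park probability mass, which is what lets a sliding-window cache stream without quality collapse. A minimal sketch of that eviction policy, assuming a list-of-tensors cache and illustrative sizes (not the authors' code):

def evict_kv_cache(keys, values, n_sink=4, window=1020):
    """Keep the first n_sink positions (the attention sinks) plus a
    sliding window of the most recent `window` positions; evict the rest.
    keys/values: per-position entries, oldest first. Sizes are illustrative."""
    if len(keys) <= n_sink + window:
        return keys, values  # cache still fits; nothing to evict
    return (keys[:n_sink] + keys[-window:],
            values[:n_sink] + values[-window:])

# Per decoding step: append the new token's K/V, then evict.
# keys.append(k_t); values.append(v_t)
# keys, values = evict_kv_cache(keys, values)

Per the post above, the surprising empirical bit is that evicting those first few positions, not shrinking the recency window, is what destroys streaming quality.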
@BeidiChen
Beidi Chen
3 days
RT @InfiniAILab: 🤖 GPT-5 supports 128K output / 400K input tokens. 📜 Wiles’s Fermat proof took ~88K tokens — the final output only. 🧩 Add…
0 replies · 2 reposts · 0 likes
@BeidiChen
Beidi Chen
5 days
RT @Guangxuan_Xiao: The release of GPT-OSS-120B & GPT-OSS-20B models today incorporates my Attention Sink work (…
0 replies · 47 reposts · 0 likes
@BeidiChen
Beidi Chen
6 days
Big Congrats @Anshumali_ 🎈🎉
@RiceCompSci
Rice Computer Science
6 days
Congrats to Rice CS' @Anshumali_ Shrivastava, who has been promoted to full professor. Shrivastava is well on his way to revolutionizing how LLMs & other deep learning models are trained & stored, using new algorithms to make AI scalable & more accessible.
1 reply · 0 reposts · 27 likes
@BeidiChen
Beidi Chen
6 days
RT @haoailab: (1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing FastWan series,…
0 replies · 110 reposts · 0 likes
@BeidiChen
Beidi Chen
9 days
RT @jxwuyi: Tired of intricate system code for RL training? 🤯 We release AReaL-lite – a lightweight AReaL version for AI researchers! 🚀 #opens…
0 replies · 23 reposts · 0 likes
@BeidiChen
Beidi Chen
18 days
🥳
@InfiniAILab
Infini-AI-Lab
18 days
Huge thanks to @tinytitans_icml for an amazing workshop — see you next year! Honored to receive a Best Paper Award 🏆 Let’s unlock the potential of sparsity! Next up: scaling to hundreds/thousands of rollouts? Or making powerful R1/K2-level LLMs (not just 8B 4-bit models) run…
8 replies · 5 reposts · 145 likes
@BeidiChen
Beidi Chen
24 days
RT @IronSteveZhou: I will be in front of the GSM-Infinite poster tomorrow 2-4:30 pm 🫡 East Exhibition Hall E-2901. Please come and say hi…
0 replies · 2 reposts · 0 likes
@BeidiChen
Beidi Chen
28 days
Ninja’d onto another ✈️ to #ICML after a delay — @Delta tried, but couldn’t stop me 💪 Safe to tweet now. 📢 DM or come find me if you're into long-context, model/data efficiency, evaluation, or test-time scaling! Also, I just started learning RL & world models — pls come teach me…
0 replies · 2 reposts · 40 likes
@BeidiChen
Beidi Chen
29 days
Beginner Q: Does anyone know the details of why Ray doesn’t support IPv6? I was debugging verl on a cluster and found the root cause was IPv6 with Ray… it seems to have been a known issue for a while but never got resolved?
4 replies · 0 reposts · 9 likes
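No answer on the root cause here, but a common workaround on dual-stack clusters is to pin Ray to explicit IPv4 addresses so hostname resolution never lands on an IPv6 address. A hypothetical sketch; the IPs below are placeholders for your cluster's real ones:

# Shell, on the head node (placeholder IPv4 address):
#   ray start --head --node-ip-address=10.0.0.1 --port=6379
# Shell, on each worker node (its own placeholder IPv4 address):
#   ray start --address=10.0.0.1:6379 --node-ip-address=10.0.0.2

# Driver script: connect to the already-running, IPv4-addressed cluster.
import ray
ray.init(address="10.0.0.1:6379")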
@BeidiChen
Beidi Chen
30 days
Cool blog! Initially, the long-term memory and consistency requirements of world models were something I felt current video-gen techniques couldn’t get us to yet. So I’m not a believer in world models coming just from scaling video gen. But I gave it a second thought — worrying too much…
@xunhuang1995
Xun Huang
1 month
What exactly is a "world model"? And what limits existing video generation models from being true world models? In my new blog post, I argue that a true video world model must be causal, interactive, persistent, real-time, and physically accurate.
0 replies · 3 reposts · 13 likes
@BeidiChen
Beidi Chen
30 days
👀
@andrewdfeldman
Andrew Feldman
1 month
We are just a little bit faster than @nvidia GPUs on Qwen 235B. 18X faster. @CerebrasSystems inference is blazing fast. Come build cool stuff on Cerebras inference.
0 replies · 0 reposts · 7 likes
@BeidiChen
Beidi Chen
30 days
RT @svlevine: Action chunking is a great idea in robotics: by getting a model to produce a short sequence of actions, it _just works better…
0 replies · 108 reposts · 0 likes
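The idea in one toy sketch: the policy emits a chunk of k actions from a single observation and the controller executes the whole chunk open-loop before replanning. DummyPolicy, env_step, and the chunk size are illustrative stand-ins, not any specific robotics stack:

import numpy as np

CHUNK = 8  # actions predicted per forward pass (illustrative)

class DummyPolicy:
    def predict_chunk(self, obs, k):
        # A real policy would condition on obs; here, k zero-actions.
        return np.zeros((k, 2))  # k actions in a 2-D action space

def rollout(policy, env_step, obs, horizon=200):
    t = 0
    while t < horizon:
        for action in policy.predict_chunk(obs, CHUNK):  # execute chunk open-loop
            obs, done = env_step(action)
            t += 1
            if done or t >= horizon:
                return t
    return t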
@BeidiChen
Beidi Chen
30 days
Haha, I didn’t realize just implementing continuous batching would make such a big difference 😯
@chenzhuoming911
chen zhuoming
30 days
@gabriberton @BeidiChen About 2.5-6x faster.
0 replies · 0 reposts · 11 likes
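Roughly why it is such a big win: a static batch waits for its slowest sequence, while continuous batching refills freed slots after every decode step so the GPU batch stays full. A schematic scheduler loop, with model_step as an assumed stand-in for one batched decode step, not any particular engine's code:

from collections import deque

def serve(model_step, waiting: deque, max_batch=32):
    """Continuous-batching sketch: admit queued requests into free slots
    and evict finished sequences after every decode step."""
    running = []
    while waiting or running:
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())   # refill freed slots
        finished = model_step(running)          # one decode step; done flags
        running = [s for s, done in zip(running, finished) if not done]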
@BeidiChen
Beidi Chen
1 month
Always wanted to get rid of it!! I remember suspecting a correlation between the success of Llama 3 and the big expansion of its vocab size 😁 It was also very painful for speculative decoding (we once wanted to use an SSM as a good draft for long-context transformers but failed due…
@sukjun_hwang
Sukjun (June) Hwang
1 month
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
3 replies · 8 reposts · 148 likes
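Context for the speculative-decoding aside above: a cheap draft model proposes k tokens, the target model verifies them (in practice in one batched forward pass), and the longest agreeing prefix is kept. A simplified greedy-acceptance sketch, with draft_next and target_next as assumed next-token callables:

def speculative_step(draft_next, target_next, prefix, k=4):
    # 1) Draft k tokens autoregressively with the cheap model.
    ctx, proposed = list(prefix), []
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    # 2) Verify: the target re-predicts each position; accept until mismatch.
    ctx, accepted = list(prefix), []
    for t in proposed:
        t_star = target_next(ctx)  # in practice: one batched forward pass
        accepted.append(t_star)    # target's own token is always valid output
        if t_star != t:
            break                  # draft diverged; stop accepting
        ctx.append(t)
    return prefix + accepted

A long-context SSM draft is attractive here because the draft's per-token cost is exactly what this loop amortizes; the sampled (non-greedy) variant replaces the equality test with an acceptance-rejection step.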
@BeidiChen
Beidi Chen
1 month
RT @allen_ai: Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collabora…
0 replies · 74 reposts · 0 likes
@BeidiChen
Beidi Chen
1 month
RT @WentaoGuo7: 🦆🚀 QuACK 🦆🚀: new SOL mem-bound kernel library without a single line of CUDA C++, all straight in Python thanks to CuTe-DSL. On…
0 replies · 73 reposts · 0 likes
@BeidiChen
Beidi Chen
1 month
I was asked many times lately which repo to use by students working on test-time scaling with slightly modified attention or generation workflows (customized reward model / search). HF is a bit too time-consuming, esp. with tons of token generation, and SGLang/vLLM is a bit hard…
@InfiniAILab
Infini-AI-Lab
1 month
🧵 Glad to introduce LiteSys — the inference framework we used in 📄 Kinetics: Rethinking Test-Time Scaling Laws to evaluate test-time scaling (32K+ generated tokens) at scale. If you are: ✅ Looking for an inference framework that's easy to extend 🐢…
2 replies · 24 reposts · 224 likes
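The simplest instance of the workflow described above (a custom reward model plus search on top of a generation engine) is best-of-n sampling: draw n candidates, score them, keep the argmax. A framework-agnostic sketch, where generate and reward are assumed user-supplied callables, not part of LiteSys's actual API:

def best_of_n(generate, reward, prompt, n=16):
    """Test-time scaling baseline: sample n candidate generations, score
    each with a custom reward model, and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best = max(range(n), key=scores.__getitem__)
    return candidates[best], scores[best]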
@BeidiChen
Beidi Chen
1 month
RT @aviral_kumar2: Given the confusion around what RL does for reasoning in LLMs, @setlur_amrith & I wrote a new blog post on when RL simpl…
Link: pinnate-flare-8f3.notion.site — Amrith Setlur and Aviral Kumar, Carnegie Mellon University
0 replies · 39 reposts · 0 likes