Xinyu Yang Profile
Xinyu Yang

@Xinyu2ML

Followers
760
Following
412
Media
26
Statuses
239

Ph.D. @CarnegieMellon. Working on data and hardware-driven principled algorithm & system co-design for scalable and generalizable foundation models. They/Them

Pittsburgh, US
Joined December 2022
@Xinyu2ML
Xinyu Yang
20 days
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level…
@InfiniAILab
Infini-AI-Lab
20 days
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%. 🌐 Website: 🧵 1/n
3
18
58
@Xinyu2ML
Xinyu Yang
3 days
RT @ZeyuanAllenZhu: Facebook AI Research (FAIR) is a small, prestigious lab in Meta. We don't train large models like GenAI or MSL, so it's…
0
59
0
@Xinyu2ML
Xinyu Yang
4 days
RT @sansa19739319: 🤖 Can diffusion models write code competitively? Excited to share our latest 7B coding diffusion LLM!! 💻 With DiffuCoder…
0
107
0
@Xinyu2ML
Xinyu Yang
5 days
RT @fengyao1909: 😵‍💫 Struggling with 𝐟𝐢𝐧𝐞-𝐭𝐮𝐧𝐢𝐧𝐠 𝐌𝐨𝐄? Meet 𝐃𝐞𝐧𝐬𝐞𝐌𝐢𝐱𝐞𝐫 — an MoE post-training method that offers more 𝐩𝐫𝐞𝐜𝐢𝐬𝐞 𝐫𝐨𝐮𝐭𝐞𝐫 𝐠𝐫𝐚𝐝𝐢𝐞…
0
53
0
@Xinyu2ML
Xinyu Yang
5 days
Tweet media one
0
10
0
@Xinyu2ML
Xinyu Yang
5 days
RT @gan_chuang: 🧠 LLMs think too much—and waste tokens! Can we precisely control how long they reason? Introducing Budget Guidance — a th…
0
18
0
@Xinyu2ML
Xinyu Yang
5 days
RT @chelseabfinn: We still lack a scalable recipe for RL post-training seeded with demonstration data. Many methods add an imitation loss,….
0
36
0
@Xinyu2ML
Xinyu Yang
5 days
RT @yifeiwang77: 🔥 Thrilled to share that our sparse embedding method CSR (ICML’25 oral) is now officially supported in SentenceTransformers…
0
1
0
@Xinyu2ML
Xinyu Yang
6 days
RT @xiuyu_l: Sparsity can make your LoRA fine-tuning go brrr 💨 Announcing SparseLoRA (ICML 2025): up to 1.6-1.9x faster LLM fine-tuning (2…
0
57
0
@Xinyu2ML
Xinyu Yang
9 days
RT @xichen_pan: The code and instruction-tuning data for MetaQuery are now open-sourced! Code: Data:
0
22
0
@Xinyu2ML
Xinyu Yang
9 days
RT @_hanlin_zhang_: [1/n] New work [JSKZ25] w/ @JikaiJin2002, @syrgkanis, @ShamKakade6. We introduce new formulations and tools for evalu…
0
11
0
@Xinyu2ML
Xinyu Yang
10 days
RT @Huangyu58589918: What precision should we use to train large AI models effectively? Our latest research probes the subtle nature of tra….
0
17
0
@Xinyu2ML
Xinyu Yang
10 days
RT @NovaSkyAI: ✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easi….
0
43
0
@Xinyu2ML
Xinyu Yang
10 days
RT @RunjiaLi: 🎉 VMem is officially accepted to ICCV 2025! Excited to chat with everyone in Hawaii about making video generation consistent…
0
7
0
@Xinyu2ML
Xinyu Yang
11 days
Vote for Allen-Zhu :)
@arankomatsuzaki
Aran Komatsuzaki
17 days
I'd like to see Meta building a lean LLM team around Narang, Allen-Zhu, Mike Lewis, Zettlemoyer and Sukhbaatar and giving them all the budget and power.
0
0
12
@Xinyu2ML
Xinyu Yang
11 days
RT @arankomatsuzaki: I'd like to see Meta building a lean LLM team around Narang, Allen-Zhu, Mike Lewis, Zettlemoyer and Sukhbaatar and giv….
0
9
0
@Xinyu2ML
Xinyu Yang
11 days
RT @espn: Adam Silver and 7'1" Yang Hansen, who was taken 16th overall in the NBA draft 🌟
Tweet media one
0
733
0
@Xinyu2ML
Xinyu Yang
11 days
Please check Tilde's post for more information on sparse attention. (Also very happy to see some of our NSA kernel in FLA being deployed in their implementation 😀)
@tilderesearch
Tilde
11 days
Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work… until now. 🔍 We reverse-engineered them to uncover: novel attention patterns, hidden "attention sinks", better performance, and more. A 🧵… ~1/8~
0
0
9
@Xinyu2ML
Xinyu Yang
11 days
RT @tilderesearch: Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work….
0
80
0
@Xinyu2ML
Xinyu Yang
11 days
RT @JiaZhihao: 📢 Exciting updates from #MLSys2025! All session recordings are now available and free to watch at We…
0
30
0
@Xinyu2ML
Xinyu Yang
12 days
RT @SonglinYang4: Recordings:
0
12
0