Xinyu Yang
@Xinyu2ML
1K Followers · 1K Following · 32 Media · 412 Statuses
Ph.D. @CarnegieMellon. Working on agentic foundation model systems. Founder of the FM-Wild workshop series and the ASAP seminar series. They/Them
Pittsburgh, US
Joined December 2022
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level…
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
🚀 If your code agent generates a patch that passes all tests, should you trust it to merge automatically? ⚠️ You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal-looking issue description, whether from a benign user or not, can lead code agents…
🏗️ Hardware memory bandwidth is becoming the choke point slowing down GenAI. From 2018 to 2022, transformer model size grew ~410× every 2 years, while memory per accelerator grew only ~2× every 2 years. That mismatch pushes us straight into the “memory wall.” The “memory wall” is…
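To make the mismatch concrete, here is a quick back-of-the-envelope calculation using only the growth rates quoted above (the 4-year horizon matching 2018–2022 is the only added assumption):

```python
# Compound the per-2-year growth rates from the post over the 4-year
# window (2018-2022) and compare model-size growth vs. memory growth.
model_growth_per_2y = 410   # transformer parameter count, from the post
memory_growth_per_2y = 2    # memory per accelerator, from the post
periods = 4 / 2             # two 2-year periods in 2018-2022

model_growth = model_growth_per_2y ** periods    # ~168,100x
memory_growth = memory_growth_per_2y ** periods  # ~4x

print(f"Model size: ~{model_growth:,.0f}x; memory: ~{memory_growth:,.0f}x")
print(f"Resulting gap: ~{model_growth / memory_growth:,.0f}x")
```

Over just four years, the quoted rates imply model size outgrowing per-accelerator memory by roughly four orders of magnitude, which is exactly the gap the “memory wall” refers to.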
A rare chance for a tech person to learn art, haha. Honored to join this panel at #AMIF2025. 🎶🤖
Excited to announce our upcoming panel, “AI + Music: Empowering, Not Overpowering,” at the Asian Music Industry Festival (AMIF) in Boston, Nov 15–16. AI academia <-+-> music industry: how AI can elevate creativity without replacing human artistry. We’re honored to feature:…
We present MotionStream — real-time, long-duration video generation that you can interactively control just by dragging your mouse. All videos here are raw, real-time screen captures without any post-processing. Model runs on a single H100 at 29 FPS and 0.4s latency.
How powerful are Diffusion LLMs? Can they solve problems that Auto-Regressive (AR) LLMs can’t? Check out our new paper, “On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond” 🔗 https://t.co/aiGTbXMWFE In this work, we show that while Diffusion LLMs are indeed more…
Thrilled to release our new paper, “Scaling Latent Reasoning via Looped Language Models.” TL;DR: we scale looped language models to 2.6 billion parameters, pretrained on >7 trillion tokens. The resulting model is on par with SOTA language models 2 to 3× its size.
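For readers who haven’t seen the idea before: a looped language model gets its depth by applying one weight-tied block repeatedly instead of stacking distinct layers. A minimal PyTorch sketch of that weight-tying pattern (module names, sizes, and loop count are illustrative assumptions, not the paper’s actual architecture):

```python
import torch.nn as nn

class LoopedLM(nn.Module):
    """Sketch: one shared transformer block applied n_loops times.

    Effective depth comes from iteration, so parameter count stays fixed
    while compute depth grows with n_loops. Illustrative only, not the
    paper's config; a real LM would also need a causal attention mask.
    """
    def __init__(self, vocab_size=32000, d_model=2048, n_heads=16, n_loops=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single block whose weights are reused on every iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_loops = n_loops
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for _ in range(self.n_loops):   # same parameters every pass
            h = self.block(h)
        return self.lm_head(h)
```

The appeal is the parameter/compute trade-off: a 2.6B-parameter model that loops can spend the compute of a much deeper network without the memory footprint of one.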
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Until now, large-scale SFT for agents has been rare, not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
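Since the fragmentation the post describes is about formats, the core engineering step is normalizing every episode into one schema. A purely hypothetical sketch of what such a record might look like (field names are my own illustration, not the released dataset’s actual format):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One agent step, normalized across heterogeneous source formats."""
    role: str                      # "user", "assistant", or "tool"
    content: str                   # message text or tool output
    tool_name: str | None = None   # set when the step is a tool call/result

@dataclass
class Trajectory:
    """A full agent episode in one shared schema, whatever interface produced it."""
    task: str                      # natural-language task description
    steps: list[Step] = field(default_factory=list)
    source: str = "unknown"        # original dataset/tooling the episode came from

# Normalizing an episode from some hypothetical source:
traj = Trajectory(
    task="Fix the failing unit test in utils.py",
    steps=[
        Step(role="assistant", content="Let me look at the test output first."),
        Step(role="tool", content="AssertionError: expected 3, got 4",
             tool_name="run_tests"),
    ],
    source="example-swe-logs",
)
```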
Many people are confused by MiniMax’s recent return to full attention - especially since it was the first large-scale pivot toward hybrid linear attention - and by Kimi’s later adoption of hybrid linear variants (as well as earlier attempts like Qwen3-Next or Qwen3.5). I actually…
it’s an improved version of Gated DeltaNet. enjoy ^^
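For context on what’s being improved: Gated DeltaNet combines a scalar forget gate with the delta-rule fast-weight update. Below is a naive per-timestep sketch of my reading of the original Gated DeltaNet recurrence (real implementations use chunked parallel kernels, and nothing here reflects what the improved version changes):

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive Gated DeltaNet recurrence (sketch, not the fast kernel):
        S_t = S_{t-1} * alpha_t * (I - beta_t k_t k_t^T) + beta_t v_t k_t^T

    q, k: (T, d_k) queries/keys (keys assumed unit-normalized); v: (T, d_v).
    alpha: (T,) forget gate in (0, 1); beta: (T,) write strength in (0, 1).
    S is a (d_v, d_k) fast-weight matrix mapping keys to values.
    """
    T, d_k = k.shape
    S = torch.zeros(v.shape[1], d_k)
    outputs = []
    for t in range(T):
        k_t, v_t = k[t], v[t]
        # Decay old memory with alpha, erase the value stored under k_t,
        # then write the new key-value association (the delta rule).
        S = alpha[t] * S \
            - beta[t] * torch.outer((alpha[t] * S) @ k_t, k_t) \
            + beta[t] * torch.outer(v_t, k_t)
        outputs.append(S @ q[t])   # read out with the query
    return torch.stack(outputs)
```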
🚀 Happy to present our new work on LLM reasoning! We show that: (1) Attention is a structured map of the model’s reasoning logic, uncovering a preplan-and-anchor reasoning rhythm. (2) Aligning RL objectives with the model’s intrinsic attention rhythm yields more transparent…
We at NVIDIA present “Length Penalty Done Right”: cut CoT length by 3/4 without sacrificing accuracy, using only RL. This makes DeepSeek-R1-7B run ~8× faster on AIME-24 while maintaining the same accuracy.
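The post doesn’t spell out the objective, so here is only a generic sketch of how a length penalty is typically folded into an RL reward for reasoning models (the functional form and coefficients are my own illustration, not the paper’s actual scheme):

```python
def length_penalized_reward(is_correct: bool, n_tokens: int,
                            max_tokens: int = 8192,
                            penalty_coef: float = 0.5) -> float:
    """Generic length-penalized reward (illustrative, not the paper's method).

    Correct answers earn 1.0 minus a penalty growing with CoT length, so the
    policy is pushed toward short chains that stay correct. Incorrect answers
    score 0 regardless of length, so truncating into a wrong answer never pays.
    """
    if not is_correct:
        return 0.0
    return 1.0 - penalty_coef * min(n_tokens / max_tokens, 1.0)

# A correct 2,048-token solution beats a correct 8,192-token one
# (0.875 vs 0.5), while wrong answers score 0 either way.
assert length_penalized_reward(True, 2048) == 0.875
assert length_penalized_reward(True, 8192) == 0.5
assert length_penalized_reward(False, 512) == 0.0
```

The design point any such scheme must respect, and the one the post’s title hints at, is that a short wrong answer should never outscore a longer correct one.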
today i gave a talk at hkust gz, and one friend asked me how we could make the bet on scaling linear attention. my answer is more about the culture that i have been trying to build. admittedly it is too hard to change a mechanism which always rewards visible contribution and…
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generation! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working toward this vision around last year’s @GPU_mode hackathon, we have…
@daemonzhang6 That’s the problem. The people responsible for the issues are not the people who got laid off 😅 In January, our team put down all the research we were doing and was (forced?) to move to GenAI <2 months before the Llama 4 release deadline to help with all the…
⚠️ Humans and AIs may write the same code that passes the same unit test, but “safety” isn’t symmetric. 🧑🦱 For humans, “Correct” ≈ “Safe”: with accountability, they should avoid writing warnable-but-passing code. 🤖 For agents, “Correct” ≠ “Safe”: without responsibility…
📣 we study a threat model where users intend to leverage an llm agent to fix problems in the code base, but the agent could just insert vulnerabilities while passing all the tests — I think security will become a more and more important problem as agents’ abilities grow. So much fun…
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
You can just train things