Xinyu Yang
@Xinyu2ML
1K Followers · 1K Following · 32 Media · 412 Statuses
Ph.D. @CarnegieMellon. Working on agentic foundation model systems. Founder of the FM-Wild workshop series and the ASAP seminar series. They/Them
Pittsburgh, US
Joined December 2022
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level…
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
🚀 If your code agent generates a patch that passes all tests, should you trust it to merge automatically? ⚠️ You probably shouldn’t! “Correct” ≠ “Safe.” In our study we show that a single normal-looking issue description, whether from a benign user or not, can lead code agents…
🏗️ Hardware memory bandwidth is becoming the choke point slowing down GenAI. From 2018 to 2022, transformer model size grew ~410× every 2 years, while memory per accelerator grew only ~2× every 2 years. That mismatch pushes us straight into the “memory wall.” The “memory wall” is…
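To make the mismatch concrete, here is a quick back-of-the-envelope calculation using only the growth rates quoted above (the 4-year horizon matching 2018–2022 is the only added assumption):

```python
# Compound the per-2-year growth rates from the post over the 4-year
# window (2018-2022) and compare model-size growth vs. memory growth.
model_growth_per_2y = 410   # transformer parameter count, from the post
memory_growth_per_2y = 2    # memory per accelerator, from the post
periods = 4 / 2             # two 2-year periods in 2018-2022

model_growth = model_growth_per_2y ** periods    # ~168,100x
memory_growth = memory_growth_per_2y ** periods  # ~4x

print(f"Model size: ~{model_growth:,.0f}x; memory: ~{memory_growth:,.0f}x")
print(f"Resulting gap: ~{model_growth / memory_growth:,.0f}x")
```

Over just four years, the quoted rates imply model size outgrowing per-accelerator memory by roughly four orders of magnitude, which is exactly the gap the “memory wall” refers to.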
A rare chance for a tech person to learn art, haha. Honored to join this panel at #AMIF2025. 🎶🤖
Excited to announce our upcoming panel, “AI + Music: Empowering, Not Overpowering,” at the Asian Music Industry Festival (AMIF) in Boston, Nov 15–16. AI academia <-+-> music industry: how AI can elevate creativity without replacing human artistry. We’re honored to feature:…
We present MotionStream — real-time, long-duration video generation that you can interactively control just by dragging your mouse. All videos here are raw, real-time screen captures without any post-processing. Model runs on a single H100 at 29 FPS and 0.4s latency.
How powerful are Diffusion LLMs? Can they solve problems that Auto-Regressive (AR) LLMs can’t? Check out our new paper, “On Powerful Ways to Generate: Autoregression, Diffusion, and Beyond” 🔗 https://t.co/aiGTbXMWFE In this work, we show that while Diffusion LLMs are indeed more…
Thrilled to release our new paper, “Scaling Latent Reasoning via Looped Language Models.” TL;DR: we scale looped language models to 2.6 billion parameters, pretrained on >7 trillion tokens. The resulting model is on par with SOTA language models 2 to 3× its size.
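For readers who haven’t seen the idea before: a looped language model gets its depth by applying one weight-tied block repeatedly instead of stacking distinct layers. A minimal PyTorch sketch of that weight-tying pattern (module names, sizes, and loop count are illustrative assumptions, not the paper’s actual architecture):

```python
import torch.nn as nn

class LoopedLM(nn.Module):
    """Sketch: one shared transformer block applied n_loops times.

    Effective depth comes from iteration, so parameter count stays fixed
    while compute depth grows with n_loops. Illustrative only, not the
    paper's config; a real LM would also need a causal attention mask.
    """
    def __init__(self, vocab_size=32000, d_model=2048, n_heads=16, n_loops=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single block whose weights are reused on every iteration.
        self.block = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.n_loops = n_loops
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for _ in range(self.n_loops):   # same parameters every pass
            h = self.block(h)
        return self.lm_head(h)
```

The appeal is the parameter/compute trade-off: a 2.6B-parameter model that loops can spend the compute of a much deeper network without the memory footprint of one.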
We just built and released the largest dataset for supervised fine-tuning of agentic LMs: 1.27M trajectories (~36B tokens)! Until now, large-scale SFT for agents has been rare, not for lack of data, but because of fragmentation across heterogeneous formats, tools, and interfaces.
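Since the fragmentation the post describes is about formats, the core engineering step is normalizing every episode into one schema. A purely hypothetical sketch of what such a record might look like (field names are my own illustration, not the released dataset’s actual format):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One agent step, normalized across heterogeneous source formats."""
    role: str                      # "user", "assistant", or "tool"
    content: str                   # message text or tool output
    tool_name: str | None = None   # set when the step is a tool call/result

@dataclass
class Trajectory:
    """A full agent episode in one shared schema, whatever interface produced it."""
    task: str                      # natural-language task description
    steps: list[Step] = field(default_factory=list)
    source: str = "unknown"        # original dataset/tooling the episode came from

# Normalizing an episode from some hypothetical source:
traj = Trajectory(
    task="Fix the failing unit test in utils.py",
    steps=[
        Step(role="assistant", content="Let me look at the test output first."),
        Step(role="tool", content="AssertionError: expected 3, got 4",
             tool_name="run_tests"),
    ],
    source="example-swe-logs",
)
```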
Many people are confused by MiniMax’s recent return to full attention - especially since it was the first large-scale pivot toward hybrid linear attention - and by Kimi’s later adoption of hybrid linear variants (as well as earlier attempts like Qwen3-Next or Qwen3.5). I actually…
it’s an improved version of Gated DeltaNet. enjoy ^^
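For context on what’s being improved: Gated DeltaNet combines a scalar forget gate with the delta-rule fast-weight update. Below is a naive per-timestep sketch of my reading of the original Gated DeltaNet recurrence (real implementations use chunked parallel kernels, and nothing here reflects what the improved version changes):

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive Gated DeltaNet recurrence (sketch, not the fast kernel):
        S_t = S_{t-1} * alpha_t * (I - beta_t k_t k_t^T) + beta_t v_t k_t^T

    q, k: (T, d_k) queries/keys (keys assumed unit-normalized); v: (T, d_v).
    alpha: (T,) forget gate in (0, 1); beta: (T,) write strength in (0, 1).
    S is a (d_v, d_k) fast-weight matrix mapping keys to values.
    """
    T, d_k = k.shape
    S = torch.zeros(v.shape[1], d_k)
    outputs = []
    for t in range(T):
        k_t, v_t = k[t], v[t]
        # Decay old memory with alpha, erase the value stored under k_t,
        # then write the new key-value association (the delta rule).
        S = alpha[t] * S \
            - beta[t] * torch.outer((alpha[t] * S) @ k_t, k_t) \
            + beta[t] * torch.outer(v_t, k_t)
        outputs.append(S @ q[t])   # read out with the query
    return torch.stack(outputs)
```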
🚀 Happy to present our new work on LLM reasoning! We show that: (1) Attention is a structured map of the model’s reasoning logic, uncovering a preplan-and-anchor reasoning rhythm. (2) Aligning RL objectives with the model’s intrinsic attention rhythm yields more transparent…
We at NVIDIA present “Length Penalty Done Right”: cut CoT length by 3/4 without sacrificing accuracy, using only RL. This makes DeepSeek-R1-7B run ~8× faster on AIME-24 while maintaining the same accuracy.
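The post doesn’t spell out the objective, so here is only a generic sketch of how a length penalty is typically folded into an RL reward for reasoning models (the functional form and coefficients are my own illustration, not the paper’s actual scheme):

```python
def length_penalized_reward(is_correct: bool, n_tokens: int,
                            max_tokens: int = 8192,
                            penalty_coef: float = 0.5) -> float:
    """Generic length-penalized reward (illustrative, not the paper's method).

    Correct answers earn 1.0 minus a penalty growing with CoT length, so the
    policy is pushed toward short chains that stay correct. Incorrect answers
    score 0 regardless of length, so truncating into a wrong answer never pays.
    """
    if not is_correct:
        return 0.0
    return 1.0 - penalty_coef * min(n_tokens / max_tokens, 1.0)

# A correct 2,048-token solution beats a correct 8,192-token one
# (0.875 vs 0.5), while wrong answers score 0 either way.
assert length_penalized_reward(True, 2048) == 0.875
assert length_penalized_reward(True, 8192) == 0.5
assert length_penalized_reward(False, 512) == 0.0
```

The design point any such scheme must respect, and the one the post’s title hints at, is that a short wrong answer should never outscore a longer correct one.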
today i gave a talk at hkust gz, and one friend asked me how we could make the bet on scaling linear attention. my answer is more about the culture that i have been trying to build. admittedly it is too hard to change a mechanism which always rewards visible contribution and…
Wrote a 1-year retrospective with @a1zhang on KernelBench and the journey toward automated GPU/CUDA kernel generation! Since my labmates (@anneouyang, @simran_s_arora, @_williamhu) and I first started working toward this vision around last year’s @GPU_mode hackathon, we have…
@daemonzhang6 That’s the problem. The people responsible for the issues are not the people who got laid off 😅 In January, our team put down all the research we were doing and was (forced?) to move to GenAI <2 months before the Llama 4 release deadline to help with all the…
⚠️ Humans and AIs may write the same code that passes the same unit test, but “safety” isn’t symmetric. 🧑🦱 For humans, “Correct” ≈ “Safe”: with accountability, they should avoid writing warnable-but-passing code. 🤖 For agents, “Correct” ≠ “Safe”: without responsibility…
📣 we study a threat model where users intend to leverage an llm agent to fix problems in the code base, but the agent could just insert vulnerabilities while passing all the tests — I think security will become a more and more important problem as agents’ abilities grow. So much fun…
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
You can just train things