Xiuyu Li Profile
Xiuyu Li

@xiuyu_l

Followers
2K
Following
2K
Media
47
Statuses
287

Efficiently scaling agents. CS PhD student @berkeley_ai. Prev @NVIDIA @AIatMeta @Cornell.

Bay Area
Joined August 2017
@xiuyu_l
Xiuyu Li
7 months
Scale smarter, not harder! Long CoT reasoning is powerful, but its sequential nature limits how efficiently and easily it can scale. We incentivize LMs to divide and conquer subtasks in parallel, selectively gathering only the highest-quality explorations.
@jiayi_pirate
Jiayi Pan
7 months
We explore a new dimension in scaling reasoning models in Adaptive Parallel Reasoning. APR lets LMs learn to orchestrate both serial & parallel compute E2E via supervised training + RL — w/ better efficiency and scalability than long CoT on Countdown 🧵  https://t.co/BKLhZ4fHEt
3
22
90
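The divide-and-conquer idea in the thread above can be sketched as a toy: fan subtasks out in parallel, then keep only the best explorations. This is a minimal illustration, not APR's trained policy — `explore`, its placeholder quality score, and the subtask strings are all invented for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def explore(subtask):
    # Stand-in for an LM exploring one subtask branch; returns
    # (answer, quality_score). The scoring heuristic is hypothetical.
    answer = f"solution-for-{subtask}"
    score = len(subtask)  # placeholder quality signal
    return answer, score

def parallel_reason(subtasks, top_k=1):
    # Fan subtasks out in parallel, then selectively gather only
    # the highest-quality explorations (the divide-and-conquer idea).
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(explore, subtasks))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]

best = parallel_reason(["factor 36", "sum the digits of 2025"], top_k=1)
```

A real system would score branches with a learned verifier rather than a length heuristic; the structure (parallel fan-out, selective gather) is the point here.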
@jyangballin
John Yang
2 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
24
86
340
@samir_khaki
Samir Khaki
15 days
Visual reasoning isn’t just seeing — it’s about efficient retrieval across thousands of tokens and multiple turns of conversation. Meet SparseVILA (ICCV 2025) ⚡ Highlights: 🧩 Framework: Decoupled sparsity — query-agnostic prefill, query-aware decoding. 🧠 Speed: 4.0x faster
1
2
4
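The decoupled sparsity framing above — query-agnostic prefill vs. query-aware decoding — can be sketched in a few lines. This is a toy with invented saliency scores (activation norm at prefill, dot-product similarity at decode), not SparseVILA's actual selection mechanism.

```python
import numpy as np

def prefill_prune(tokens, keep):
    # Query-agnostic prefill: keep the visual tokens with the largest
    # activation norms (a stand-in saliency score), before any query exists.
    norms = np.linalg.norm(tokens, axis=1)
    idx = np.argsort(norms)[-keep:]
    return tokens[np.sort(idx)]

def decode_select(tokens, query, keep):
    # Query-aware decoding: retrieve the cached tokens most similar
    # to the current query vector, per turn of conversation.
    sims = tokens @ query
    idx = np.argsort(sims)[-keep:]
    return tokens[np.sort(idx)]

rng = np.random.default_rng(0)
vis = rng.normal(size=(1000, 64))
kept = prefill_prune(vis, keep=256)                  # cache only 256 tokens
ctx = decode_select(kept, rng.normal(size=64), keep=32)
```

The design point: pruning once at prefill shrinks the cache for every later turn, while per-query retrieval keeps decoding relevant to each new question.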
@ZitongYang0
Zitong Yang
20 days
The passing of the physicist Chen-Ning Yang ( https://t.co/LOY46RpBhz) saddens me. He has been a long-time hero and role model for me. Below is a short essay I wrote yesterday about Yang that I shared with many of my friends. I translated it into English using Gemini: ``` The
10
65
417
@yukangchen_
Yukang Chen
24 days
We open-sourced QeRL — Quantization-enhanced Reinforcement Learning! 🧠 4-bit quantized RL training 💪 Train a 32B LLM on a single H100 GPU ⚙️ 1.7× faster overall training 🎯 Accuracy on par with bfloat16 training 🔥 Supports NVFP4 quantization format Moreover, we show
11
68
352
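For readers unfamiliar with the 4-bit part: the basic idea of 4-bit weight quantization is mapping floats to 16 integer levels plus a scale. A minimal symmetric per-tensor sketch (this illustrates generic int4 quantization, not QeRL's NVFP4 format or its RL integration):

```python
import numpy as np

def quantize_4bit(w):
    # Symmetric per-tensor 4-bit quantization: map weights to integers
    # in [-8, 7] and store one scale factor for dequantization.
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from int4 codes.
    return q.astype(np.float32) * scale

w = np.array([0.7, -0.35, 0.07, 0.0])
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)  # low-error reconstruction of w
```

Storing weights at 4 bits instead of 16 is what lets a 32B model fit RL training on a single GPU; per-channel scales and NVFP4's block format refine this basic scheme.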
@xiuyu_l
Xiuyu Li
29 days
Thrilled to see our APR paper featured in the State of AI Report 2025! The future of agentic AI lies in parallel reasoning—scaling beyond single-threaded thought to handle truly long-horizon challenges.
@nathanbenaich
Nathan Benaich
29 days
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
3
6
42
@Chenfeng_X
Chenfeng_X
1 month
🥳We’re releasing StreamDiffusionV2 for the live-stream community—from individual creators with one GPU to enterprise platforms with many. StreamDiffusionV2 is our follow-up to StreamDiffusion: #StreamDiffusion powered real products, but temporal consistency still bugged us.
12
45
223
@Agentica_
Agentica Project
1 month
Introducing Pepper🌶️! An open-source, real-time, event-driven architecture to power the next generation of proactive agents. Tired of static reactive chatbots? Pepper enables agents that anticipate your needs, actively engage, and work continuously in the background (think
11
61
483
@Xinyu2ML
Xinyu Yang
1 month
These days, LoRA seems less prominent in mainstream discussions compared to full FT. However, the post from @thinkymachines highlights that LoRA can actually match full FT in real-world customization scenarios! One year ago, one of my previous works ( https://t.co/yWjsYdG3xZ)
@thinkymachines
Thinking Machines
1 month
LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.
5
25
169
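The LoRA setup being compared to full fine-tuning above is small enough to sketch directly: the pretrained weight stays frozen and only a low-rank delta is trained. A minimal numpy version (dimensions and init scale chosen for illustration):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    # LoRA: y = x W + (alpha / r) * x A B, where A (d x r) and
    # B (r x k) are the only trainable matrices and r << d, k.
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B

d, k, r = 64, 64, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))        # frozen pretrained weight
A = rng.normal(size=(d, r)) * 0.01 # trainable down-projection
B = np.zeros((r, k))               # trainable up-projection, init 0
x = rng.normal(size=(1, d))
y0 = lora_forward(x, W, A, B)      # == x @ W at init, since B = 0
```

Initializing B to zero makes the adapted model exactly match the base model before training, which is part of why LoRA is a safe drop-in for customization.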
@yukangchen_
Yukang Chen
1 month
🚀 We open-sourced LongLive — interactive, real-time long-video generation. 👥Generates video in real time as users enter text prompts. ⚡️20.7 FPS on a single H100,⏱️up to 240s per clip. 🎬Fine-tunes SOTA short-video models (e.g., Wan) into long-video generators. 🌍One step
4
18
79
@Chenfeng_X
Chenfeng_X
1 month
Happy to share that two of our papers were accepted by @NeurIPSConf 2025 as #Spotlight papers! 1. 👼Angles Don’t Lie: Unlocking Training-Efficient RL from a Model’s Own Signals TL;DR: Token angles—the model’s self-generated signals—can reveal how well it grasps the data. By
18
33
308
@HaochengXiUCB
Haocheng Xi
1 month
🚀 Introducing Sparse VideoGen2 (SVG2) — Pareto-frontier video generation acceleration with semantic-aware sparse attention! 🏆Spotlight paper accepted by #NeurIPS2025 ✅ Training-free & plug-and-play ✅ Up to 2.5× faster on HunyuanVideo, 1.9× faster on Wan 2.1 ✅ SOTA quality
16
59
261
@Guangxuan_Xiao
Guangxuan Xiao
3 months
Just wrote a post on my understanding of the statistics behind block sparse attention. My take is that it works by using the "learned similarity gap," which creates a simple SNR formula connecting retrieval quality with model architecture. Read more:
guangxuanx.com
How can a language model comprehend a million-token document without drowning in O(N²) attention cost? A statistical model revealing the success of block sparse attention through learned similarity...
5
48
387
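The mechanism the post analyzes — scoring whole key blocks and attending only within the top ones — can be sketched as follows. This is a toy with a mean-dot-product block score, not the post's statistical model; the "learned similarity gap" is what makes the relevant block's score stand out from the rest.

```python
import numpy as np

def block_sparse_attention(q, K, block=64, keep=2):
    # Score each key block by its mean dot product with the query,
    # then restrict softmax attention to the top-`keep` blocks.
    n = K.shape[0]
    blocks = K.reshape(n // block, block, -1)
    block_score = (blocks @ q).mean(axis=1)   # one score per block
    top = np.argsort(block_score)[-keep:]
    mask = np.full(n, -np.inf)                # -inf: block not selected
    for b in top:
        mask[b * block:(b + 1) * block] = 0.0
    logits = K @ q + mask
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()                        # attention over kept blocks only

rng = np.random.default_rng(0)
K = rng.normal(size=(256, 16))
p = block_sparse_attention(rng.normal(size=16), K)  # mass on 2 of 4 blocks
```

Cost drops from attending over all N keys to scoring N/block blocks plus attending within the kept ones, which is the point of the O(N²) escape the card describes.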
@xiuyu_l
Xiuyu Li
3 months
By using a Mamba projector for spatio-temporal fusion and pooling in VLM training, we achieve 8× token compression for long video understanding with SoTA performance. Last year, I spent quite some time on context compression, and one key lesson was clear: when compression is
@wonmin_byeon
Wonmin Byeon
3 months
🚀 New paper: STORM — Efficient VLM for Long Video Understanding STORM cuts compute costs by up to 8× and reduces decoding latency by 2.4–2.9×, while achieving state-of-the-art performance. Details + paper link in the thread ↓
0
0
9
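The 8× token compression above can be illustrated with the simplest possible stand-in: temporal average pooling of visual tokens. STORM does the spatio-temporal fusion with a Mamba projector before pooling; plain mean pooling below is only a toy to show where the 8× comes from.

```python
import numpy as np

def pool_video_tokens(tokens, factor=8):
    # Average-pool along the temporal token axis to compress the
    # sequence fed to the LLM by `factor`.
    t, d = tokens.shape
    t = (t // factor) * factor          # drop any ragged tail
    return tokens[:t].reshape(t // factor, factor, d).mean(axis=1)

frames = np.random.default_rng(0).normal(size=(1024, 256))
compressed = pool_video_tokens(frames, factor=8)   # 1024 -> 128 tokens
```

Fewer tokens into the LLM is what drives both the compute saving and the decoding-latency reduction quoted in the thread.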
@xiuyu_l
Xiuyu Li
3 months
And GPT-5 is a good model
1
0
3
@xiuyu_l
Xiuyu Li
3 months
I’ve seen people worry that LLMs have hit a wall after GPT-5’s release. I think that’s the wrong mindset. You can’t believe in AGI only when OpenAI delivers a miracle. The journey is longer than the hype cycles.
1
0
14
@Guangxuan_Xiao
Guangxuan Xiao
3 months
I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models. For those interested in the details: https://t.co/0EAi2KQMMx
39
284
2K
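The attention-sink mechanism the deep-dive describes leads to a very small cache policy: always keep the first few tokens (the sinks) plus a sliding window of recent tokens. A minimal sketch of that StreamingLLM-style policy (sizes are illustrative):

```python
def sink_cache_indices(seq_len, n_sink=4, window=8):
    # Keep the first `n_sink` "attention sink" tokens plus a sliding
    # window of the most recent tokens; evict everything in between.
    if seq_len <= n_sink + window:
        return list(range(seq_len))
    return list(range(n_sink)) + list(range(seq_len - window, seq_len))

kept = sink_cache_indices(100, n_sink=4, window=8)
# cache holds tokens [0..3] + [92..99], independent of sequence length
```

Because softmax attention dumps excess probability mass onto those early tokens, evicting them degrades generation — keeping them pinned is what makes the fixed-size cache work.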
@xiuyu_l
Xiuyu Li
3 months
Unrelated but curious note when watching today’s GPT-5 livestream: the generated subtitles always lagged 3–5 seconds behind the audio. The year is 2025, and we still don't have universal real-time ASR deployed in the cloud. Worth pondering.
0
0
4
@hanrui_w
Ryan Hanrui Wang
3 months
Announcing Eigen AI @Eigen_AI_Labs, the world’s first company dedicated to AEI — Artificial Efficient Intelligence. 🚀 The future of AI is already here; it’s simply not evenly distributed. Our mission is to close that gap by driving radical efficiency so that every person and
@Eigen_AI_Labs
Eigen AI
3 months
🚀Founded by four dedicated MIT graduates, Eigen AI is the world's first company focusing on AEI – Artificial Efficient Intelligence, making AI accessible for all. Today OpenAI dropped GPT-OSS. We teamed up with our partners SGLang @lmsysorg and @NVIDIA to deliver open-source
1
13
59
@baifeng_shi
Baifeng
3 months
We just dropped a few new PS3 models, with SOTA performance compared to existing vision encoders such as SigLIP2, C-RADIOv2, AIMv2, InternViT2.5, and Perception Encoder! Coming along with several new VILA-HD models. Check it out👇 Models: https://t.co/UwjpBWpFBj Code:
4
16
85
@Chenfeng_X
Chenfeng_X
3 months
📢 Excited to share a slightly late update (before it’s no longer news): I’ll be joining @UTAustin @UTCompSci as an Assistant Professor! I'm recruiting PhD students at @UTCompSci in the Fall 2025 cycle and am also looking for RAs/interns! More info: https://t.co/JPDhVplhJX
31
31
406