Jiawei Zhao
@jiawzhao
3K Followers · 219 Following · 8 Media · 99 Statuses
Research Scientist at Meta FAIR @AIatMeta, PhD @Caltech, GaLore, DeepConf
Joined February 2013
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
63
330
2K
🚨 New Work! 🤔 Is RL black-box weight tinkering? 😉 No. We provably show RLVR follows a 🧭 — always updating the same off-principal regions while preserving the model's core spectra. ⚠️ Different optimization regime than SFT — SFT-era PEFT tricks can misfire (like PiSSA, the
8
41
257
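The post above is truncated, but its central claim (RLVR concentrates weight updates outside the principal singular directions while leaving the base spectrum largely intact) can be probed with a quick diagnostic. A minimal sketch assuming PyTorch; the function `principal_overlap`, the subspace size `k`, and the spectrum-shift metric are illustrative choices, not the paper's actual analysis.

```python
import torch

def principal_overlap(w_base: torch.Tensor, w_tuned: torch.Tensor, k: int = 32):
    """Rough diagnostic: how much of the update dW = w_tuned - w_base lies inside the
    top-k principal (singular) subspace of the base weights, and how far the top-k
    singular values moved. Illustrative only, not the paper's metric."""
    dw = w_tuned - w_base
    U, S, Vh = torch.linalg.svd(w_base, full_matrices=False)
    Uk, Vk = U[:, :k], Vh[:k, :].T                       # top-k left/right singular vectors
    dw_principal = Uk @ (Uk.T @ dw @ Vk) @ Vk.T          # projection onto the principal subspace
    in_principal = dw_principal.norm() ** 2 / dw.norm() ** 2
    S_tuned = torch.linalg.svdvals(w_tuned)
    spectrum_shift = (S_tuned[:k] - S[:k]).abs().mean() / S[:k].mean()
    return in_principal.item(), spectrum_shift.item()

# Example: a small stand-in "update" that barely touches the top singular directions.
w = torch.randn(512, 512)
w_rl = w + 1e-3 * torch.randn(512, 512)
print(principal_overlap(w, w_rl, k=32))
```

A small perturbation like the stand-in above leaves most of its energy outside a narrow top-k subspace and barely moves the spectrum, which is the kind of "off-principal, spectra-preserving" behavior the post describes.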
We’ve always assumed stale and off-policy data hurts RL a lot — but our latest work shows the opposite. 🧠 M2PO (Second-Moment Trust Policy Optimization) reveals that even data stale by 256 model updates can train LLMs as effectively as on-policy RL, unlocking scalable and
m2po.notion.site
Haizhong Zheng, Jiawei Zhao, Beidi Chen
🤔 Can we train LLMs with RL on extremely stale data? 🚀 Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
3
23
132
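For readers unfamiliar with the off-policy setting, the quantities at stake are the importance ratios between the current policy and the stale behavior policy, and how wildly they vary as data ages. The sketch below is a generic clipped importance-weighted surrogate with a second-moment monitor on the log-ratios; it is not M2PO's actual objective, and `m2_limit` is a hypothetical knob used only for illustration.

```python
import torch

def off_policy_surrogate(logp_new, logp_old, advantages, clip=0.2, m2_limit=0.04):
    """Generic clipped importance-weighted policy-gradient surrogate on stale rollouts.
    NOT M2PO itself; it only shows the quantities an off-policy method must control.
    m2_limit is a hypothetical cap on the second moment of the log importance ratios."""
    log_ratio = logp_new - logp_old              # per-token log ratio vs. the stale behavior policy
    ratio = log_ratio.exp()
    m2 = (log_ratio ** 2).mean()                 # second moment: grows as the batch gets staler
    surrogate = torch.minimum(ratio * advantages,
                              ratio.clamp(1 - clip, 1 + clip) * advantages).mean()
    trusted = m2 <= m2_limit                     # crude check on how off-policy the batch is
    return -surrogate, m2.item(), trusted

# Toy usage on a mildly stale batch of token log-probabilities.
logp_old = torch.randn(1024) * 0.1 - 2.0
logp_new = logp_old + torch.randn(1024) * 0.05
adv = torch.randn(1024)
loss, m2, trusted = off_policy_surrogate(logp_new, logp_old, adv)
print(loss.item(), m2, trusted)
```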
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊: the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow
7
81
410
Huge congrats to my friend @jxbz! He was my first mentor, guiding me into the field of optimization back in undergrad, and I’ve seen him stay true to his vision of uncovering the laws of neural networks through all the challenges and breakthroughs. Rethinking how we approach
I wrote this blog post that tries to go further toward design principles for neural nets and optimizers. The post presents a visual intro to optimization on normed manifolds and a Muon variant for the manifold of matrices with unit condition number https://t.co/EhhKN2Jylx
0
0
9
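The quoted blog post concerns Muon-style updates and matrices with unit condition number. As a rough illustration of that idea (not the blog's manifold construction), one can replace a gradient matrix by its nearest semi-orthogonal matrix so that every step has all singular values equal to one. The SVD route below is a toy sketch; Muon in practice orthogonalizes a momentum buffer with a Newton-Schulz iteration rather than an exact SVD.

```python
import torch

def orthogonalized_update(weight: torch.Tensor, grad: torch.Tensor, lr: float = 0.02):
    """Muon-flavored step: replace the gradient by its closest semi-orthogonal matrix
    U @ Vh, so the update itself has unit condition number. Toy sketch via exact SVD."""
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    return weight - lr * (U @ Vh)

# Example on a random weight/gradient pair.
w = torch.randn(256, 128)
g = torch.randn(256, 128)
w_next = orthogonalized_update(w, g)
print(w_next.shape)
```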
I’ll be giving a guest lecture at Princeton tomorrow (Thursday, 9/25), sharing our recent works on LLM Reasoning and Efficiency. Time and location below: 2:55–4:15pm at CS Building 402 Thanks to @liuzhuang1234 for organizing this!
2
1
28
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training,
Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the
6
16
119
We need more reviewers for the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf. If you are interested, please fill out the nomination form.
docs.google.com
We strive to expand our reviewing pool by welcoming newer members of the community. We encourage nominations from senior community members as well as self-nominations from individuals who have either...
🌟 Announcing the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf 2025 — Dec 6 or 7, San Diego! 📣 We welcome submissions! Submit your work here: https://t.co/13TumRabVh 🗓️ Deadline: September 1, 2025 (AoE) 🔗 Website: https://t.co/tcTfZ6r6lS 💬 Topics
0
5
15
Want to try DeepConf NOW? While our full repo is coming, we just dropped a ready-to-run example in our vLLM (@vllm_project) PR: https://t.co/TtlxtKlGfK DeepConf + DeepSeek-R1-8B + BRUMO25 = • 93.3% accuracy (+2.5% boost) • 52.9% fewer tokens generated • 31% faster
github.com
Purpose Implement group confidence based early stopping in vLLM decoding to reduce tokens and latency while preserving or improving answer quality, following the method described in Deep Think with...
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
2
17
125
Thanks @vllm_project folks for pushing this. Please give it a try and let us know!
Wow, glad to see vLLM powers @jiawzhao's DeepConf work, impressive results on AIME 2025! Do you think this sampling control makes sense? Give it a try and leave a comment in that PR https://t.co/eyvv24RTym to let us know!
1
7
133
⏰ Submission deadline coming up fast! (Sep 1) Working on efficient reasoning? Don’t miss the chance to share it at NeurIPS 2025!
🌟 Reminder: Submission Deadline Approaching! 🌟 The 1st Workshop on Efficient Reasoning (ER) @ NeurIPS 2025 — happening Dec 6 or 7 in San Diego — is fast approaching, and we’d love to see your work there! 📌 Submission Deadline: September 1, 2025 (AoE) 🔗 Submit here:
0
1
6
We released DeepConf, which achieves 99.9% on AIME'25 with open-source models using only 15% of the compute compared to majority voting@512. The secret? Simple: just prune rollouts when they show a consecutive stretch of low confidence 😀. Can be applied to any model
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
11
49
367
Thank you for sharing it! @_akhaliq
https://t.co/D3PQmpfGDI
0
0
5
🛠️ Super easy to implement! Just ~50 lines of code in vLLM. Plug & play - works with ANY existing model. Zero training or hyperparameter tuning needed. Code example: https://t.co/cqn5Anf88P Live vLLM PR: https://t.co/eW9qCTGSm2 💻 Open-source code coming soon!
github.com
Purpose Implement group confidence based early stopping in vLLM decoding to reduce tokens and latency while preserving or improving answer quality, following the method described in Deep Think with...
2
6
167
📈 Online results: 33-85% token savings across all benchmarks! AIME 2025: 97.9% accuracy with 85% fewer tokens (GPT-OSS-120B). Works across 8B to 120B models - real-time efficiency without sacrificing quality. Chart shows AIME 25 results.
2
3
94
🏆 Offline results: 99.9% accuracy on AIME 2025 (vs 97% baseline)! Universal gains across 5 models × 5 datasets. Consistent ~10% accuracy boost across all settings. Check Table 1 in the paper for more details.
1
2
119
DeepConf works in 2 modes:
Offline: filter completed reasoning traces by confidence, then weight votes by quality.
Online: stop generating when confidence drops below a threshold in real time.
2
3
120
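A minimal sketch of the offline mode described above: keep only the most confident completed traces and let each surviving trace cast a confidence-weighted vote. The input format, the `keep_frac` cutoff, and the weighting rule are illustrative assumptions, not the paper's exact procedure.

```python
from collections import defaultdict

def offline_deepconf_vote(traces, keep_frac=0.5):
    """traces: list of (answer, confidence) pairs from completed reasoning rollouts.
    Keep the top keep_frac by confidence, then do confidence-weighted majority voting."""
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_frac))]   # drop low-confidence traces
    votes = defaultdict(float)
    for answer, conf in kept:
        votes[answer] += conf                               # weight each vote by its confidence
    return max(votes, key=votes.get)

# Example: three confident traces agree on "42"; one low-confidence outlier says "7".
print(offline_deepconf_vote([("42", 0.9), ("42", 0.8), ("7", 0.2), ("42", 0.85)]))
```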
💡 The secret? LLMs already know when they're uncertain - we just weren't listening! Previous methods use confidence/entropy AFTER full generation for test-time and RL. We're different - we capture reasoning errors DURING generation. DeepConf monitors "local confidence" in
5
19
284
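A minimal sketch of the "local confidence" signal from the post above: average the log-probabilities of the most recent tokens and stop the rollout once that windowed confidence falls below a threshold. The window size and threshold are illustrative, and the real implementation lives inside the vLLM decoding loop referenced in the PR.

```python
def should_stop(token_logprobs, window=16, threshold=-1.5):
    """Early-stop check: mean logprob over the trailing `window` tokens (a crude
    'group confidence'); stop generating when it drops below `threshold`."""
    if len(token_logprobs) < window:
        return False
    group_conf = sum(token_logprobs[-window:]) / window
    return group_conf < threshold

# Example: a run of very uncertain tokens (logprob ~ -3.0) trips the early stop.
confident = [-0.1] * 32
uncertain = confident + [-3.0] * 16
print(should_stop(confident), should_stop(uncertain))   # False, True
```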
Excited to see a logarithmic format (LNS, UE8M0 FP8) used in production by @deepseek_ai! LNS enables efficient multiplication (just addition between exponents) + great dynamic range. Our LNS-Madam optimizer, built for LNS, was proposed years ago, before the LLM era - hope it shines again!
It is interesting that the new @deepseek_ai v3.1 is trained using the UE8M0 FP8 scale data format, which is a logarithmic number system. Our multiplicative weights update (Madam) for training in that format was done several years ago while at @nvidia. It yields maximum hardware
0
5
37
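The efficiency claim in the posts above (multiplication becomes addition between exponents) is easy to see with a toy log-domain representation. The sketch below stores values as their base-2 logarithms; it does not model UE8M0's 8-bit unsigned exponent encoding or LNS-Madam's update rule.

```python
import math

def to_lns(x: float) -> float:
    """Toy LNS encoding: store a positive value as its base-2 logarithm."""
    return math.log2(x)

def lns_mul(a_log: float, b_log: float) -> float:
    """Multiplication in the linear domain is just addition of stored exponents."""
    return a_log + b_log

x, y = 3.0, 20.0
prod_log = lns_mul(to_lns(x), to_lns(y))
print(2 ** prod_log)    # ~60.0, matching x * y, computed with no multiplier
```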