Jiawei Zhao Profile
Jiawei Zhao

@jiawzhao

Followers: 3K · Following: 219 · Media: 8 · Statuses: 99

Research Scientist at Meta FAIR @AIatMeta, PhD @Caltech, GaLore, DeepConf

Joined February 2013
@jiawzhao
Jiawei Zhao
4 months
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
63
330
2K
@zhu_hanqin41424
Hanqing Zhu
30 days
🚨 New Work! 🤔 Is RL black-box weight tinkering? 😉 No. We provably show RLVR follows a 🧭 — always updating the same off-principal regions while preserving the model's core spectra. ⚠️ Different optimization regime than SFT — SFT-era PEFT tricks can misfire (like PiSSA, the
8
41
257
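A hedged illustration of the claim above, not the paper's actual analysis: one way to check whether an update stays "off-principal" is to measure how much of the weight delta falls inside the top-k singular subspace of the pretrained matrix, and to compare the leading singular values before and after. The function name, matrix sizes, and toy update below are made up for the sketch.

```python
# Hypothetical illustration (not the paper's code): how much of a fine-tuning
# update lies inside the top-k principal subspace of the pretrained weights,
# and how the leading singular values compare before/after.
import numpy as np

def principal_overlap(w_pre: np.ndarray, w_post: np.ndarray, k: int = 16) -> float:
    """Fraction of the update's energy inside the top-k singular subspaces of w_pre."""
    delta = w_post - w_pre
    u, s, vt = np.linalg.svd(w_pre, full_matrices=False)
    u_k, v_k = u[:, :k], vt[:k, :].T
    # Project the update onto the principal subspaces on both sides.
    delta_principal = u_k @ (u_k.T @ delta @ v_k) @ v_k.T
    energy_in = np.linalg.norm(delta_principal) ** 2
    energy_total = np.linalg.norm(delta) ** 2 + 1e-12
    return energy_in / energy_total

rng = np.random.default_rng(0)
w_pre = rng.standard_normal((256, 256))
w_post = w_pre + 0.01 * rng.standard_normal((256, 256))  # stand-in for an RL update
print(f"share of update in top-16 principal subspace: {principal_overlap(w_pre, w_post):.3f}")
print("top-5 singular values before:", np.round(np.linalg.svd(w_pre, compute_uv=False)[:5], 2))
print("top-5 singular values after: ", np.round(np.linalg.svd(w_post, compute_uv=False)[:5], 2))
```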
@jiawzhao
Jiawei Zhao
2 months
We’ve always assumed stale and off-policy data hurts RL a lot — but our latest work shows the opposite. 🧠 M2PO (Second-Moment Trust Policy Optimization) reveals that even data stale by 256 model updates can train LLMs as effectively as on-policy RL, unlocking scalable and
m2po.notion.site
Haizhong Zheng, Jiawei Zhao, Beidi Chen
@InfiniAILab
Infini-AI-Lab
2 months
🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
3
23
132
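A speculative sketch based only on the name "second-moment trust", not necessarily the published M2PO formulation: with stale rollouts, token-level importance ratios between the current and behavior policy can blow up, and one could keep their second moment under a trust budget by masking the most extreme tokens rather than clipping every token PPO-style. The function name, threshold, and toy data below are illustrative assumptions.

```python
# Speculative illustration of "second-moment trust" with stale data -- not the
# published M2PO algorithm. Importance ratios r = pi_current / pi_stale per
# token; drop the most extreme tokens until E[r^2] fits a trust budget.
import numpy as np

def second_moment_mask(logp_current, logp_stale, m2_max=4.0):
    ratios = np.exp(logp_current - logp_stale)   # per-token importance ratios
    mask = np.ones_like(ratios, dtype=bool)
    # Drop the largest ratios until the second moment over kept tokens is bounded.
    for idx in np.argsort(-ratios):
        if np.mean(ratios[mask] ** 2) <= m2_max or mask.sum() == 1:
            break
        mask[idx] = False
    return ratios, mask

rng = np.random.default_rng(1)
logp_stale = rng.normal(-2.0, 0.5, size=32)                  # behavior policy (e.g. 256 updates old)
logp_current = logp_stale + rng.normal(0.0, 1.2, size=32)    # current policy has drifted
r, m = second_moment_mask(logp_current, logp_stale)
print(f"E[r^2] all tokens:  {np.mean(r**2):.2f}")
print(f"E[r^2] kept tokens: {np.mean(r[m]**2):.2f}  (kept {m.sum()}/{m.size})")
```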
@__JohnNguyen__
John Nguyen
2 months
Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊: the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow
7
81
410
@jiawzhao
Jiawei Zhao
3 months
Truly congrats to my friend @jxbz! He was my first mentor, guiding me into the field of optimization back in undergrad, and I’ve seen him stay true to his vision of uncovering the laws of neural networks through all the challenges and breakthroughs. Rethinking how we approach
@jxbz
Jeremy Bernstein
3 months
I wrote this blog post that tries to go further toward design principles for neural nets and optimizers. The post presents a visual intro to optimization on normed manifolds and a Muon variant for the manifold of matrices with unit condition number https://t.co/EhhKN2Jylx
0
0
9
@jiawzhao
Jiawei Zhao
3 months
I’ll be giving a guest lecture at Princeton tomorrow (Thursday, 9/25), sharing our recent works on LLM Reasoning and Efficiency. Time and location below: 2:55–4:15pm at CS Building 402 Thanks to @liuzhuang1234 for organizing this!
2
1
28
@zechunliu
Zechun Liu
3 months
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training,
@_akhaliq
AK
3 months
Meta just dropped MobileLLM-R1 on Hugging Face, an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the
6
16
119
@ChengLuo_lc
Cheng Luo
3 months
We need more reviewers for the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf. If you are interested, please fill out the nomination form
docs.google.com
We strive to expand our reviewing pool by welcoming newer members of the community. We encourage nominations from senior community members as well as self-nominations from individuals who have either...
@ChengLuo_lc
Cheng Luo
5 months
🌟 Announcing the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf 2025 — Dec 6 or 7, San Diego! 📣 We welcome submissions! Submit your work here: https://t.co/13TumRabVh 🗓️ Deadline: September 1, 2025 (AoE) 🔗 Website: https://t.co/tcTfZ6r6lS 💬 Topics
0
5
15
@jiawzhao
Jiawei Zhao
4 months
Want to try DeepConf NOW? While our full repo is coming, we just dropped a ready-to-run example in our vLLM (@vllm_project ) PR: https://t.co/TtlxtKlGfK DeepConf + DeepSeek-R1-8B + BRUMO25 = • 93.3% accuracy (+2.5% boost) • 52.9% fewer tokens generated • 31% faster
github.com
Purpose: Implement group-confidence-based early stopping in vLLM decoding to reduce tokens and latency while preserving or improving answer quality, following the method described in Deep Think with...
@jiawzhao
Jiawei Zhao
4 months
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
2
17
125
@jiawzhao
Jiawei Zhao
4 months
Thanks @vllm_project folks for pushing this. Please give it a try and let us know!
@vllm_project
vLLM
4 months
Wow, glad to see vLLM powers @jiawzhao's DeepConf work, impressive results on AIME 2025! Do you think this sampling control makes sense? Have a try and leave a comment in that PR https://t.co/eyvv24RTym to let us know!
1
7
133
@jiawzhao
Jiawei Zhao
4 months
⏰ Submission deadline coming up fast! (Sep 1) Working on efficient reasoning? Don’t miss the chance to share it at NeurIPS 2025!
@ChengLuo_lc
Cheng Luo
4 months
🌟 Reminder: Submission Deadline Approaching! 🌟 The 1st Workshop on Efficient Reasoning (ER) @ NeurIPS 2025 — happening Dec 6 or 7 in San Diego — is fast approaching, and we’d love to see your work there! 📌 Submission Deadline: September 1, 2025 (AoE) 🔗 Submit here:
0
1
6
@tydsh
Yuandong Tian
4 months
We released DeepConf, which achieves 99.9% on AIME'25 with open-source models using only 15% of the compute compared to majority voting@512. The secret? Simple. Just prune the rollouts if they show a consecutive stream of low confidence 😀. Can be applied to any model
@jiawzhao
Jiawei Zhao
4 months
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
11
49
367
@jiawzhao
Jiawei Zhao
4 months
Thank you for sharing it! @_akhaliq https://t.co/D3PQmpfGDI
@_akhaliq
AK
4 months
Deep Think with Confidence
0
0
5
@jiawzhao
Jiawei Zhao
4 months
🛠️ Super easy to implement! Just ~50 lines of code in vLLM. Plug & play - works with ANY existing model. Zero training or hyperparameter tuning needed. Code example: https://t.co/cqn5Anf88P Live vLLM PR: https://t.co/eW9qCTGSm2 💻 Open-source code coming soon!
github.com
Purpose: Implement group-confidence-based early stopping in vLLM decoding to reduce tokens and latency while preserving or improving answer quality, following the method described in Deep Think with...
2
6
167
@jiawzhao
Jiawei Zhao
4 months
📈 Online results: 33-85% token savings across all benchmarks! AIME 2025: 97.9% accuracy with 85% fewer tokens (GPT-OSS-120B). Works across 8B to 120B models - real-time efficiency without sacrificing quality. Chart shows AIME 25 results.
2
3
94
@jiawzhao
Jiawei Zhao
4 months
🏆 Offline results: 99.9% accuracy on AIME 2025 (vs 97% baseline)! Universal gains across 5 models × 5 datasets. Consistent ~10% accuracy boost across all settings. Check Table 1 in the paper for more details.
1
2
119
@jiawzhao
Jiawei Zhao
4 months
DeepConf works in 2 modes: Offline: filter completed reasoning traces by confidence, then weight votes by quality. Online: stop generating when confidence drops below a threshold in real time.
2
3
120
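A minimal sketch of the offline mode described above, assuming a per-trace confidence score is already available. The Trace structure, the keep fraction, and the way votes are weighted are illustrative choices, not the paper's exact recipe.

```python
# Minimal sketch of the offline mode: keep only the most confident reasoning
# traces, then do confidence-weighted majority voting over their answers.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Trace:
    answer: str        # final answer extracted from the trace
    confidence: float  # per-trace confidence score (higher = more confident)

def offline_vote(traces: list[Trace], keep_fraction: float = 0.1) -> str:
    # 1) Filter: keep the top `keep_fraction` most confident traces.
    ranked = sorted(traces, key=lambda t: t.confidence, reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_fraction))]
    # 2) Vote: each kept trace votes for its answer, weighted by confidence.
    scores: dict[str, float] = defaultdict(float)
    for t in kept:
        scores[t.answer] += t.confidence
    return max(scores, key=scores.get)

traces = [Trace("42", 0.91), Trace("42", 0.87), Trace("17", 0.95),
          Trace("42", 0.83), Trace("17", 0.40)]
print(offline_vote(traces, keep_fraction=0.6))  # -> "42"
```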
@jiawzhao
Jiawei Zhao
4 months
💡 The secret? LLMs already know when they're uncertain - we just weren't listening! Previous methods use confidence/entropy AFTER full generation for test-time and RL. We're different - we capture reasoning errors DURING generation. DeepConf monitors "local confidence" in
5
19
284
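An illustrative sketch of the "local confidence" idea: average per-token confidence over a sliding window during decoding and stop early when it collapses. The window size, threshold, and toy log-probs below are assumptions for the sketch, not DeepConf's actual settings.

```python
# Illustrative sketch: sliding-window (group) confidence monitored during
# generation, with early stopping when it falls below a threshold.
from collections import deque

def should_stop(token_confidences, window: int = 16, threshold: float = -1.5) -> bool:
    """token_confidences: per-token log-probs seen so far (most recent last)."""
    if len(token_confidences) < window:
        return False
    recent = list(token_confidences)[-window:]
    group_confidence = sum(recent) / window
    return group_confidence < threshold

# Toy decode loop: confidence drops sharply partway through "generation".
history = deque()
for step in range(64):
    logprob = -0.2 if step < 40 else -2.5   # stand-in for the model's token log-prob
    history.append(logprob)
    if should_stop(history):
        print(f"early stop at step {step}: local confidence collapsed")
        break
```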
@jiawzhao
Jiawei Zhao
4 months
Excited to see the logarithmic format (LNS, UE8M0 FP8) used in production by @deepseek_ai! LNS enables efficient multiplication (just addition between exponents) + great dynamic range. Our LNS-Madam optimizer, built for LNS, was proposed years ago, before the LLM era - hope it shines again!
@AnimaAnandkumar
Prof. Anima Anandkumar
4 months
It is interesting that the new @deepseek_ai v3.1 is trained using the UE8M0 FP8 scale data format, which is a logarithmic number system. Our multiplicative weights update (Madam) for training in that format was done several years ago while at @nvidia. It yields maximum hardware
0
5
37
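A tiny worked example of why a logarithmic number system makes multiplication cheap: values are stored as (sign, log2|x|), so a product becomes an addition of the stored exponents. This is a conceptual toy, not the UE8M0 FP8 format itself.

```python
# Toy LNS demo: multiplication in the log domain is just exponent addition.
import math

def to_lns(x: float):
    return (1 if x >= 0 else -1), math.log2(abs(x))

def lns_mul(a, b):
    (sa, ea), (sb, eb) = a, b
    return sa * sb, ea + eb          # multiply = add exponents

def from_lns(v) -> float:
    sign, exp = v
    return sign * 2.0 ** exp

a, b = 3.5, -0.125
prod = from_lns(lns_mul(to_lns(a), to_lns(b)))
print(prod, a * b)                   # both -0.4375 (up to float rounding)
```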