Jiawei Zhao
@jiawzhao
3K Followers · 219 Following · 8 Media · 99 Statuses
Research Scientist at Meta FAIR @AIatMeta, PhD @Caltech, GaLore, DeepConf
Joined February 2013
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
63
330
2K
🚨 New Work! 🤔 Is RL black-box weight tinkering? 😉 No. We provably show RLVR follows a 🧭 — always updating the same off-principal regions while preserving the model's core spectra. ⚠️ Different optimization regime than SFT — SFT-era PEFT tricks can misfire (like PiSSA, the
8
41
257
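The post above is truncated, but its central claim (RLVR concentrates weight updates outside the principal singular directions while leaving the base spectrum largely intact) can be probed with a quick diagnostic. A minimal sketch assuming PyTorch; the function `principal_overlap`, the subspace size `k`, and the spectrum-shift metric are illustrative choices, not the paper's actual analysis.

```python
import torch

def principal_overlap(w_base: torch.Tensor, w_tuned: torch.Tensor, k: int = 32):
    """Rough diagnostic: how much of the update dW = w_tuned - w_base lies inside the
    top-k principal (singular) subspace of the base weights, and how far the top-k
    singular values moved. Illustrative only, not the paper's metric."""
    dw = w_tuned - w_base
    U, S, Vh = torch.linalg.svd(w_base, full_matrices=False)
    Uk, Vk = U[:, :k], Vh[:k, :].T                       # top-k left/right singular vectors
    dw_principal = Uk @ (Uk.T @ dw @ Vk) @ Vk.T          # projection onto the principal subspace
    in_principal = dw_principal.norm() ** 2 / dw.norm() ** 2
    S_tuned = torch.linalg.svdvals(w_tuned)
    spectrum_shift = (S_tuned[:k] - S[:k]).abs().mean() / S[:k].mean()
    return in_principal.item(), spectrum_shift.item()

# Example: a small stand-in "update" that barely touches the top singular directions.
w = torch.randn(512, 512)
w_rl = w + 1e-3 * torch.randn(512, 512)
print(principal_overlap(w, w_rl, k=32))
```

A small perturbation like the stand-in above leaves most of its energy outside a narrow top-k subspace and barely moves the spectrum, which is the kind of "off-principal, spectra-preserving" behavior the post describes.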
We’ve always assumed stale and off-policy data hurts RL a lot — but our latest work shows the opposite. 🧠 M2PO (Second-Moment Trust Policy Optimization) reveals that even data stale by 256 model updates can train LLMs as effectively as on-policy RL, unlocking scalable and
m2po.notion.site
Haizhong Zheng, Jiawei Zhao, Beidi Chen
🤔 Can we train LLMs with RL on extremely stale data? 🚀 Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
3
23
132
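For readers unfamiliar with the off-policy setting, the quantities at stake are the importance ratios between the current policy and the stale behavior policy, and how wildly they vary as data ages. The sketch below is a generic clipped importance-weighted surrogate with a second-moment monitor on the log-ratios; it is not M2PO's actual objective, and `m2_limit` is a hypothetical knob used only for illustration.

```python
import torch

def off_policy_surrogate(logp_new, logp_old, advantages, clip=0.2, m2_limit=0.04):
    """Generic clipped importance-weighted policy-gradient surrogate on stale rollouts.
    NOT M2PO itself; it only shows the quantities an off-policy method must control.
    m2_limit is a hypothetical cap on the second moment of the log importance ratios."""
    log_ratio = logp_new - logp_old              # per-token log ratio vs. the stale behavior policy
    ratio = log_ratio.exp()
    m2 = (log_ratio ** 2).mean()                 # second moment: grows as the batch gets staler
    surrogate = torch.minimum(ratio * advantages,
                              ratio.clamp(1 - clip, 1 + clip) * advantages).mean()
    trusted = m2 <= m2_limit                     # crude check on how off-policy the batch is
    return -surrogate, m2.item(), trusted

# Toy usage on a mildly stale batch of token log-probabilities.
logp_old = torch.randn(1024) * 0.1 - 2.0
logp_new = logp_old + torch.randn(1024) * 0.05
adv = torch.randn(1024)
loss, m2, trusted = off_policy_surrogate(logp_new, logp_old, adv)
print(loss.item(), m2, trusted)
```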
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊: the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow
7
81
410
Huge congrats to my friend @jxbz! He was my first mentor, guiding me into the field of optimization back in undergrad, and I’ve seen him stay true to his vision of uncovering the laws of neural networks through all the challenges and breakthroughs. Rethinking how we approach
I wrote this blog post that tries to go further toward design principles for neural nets and optimizers. The post presents a visual intro to optimization on normed manifolds and a Muon variant for the manifold of matrices with unit condition number https://t.co/EhhKN2Jylx
0
0
9
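The quoted blog post concerns Muon-style updates and matrices with unit condition number. As a rough illustration of that idea (not the blog's manifold construction), one can replace a gradient matrix by its nearest semi-orthogonal matrix so that every step has all singular values equal to one. The SVD route below is a toy sketch; Muon in practice orthogonalizes a momentum buffer with a Newton-Schulz iteration rather than an exact SVD.

```python
import torch

def orthogonalized_update(weight: torch.Tensor, grad: torch.Tensor, lr: float = 0.02):
    """Muon-flavored step: replace the gradient by its closest semi-orthogonal matrix
    U @ Vh, so the update itself has unit condition number. Toy sketch via exact SVD."""
    U, _, Vh = torch.linalg.svd(grad, full_matrices=False)
    return weight - lr * (U @ Vh)

# Example on a random weight/gradient pair.
w = torch.randn(256, 128)
g = torch.randn(256, 128)
w_next = orthogonalized_update(w, g)
print(w_next.shape)
```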
I’ll be giving a guest lecture at Princeton tomorrow (Thursday, 9/25), sharing our recent works on LLM Reasoning and Efficiency. Time and location below: 2:55–4:15pm at CS Building 402 Thanks to @liuzhuang1234 for organizing this!
2
1
28
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training,
Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the
6
16
119
We need more reviewers for the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf. If you are interested, please fill out the nomination form.
docs.google.com
We strive to expand our reviewing pool by welcoming newer members of the community. We encourage nominations from senior community members as well as self-nominations from individuals who have either...
🌟 Announcing the 1st Workshop on Efficient Reasoning (ER) at @NeurIPSConf 2025 — Dec 6 or 7, San Diego! 📣 We welcome submissions! Submit your work here: https://t.co/13TumRabVh 🗓️ Deadline: September 1, 2025 (AoE) 🔗 Website: https://t.co/tcTfZ6r6lS 💬 Topics
0
5
15
Want to try DeepConf NOW? While our full repo is coming, we just dropped a ready-to-run example in our vLLM (@vllm_project) PR: https://t.co/TtlxtKlGfK DeepConf + DeepSeek-R1-8B + BRUMO25 = • 93.3% accuracy (+2.5% boost) • 52.9% fewer tokens generated • 31% faster
github.com
Purpose Implement group confidence based early stopping in vLLM decoding to reduce tokens and latency while preserving or improving answer quality, following the method described in Deep Think with...
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
2
17
125
Thanks @vllm_project folks for pushing this. Please give it a try and let us know!
Wow, glad to see vLLM powers @jiawzhao's DeepConf work, impressive results on AIME 2025! Do you think this sampling control makes sense? Give it a try and leave a comment in that PR https://t.co/eyvv24RTym to let us know!
1
7
133
⏰ Submission deadline coming up fast! (Sep 1) Working on efficient reasoning? Don’t miss the chance to share it at NeurIPS 2025!
🌟 Reminder: Submission Deadline Approaching! 🌟 The 1st Workshop on Efficient Reasoning (ER) @ NeurIPS 2025 — happening Dec 6 or 7 in San Diego — is fast approaching, and we’d love to see your work there! 📌 Submission Deadline: September 1, 2025 (AoE) 🔗 Submit here:
0
1
6
We released DeepConf, which achieves 99.9% on AIME'25 with open-source models using only 15% of the compute compared to majority voting@512. The secret? Simple: just prune rollouts when they show a consecutive stretch of low confidence 😀. Can be applied to any model
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong
11
49
367
Thank you for sharing it! @_akhaliq
https://t.co/D3PQmpfGDI
0
0
5
🛠️ Super easy to implement! Just ~50 lines of code in vLLM. Plug & play - works with ANY existing model. Zero training or hyperparameter tuning needed. Code example: https://t.co/cqn5Anf88P Live vLLM PR: https://t.co/eW9qCTGSm2 💻 Open-source code coming soon!
github.com
Purpose Implement group confidence based early stopping in vLLM decoding to reduce tokens and latency while preserving or improving answer quality, following the method described in Deep Think with...
2
6
167
📈 Online results: 33-85% token savings across all benchmarks! AIME 2025: 97.9% accuracy with 85% fewer tokens (GPT-OSS-120B). Works across 8B to 120B models - real-time efficiency without sacrificing quality. Chart shows AIME 25 results.
2
3
94
🏆 Offline results: 99.9% accuracy on AIME 2025 (vs 97% baseline)! Universal gains across 5 models × 5 datasets. Consistent ~10% accuracy boost across all settings. Check Table 1 in the paper for more details.
1
2
119
DeepConf works in 2 modes:
Offline: filter completed reasoning traces by confidence, then weight votes by quality.
Online: stop generating when confidence drops below a threshold in real time.
2
3
120
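A minimal sketch of the offline mode described above: keep only the most confident completed traces and let each surviving trace cast a confidence-weighted vote. The input format, the `keep_frac` cutoff, and the weighting rule are illustrative assumptions, not the paper's exact procedure.

```python
from collections import defaultdict

def offline_deepconf_vote(traces, keep_frac=0.5):
    """traces: list of (answer, confidence) pairs from completed reasoning rollouts.
    Keep the top keep_frac by confidence, then do confidence-weighted majority voting."""
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_frac))]   # drop low-confidence traces
    votes = defaultdict(float)
    for answer, conf in kept:
        votes[answer] += conf                               # weight each vote by its confidence
    return max(votes, key=votes.get)

# Example: three confident traces agree on "42"; one low-confidence outlier says "7".
print(offline_deepconf_vote([("42", 0.9), ("42", 0.8), ("7", 0.2), ("42", 0.85)]))
```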
💡 The secret? LLMs already know when they're uncertain - we just weren't listening! Previous methods use confidence/entropy AFTER full generation for test-time and RL. We're different - we capture reasoning errors DURING generation. DeepConf monitors "local confidence" in
5
19
284
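A minimal sketch of the "local confidence" signal from the post above: average the log-probabilities of the most recent tokens and stop the rollout once that windowed confidence falls below a threshold. The window size and threshold are illustrative, and the real implementation lives inside the vLLM decoding loop referenced in the PR.

```python
def should_stop(token_logprobs, window=16, threshold=-1.5):
    """Early-stop check: mean logprob over the trailing `window` tokens (a crude
    'group confidence'); stop generating when it drops below `threshold`."""
    if len(token_logprobs) < window:
        return False
    group_conf = sum(token_logprobs[-window:]) / window
    return group_conf < threshold

# Example: a run of very uncertain tokens (logprob ~ -3.0) trips the early stop.
confident = [-0.1] * 32
uncertain = confident + [-3.0] * 16
print(should_stop(confident), should_stop(uncertain))   # False, True
```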
Excited to see a logarithmic format (LNS, UE8M0 FP8) used in production by @deepseek_ai! LNS enables efficient multiplication (just addition between exponents) + great dynamic range. Our LNS-Madam optimizer, built for LNS, was proposed years ago, before the LLM era - hope it shines again!
It is interesting that the new @deepseek_ai v3.1 is trained using the UE8M0 FP8 scale data format, which is a logarithmic number system. Our multiplicative weights update (Madam) for training in that format was done several years ago while at @nvidia. It yields maximum hardware
0
5
37
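The efficiency claim in the posts above (multiplication becomes addition between exponents) is easy to see with a toy log-domain representation. The sketch below stores values as their base-2 logarithms; it does not model UE8M0's 8-bit unsigned exponent encoding or LNS-Madam's update rule.

```python
import math

def to_lns(x: float) -> float:
    """Toy LNS encoding: store a positive value as its base-2 logarithm."""
    return math.log2(x)

def lns_mul(a_log: float, b_log: float) -> float:
    """Multiplication in the linear domain is just addition of stored exponents."""
    return a_log + b_log

x, y = 3.0, 20.0
prod_log = lns_mul(to_lns(x), to_lns(y))
print(2 ** prod_log)    # ~60.0, matching x * y, computed with no multiplier
```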