Weizhu Chen
@WeizhuChen
Followers: 3K
Following: 585
Media: 22
Statuses: 182
Microsoft
Kirkland, WA
Joined April 2008
If you were impacted by Meta’s recent layoffs or have deep expertise in model training and data, reach out here: https://t.co/zOnk8zpKx5
2
14
73
Happy to join the first in-person PyTorch Foundation board meeting today. There is a lot to learn about how to run such a big foundation as a board member. PyTorch is growing everywhere in the AI community. I look forward to hearing from the community about what you’d like to see.
0
0
25
Prof. Yang said that his greatest accomplishment was not winning the Nobel Prize, but helping to change the deep-rooted belief among Chinese people at that time that they were somehow less capable than others. For that, we all owe him our gratitude. RIP to our giant.
Prof. Chen Ning Yang, a world-renowned physicist, Nobel Laureate in Physics, Academician of the Chinese Academy of Sciences, Professor at Tsinghua University, and Honorary Director of the Institute for Advanced Study at Tsinghua University, passed away in Beijing due to illness
0
0
12
Love to see this. Will LoRA become even more popular in RL?
Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple nice ones: https://t.co/x7hcgNL3Bd
https://t.co/5JyKuKd9wS
0
0
9
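For readers curious what a rank-1 adapter looks like in practice, here is a minimal sketch assuming the Hugging Face transformers and peft libraries; the base model and target modules are illustrative choices, not details from the linked reproductions.

```python
# Minimal sketch: attach a rank-1 LoRA adapter to a causal LM before RL fine-tuning.
# Library choice (transformers + peft), base model, and target modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-1.5B")

lora_config = LoraConfig(
    r=1,                                  # rank-1: each adapted weight gets a 1-dimensional low-rank update
    lora_alpha=16,                        # scaling factor applied to the low-rank update
    target_modules=["q_proj", "v_proj"],  # which projections to adapt (illustrative choice)
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the tiny rank-1 A/B matrices are trainable
```

The RL fine-tuning loop would then update only these adapter parameters while the base weights stay frozen.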
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
8
57
238
See our work in the workshop today. If you are looking for opportunities to work on efficient model architectures, or anything else that makes training or inference run much faster on thousands of GPUs or more, please come talk to us or DM me. We are hiring.
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 https://t.co/Nxsm6FclOX (1/4)
3
1
29
Just arrived at ICML. Please drop me a message if you are here and would like to chat. We are hiring.
8
9
183
You may check out our work on Phi4-mini-Flash-Reasoning. What I like most is the Gated Memory Unit (GMU) design, which can be applied in future model designs to achieve both quality and long context, along with μP++. @liliang_ren
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput.
1
0
19
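The exact GMU formulation is in the Phi4-mini-Flash-Reasoning report; purely as an illustration of the general idea of cheap element-wise gating over memory shared from an earlier layer, here is a hedged PyTorch sketch. The module structure, projections, and shapes are my assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn

class GatedMemorySketch(nn.Module):
    """Illustrative element-wise gating between a layer's hidden state and memory
    shared from an earlier, more expensive layer. Not the paper's exact GMU."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Gate the shared memory element-wise with the current hidden state, so the
        # expensive memory (e.g., from an SSM or attention layer) is computed once
        # and cheaply reused by later layers instead of being recomputed.
        gate = torch.sigmoid(self.gate_proj(hidden))
        return self.out_proj(gate * memory)

# Toy usage: (batch, seq, d_model) hidden states gated against shared memory.
x = torch.randn(2, 16, 512)
mem = torch.randn(2, 16, 512)
print(GatedMemorySketch(512)(x, mem).shape)  # torch.Size([2, 16, 512])
```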
Synthesizing challenging problems on which the current model performs poorly is an important area in RL. Another thing that interests me is self-evolving learning, via synthesizing questions/problems from which the model can learn continuously. You may check our work here: https://t.co/eaCTRjSwFz
2
3
27
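As a rough sketch of that self-evolving idea (not the method in the linked paper), the loop below synthesizes new problems, keeps the ones the current model still fails, and continues training on them. Every callable here is a hypothetical placeholder supplied by the user.

```python
# Hypothetical self-evolving loop: synthesize problems, keep the ones the current
# model fails, and keep training on them. All callables are user-supplied
# placeholders, not APIs from the linked work.
from typing import Callable, List, Tuple

Problem = Tuple[str, str]  # (question, reference answer)

def self_evolve(
    model,
    seed: List[Problem],
    synthesize: Callable[[object, List[Problem], int], List[Problem]],
    solve: Callable[[object, str], str],
    check: Callable[[str, str], bool],
    train: Callable[[object, List[Problem]], object],
    rounds: int = 3,
    per_round: int = 128,
):
    pool = list(seed)
    for _ in range(rounds):
        candidates = synthesize(model, pool, per_round)    # propose new problems
        hard = [(q, a) for (q, a) in candidates
                if not check(solve(model, q), a)]          # keep what the model gets wrong
        pool.extend(hard)
        model = train(model, hard)                         # continue learning on the hard ones
    return model
```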
Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24. arxiv: https://t.co/741JoHgK4m hf: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc
2
4
28
Happy to see @ypwang61 and the team did some interesting work here.
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: - Qwen2.5-Math-1.5B: 36.0% → 73.6% - Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: https://t.co/D65XR9mMs2
0
1
5
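For context, the "verifiable" part of RLVR means the reward comes from a programmatic check of the final answer rather than a learned reward model. A minimal sketch of such a reward for math problems is below; the \boxed{} convention and the normalization are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a verifiable reward for math answers: 1.0 if the extracted
# final answer matches the reference, else 0.0. The \boxed{} convention and the
# normalization here are illustrative assumptions.
import re

def extract_boxed(text: str) -> str:
    """Return the last \\boxed{...} content in the model's output, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else ""

def verifiable_reward(model_output: str, reference: str) -> float:
    normalize = lambda s: s.replace(" ", "").rstrip(".")
    return 1.0 if normalize(extract_boxed(model_output)) == normalize(reference) else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
```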
We released Phi-4-mini (a 3.8B base LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRAs, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share your feedback with us. We also…
48
145
730
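The actual mixture-of-LoRAs design is described in the Phi-4-mini report; purely to illustrate the general idea of modality-specific adapters sharing one frozen language backbone, here is a hedged sketch using the Hugging Face peft library. The base model id, adapter names, ranks, and target modules are assumptions for illustration.

```python
# Illustrative sketch of modality-specific LoRA adapters on a shared backbone,
# in the general spirit of a mixture-of-LoRAs. Base model, adapter names, ranks,
# and target modules are assumptions, not the actual Phi-4-mini configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-instruct")

def lora_cfg(rank: int) -> LoraConfig:
    return LoraConfig(r=rank, lora_alpha=2 * rank,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

# One adapter per extra modality; the shared backbone weights stay frozen.
model = get_peft_model(base, lora_cfg(32), adapter_name="vision")
model.add_adapter("audio", lora_cfg(32))

# At inference time, activate the adapter matching the input modality.
model.set_adapter("vision")  # image-conditioned inputs
model.set_adapter("audio")   # audio-conditioned inputs
```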
@moderncpp7 @clu_cheng @NeurIPSConf @drfeifei @jhyuxm @edchi I didn't see the talk, but the slide in the images I've seen seems quite offensive. Such generalizations should have no place at NeurIPS or anywhere else.
34
158
1K
A big shout out to our amazing interns @zebgou and @Zhenghaolin1, and my colleagues @yynlpanda, @XiaoLiuNLP, and @ShenYelong. They did most of the work in this paper.
0
0
2