Weizhu Chen Profile
Weizhu Chen

@WeizhuChen

Followers: 3K
Following: 585
Media: 22
Statuses: 182

Microsoft

Kirkland WA
Joined April 2008
@WeizhuChen
Weizhu Chen
6 days
If you were impacted by Meta’s recent layoffs or have deep expertise in model training and data, here: https://t.co/zOnk8zpKx5
2
14
73
@WeizhuChen
Weizhu Chen
9 days
Happy to join the first in-person PyTorch foundation board meeting today. There is a lot to learn about how to run such a big foundation as a board member. PyTorch is growing everywhere in the AI community. I look forward to hearing from the community about what you’d like to
0
0
25
@WeizhuChen
Weizhu Chen
10 days
Prof. Yang said that his greatest accomplishment was not winning the Nobel Prize, but helping to change the deep-rooted belief among Chinese people at that time that they were somehow less capable than others. For that, we all owe him our gratitude. RIP to our giant.
@Tsinghua_Uni
Tsinghua University
12 days
Prof. Chen Ning Yang, a world-renowned physicist, Nobel Laureate in Physics, Academician of the Chinese Academy of Sciences, Professor at Tsinghua University, and Honorary Director of the Institute for Advanced Study at Tsinghua University, passed away in Beijing due to illness
0
0
12
@WeizhuChen
Weizhu Chen
23 days
Love to see this. Will LoRA become even more popular in RL?
@johnschulman2
John Schulman
24 days
Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple nice ones: https://t.co/x7hcgNL3Bd https://t.co/5JyKuKd9wS
0
0
9
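For readers not steeped in LoRA: the result John quotes above says a rank-1 low-rank update to each weight matrix can closely match full fine-tuning in many RL fine-tuning settings. Below is a minimal PyTorch sketch of a rank-1 adapter on a single linear layer; the dimensions, init, and scaling are illustrative assumptions, not taken from either reproduction.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pretrained linear layer plus a trainable rank-r update."""
        def __init__(self, base: nn.Linear, r: int = 1, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                       # freeze pretrained weights
            d_out, d_in = base.weight.shape
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
            self.B = nn.Parameter(torch.zeros(d_out, r))         # up-projection, zero-init
            self.scale = alpha / r

        def forward(self, x):
            # y = base(x) + scale * (x A^T) B^T ; only A and B receive gradients
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    # With r=1 on a 4096x4096 projection, only 2 * 4096 parameters are trainable.
    layer = LoRALinear(nn.Linear(4096, 4096), r=1)

With rank 1 the learned update is a single outer product B A, which is why the trainable-parameter count stays tiny relative to full fine-tuning.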
@KaiyuYang4
Kaiyu Yang
3 months
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
8
57
238
@WeizhuChen
Weizhu Chen
3 months
See our work in the workshop today. If you are looking for opportunities to work on efficient model architectures, or on other ways to make training or inference run much faster on thousands of GPUs or more, please come talk to us or DM me. We are hiring.
@liliang_ren
Liliang Ren
3 months
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 https://t.co/Nxsm6FclOX (1/4)
3
1
29
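μP++ itself is defined in the linked release, not in this thread. As rough background only, the original μP prescription keeps Adam updates width-stable by shrinking hidden-layer learning rates as the model widens, so a sweep tuned on a small proxy model transfers to larger ones; the helper below is a hypothetical illustration of that scaling rule, not the μP++ recipe.

    def mup_hidden_lr(base_lr: float, width: int, base_width: int = 256) -> float:
        """Illustrative muP-style rule: a learning rate tuned at width=base_width
        is scaled by base_width / width for hidden matrices at the target width."""
        return base_lr * base_width / width

    # e.g. an lr of 3e-3 tuned at width 256 becomes 1.875e-4 at width 4096
    print(mup_hidden_lr(3e-3, 4096))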
@WeizhuChen
Weizhu Chen
3 months
Sorry. Just changed the DM settings to open them up.
0
0
2
@WeizhuChen
Weizhu Chen
3 months
Just arrived at ICML. Please drop me a message if you are here and would like to chat. We are hiring.
8
9
183
@WeizhuChen
Weizhu Chen
4 months
You may check out our work on Phi4-mini-Flash-Reasoning. What I like most is the Gated Memory Unit (GMU) design, which can be applied in future model designs to achieve both quality and long context, as well as μP++. @liliang_ren
@liliang_ren
Liliang Ren
4 months
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput
1
0
19
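The actual GMU design is specified in the Phi4-mini-Flash report. As a hedged illustration of the general idea only, a gated memory unit could let a later layer cheaply reuse a memory state computed by an earlier layer instead of recomputing attention over the full context; the projections and sigmoid gate below are assumptions, not the paper's exact formulation.

    import torch
    import torch.nn as nn

    class GatedMemoryUnit(nn.Module):
        """Illustrative gated memory unit (not the exact Phi4-mini-Flash design):
        the current hidden state produces a gate that modulates a memory state
        shared from an earlier layer, followed by an output projection."""
        def __init__(self, d_model: int):
            super().__init__()
            self.gate_proj = nn.Linear(d_model, d_model)
            self.out_proj = nn.Linear(d_model, d_model)

        def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
            gate = torch.sigmoid(self.gate_proj(hidden))  # per-element gate in (0, 1)
            return self.out_proj(gate * memory)           # reuse memory cheaply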
@WeizhuChen
Weizhu Chen
5 months
Synthesizing challenging problems on which the current model performs poorly is an important area in RL. Another thing that interests me is self-evolving learning, where the model synthesizes questions/problems from which it can keep learning continuously. You may check our work here: https://t.co/eaCTRjSwFz
2
3
27
@WeizhuChen
Weizhu Chen
6 months
Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24. arxiv: https://t.co/741JoHgK4m hf: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc
2
4
28
@WeizhuChen
Weizhu Chen
6 months
Happy to see @ypwang61 and the team did some interesting work here.
@ypwang61
Yiping Wang
6 months
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: - Qwen2.5-Math-1.5B: 36.0% → 73.6% - Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: https://t.co/D65XR9mMs2
0
1
5
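For context on "RLVR" (reinforcement learning with verifiable rewards): the reward comes from a programmatic checker rather than a learned reward model. The toy verifier below illustrates the idea; the \boxed{} answer format and exact string match are simplifying assumptions, not the paper's actual checker.

    import re

    def verifiable_math_reward(completion: str, gold_answer: str) -> float:
        """Return 1.0 if the last \\boxed{...} answer in the completion matches
        the reference exactly, else 0.0; real pipelines normalize expressions."""
        matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
        if not matches:
            return 0.0
        return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

    # Example: reward of 1.0 for a correct final answer
    print(verifiable_math_reward("... so the answer is \\boxed{42}.", "42"))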
@WeizhuChen
Weizhu Chen
8 months
Check out our tech report on Phi-4-mini and its multimodality.
@_akhaliq
AK
8 months
Phi-4-Mini Technical Report Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
2
9
46
@WeizhuChen
Weizhu Chen
8 months
We released Phi-4-mini (a 3.8B base LLM), a new SLM excelling in language, vision, and audio through a mixture of LoRAs, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share your feedback with us. We also
48
145
730
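A rough sketch of the mixture-of-LoRAs idea mentioned above, assuming the simplest possible routing: one low-rank adapter per modality on a shared frozen projection, selected by a modality tag. The class, argument names, and routing are hypothetical; the real Phi-4-mini wiring is described in the tech report.

    import torch
    import torch.nn as nn

    class MixtureOfLoRAsLinear(nn.Module):
        """Shared frozen base projection with one low-rank adapter per modality."""
        def __init__(self, d_in: int, d_out: int, r: int = 8,
                     modalities=("text", "vision", "audio")):
            super().__init__()
            self.base = nn.Linear(d_in, d_out)
            for p in self.base.parameters():
                p.requires_grad = False                       # base stays frozen
            self.adapters = nn.ModuleDict({
                m: nn.Sequential(nn.Linear(d_in, r, bias=False),   # down-projection
                                 nn.Linear(r, d_out, bias=False))  # up-projection
                for m in modalities
            })

        def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
            # Route to the adapter that matches the input's modality tag.
            return self.base(x) + self.adapters[modality](x)

    # Example: audio tokens go through the audio adapter, text through the text one.
    layer = MixtureOfLoRAsLinear(1024, 1024)
    y = layer(torch.randn(2, 16, 1024), modality="audio")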
@WeizhuChen
Weizhu Chen
11 months
+1 on this.
0
0
9
@JeffDean
Jeff Dean
11 months
@moderncpp7 @clu_cheng @NeurIPSConf @drfeifei @jhyuxm @edchi I didn't see the talk, but the images I've seen of the slide seem quite offensive. Such generalizations should have no place in NeurIPS or anywhere else.
34
158
1K
@WeizhuChen
Weizhu Chen
11 months
A big shout out to our amazing interns @zebgou and @Zhenghaolin1, and my colleagues @yynlpanda @XiaoLiuNLP @ShenYelong. They did most of the work in this paper.
0
0
2