Weizhu Chen
@WeizhuChen
Followers: 3K
Following: 585
Media: 22
Statuses: 182
Microsoft
Kirkland, WA
Joined April 2008
If you were impacted by Meta’s recent layoffs or have deep expertise in model training and data, reach out here: https://t.co/zOnk8zpKx5
2
14
73
Happy to join the first in-person PyTorch Foundation board meeting today. There is a lot to learn about how to run such a big foundation as a board member. PyTorch is growing everywhere in the AI community. I look forward to hearing from the community about what you’d like to see.
0
0
25
Prof. Yang said that his greatest accomplishment was not winning the Nobel Prize, but helping to change the deep-rooted belief among Chinese people at that time that they were somehow less capable than others. For that, we all owe him our gratitude. RIP to our giant.
Prof. Chen Ning Yang, a world-renowned physicist, Nobel Laureate in Physics, Academician of the Chinese Academy of Sciences, Professor at Tsinghua University, and Honorary Director of the Institute for Advanced Study at Tsinghua University, passed away in Beijing due to illness
0
0
12
Love to see this. Will LoRA become even more popular in RL?
Really happy to see people reproducing the result that LoRA rank=1 closely matches full fine-tuning on many RL fine-tuning problems. Here are a couple nice ones: https://t.co/x7hcgNL3Bd
https://t.co/5JyKuKd9wS
0
0
9
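For readers curious what a rank-1 adapter looks like in practice, here is a minimal sketch assuming the Hugging Face transformers and peft libraries; the base model and target modules are illustrative choices, not details from the linked reproductions.

```python
# Minimal sketch: attach a rank-1 LoRA adapter to a causal LM before RL fine-tuning.
# Library choice (transformers + peft), base model, and target modules are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-1.5B")

lora_config = LoraConfig(
    r=1,                                  # rank-1: each adapted weight gets a 1-dimensional low-rank update
    lora_alpha=16,                        # scaling factor applied to the low-rank update
    target_modules=["q_proj", "v_proj"],  # which projections to adapt (illustrative choice)
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the tiny rank-1 A/B matrices are trainable
```

The RL fine-tuning loop would then update only these adapter parameters while the base weights stay frozen.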
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
8
57
238
See our work in the workshop today. If you are looking for opportunities to work on efficient model architectures, or anything else that makes training or inference run much faster on thousands of GPUs or more, please come talk to us or DM me. We are hiring.
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 https://t.co/Nxsm6FclOX (1/4)
3
1
29
Just arrived at ICML. Please drop me a message if you are here and would like to chat. We are hiring.
8
9
183
You may check out our work on Phi4-mini-Flash-Reasoning. What I like most is the Gated Memory Unit (GMU) design, which can be applied in future model designs to achieve both quality and long context, along with μP++. @liliang_ren
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput.
1
0
19
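The exact GMU formulation is in the Phi4-mini-Flash-Reasoning report; purely as an illustration of the general idea of cheap element-wise gating over memory shared from an earlier layer, here is a hedged PyTorch sketch. The module structure, projections, and shapes are my assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn

class GatedMemorySketch(nn.Module):
    """Illustrative element-wise gating between a layer's hidden state and memory
    shared from an earlier, more expensive layer. Not the paper's exact GMU."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # Gate the shared memory element-wise with the current hidden state, so the
        # expensive memory (e.g., from an SSM or attention layer) is computed once
        # and cheaply reused by later layers instead of being recomputed.
        gate = torch.sigmoid(self.gate_proj(hidden))
        return self.out_proj(gate * memory)

# Toy usage: (batch, seq, d_model) hidden states gated against shared memory.
x = torch.randn(2, 16, 512)
mem = torch.randn(2, 16, 512)
print(GatedMemorySketch(512)(x, mem).shape)  # torch.Size([2, 16, 512])
```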
Synthesizing challenging problems on which the current model performs poorly is an important area in RL. Another thing that interests me is self-evolving learning, via synthesizing questions/problems from which the model can learn continuously. You may check our work here: https://t.co/eaCTRjSwFz
2
3
27
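As a rough sketch of that self-evolving idea (not the method in the linked paper), the loop below synthesizes new problems, keeps the ones the current model still fails, and continues training on them. Every callable here is a hypothetical placeholder supplied by the user.

```python
# Hypothetical self-evolving loop: synthesize problems, keep the ones the current
# model fails, and keep training on them. All callables are user-supplied
# placeholders, not APIs from the linked work.
from typing import Callable, List, Tuple

Problem = Tuple[str, str]  # (question, reference answer)

def self_evolve(
    model,
    seed: List[Problem],
    synthesize: Callable[[object, List[Problem], int], List[Problem]],
    solve: Callable[[object, str], str],
    check: Callable[[str, str], bool],
    train: Callable[[object, List[Problem]], object],
    rounds: int = 3,
    per_round: int = 128,
):
    pool = list(seed)
    for _ in range(rounds):
        candidates = synthesize(model, pool, per_round)    # propose new problems
        hard = [(q, a) for (q, a) in candidates
                if not check(solve(model, q), a)]          # keep what the model gets wrong
        pool.extend(hard)
        model = train(model, hard)                         # continue learning on the hard ones
    return model
```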
Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24. arxiv: https://t.co/741JoHgK4m hf: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc
2
4
28
Happy to see @ypwang61 and the team did some interesting work here.
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍RLVR with one training example can boost: - Qwen2.5-Math-1.5B: 36.0% → 73.6% - Qwen2.5-Math-7B: 51.0% → 79.2% on MATH500. 📄 Paper: https://t.co/D65XR9mMs2
0
1
5
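For context, the "verifiable" part of RLVR means the reward comes from a programmatic check of the final answer rather than a learned reward model. A minimal sketch of such a reward for math problems is below; the \boxed{} convention and the normalization are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a verifiable reward for math answers: 1.0 if the extracted
# final answer matches the reference, else 0.0. The \boxed{} convention and the
# normalization here are illustrative assumptions.
import re

def extract_boxed(text: str) -> str:
    """Return the last \\boxed{...} content in the model's output, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else ""

def verifiable_reward(model_output: str, reference: str) -> float:
    normalize = lambda s: s.replace(" ", "").rstrip(".")
    return 1.0 if normalize(extract_boxed(model_output)) == normalize(reference) else 0.0

print(verifiable_reward(r"... so the answer is \boxed{42}", "42"))  # 1.0
```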
We released Phi-4-mini (a 3.8B base LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRAs, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share your feedback with us. We also…
48
145
730
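The actual mixture-of-LoRAs design is described in the Phi-4-mini report; purely to illustrate the general idea of modality-specific adapters sharing one frozen language backbone, here is a hedged sketch using the Hugging Face peft library. The base model id, adapter names, ranks, and target modules are assumptions for illustration.

```python
# Illustrative sketch of modality-specific LoRA adapters on a shared backbone,
# in the general spirit of a mixture-of-LoRAs. Base model, adapter names, ranks,
# and target modules are assumptions, not the actual Phi-4-mini configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-instruct")

def lora_cfg(rank: int) -> LoraConfig:
    return LoraConfig(r=rank, lora_alpha=2 * rank,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

# One adapter per extra modality; the shared backbone weights stay frozen.
model = get_peft_model(base, lora_cfg(32), adapter_name="vision")
model.add_adapter("audio", lora_cfg(32))

# At inference time, activate the adapter matching the input modality.
model.set_adapter("vision")  # image-conditioned inputs
model.set_adapter("audio")   # audio-conditioned inputs
```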
@moderncpp7 @clu_cheng @NeurIPSConf @drfeifei @jhyuxm @edchi I didn't see the talk, but the slide in the images I've seen seems quite offensive. Such generalizations should have no place at NeurIPS or anywhere else.
34
158
1K
A big shout out to our amazing interns @zebgou and @Zhenghaolin1, and my colleagues @yynlpanda, @XiaoLiuNLP, and @ShenYelong. They did most of the work in this paper.
0
0
2