Mojan Javaheripi
@mojan_jp
Followers: 358 · Following: 79 · Media: 5 · Statuses: 42
Phi models training team @MSFTResearch. CE PhD from @UCSanDiego
Joined November 2019
Great to see the additive dataset methodology we proposed in Phi-4-reasoning adopted in open-r1. Tldr: optimize data mixture per reasoning domain, and combine in final run for generalized performance. This is a game changer for reducing data ablation costs.
Happy to share 💭 Mixture of Thoughts 💭 A curated, general reasoning dataset that trims down over 1M samples from public datasets to ~350k through an extensive set of ablations 🧑🍳 Models trained on this mix match or exceed the performance of DeepSeek's distilled models -- not
0
10
45
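A rough sketch, under assumptions, of the additive-mixture methodology described in the tweet above: tune the data mixture for each reasoning domain separately, then combine the per-domain winners in one final run. The domain names, candidate sources, and `score` function below are placeholders for illustration, not the actual Phi-4-reasoning setup.

```python
from itertools import chain

# Illustrative domains and candidate source mixtures (placeholders).
candidate_mixtures = {
    "math":   [["olympiad_problems"], ["olympiad_problems", "grade_school_math"]],
    "coding": [["contest_problems"], ["contest_problems", "code_review"]],
    "logic":  [["puzzles"]],
}

def score(mixture, domain):
    # Placeholder: in practice you would train a small ablation model on
    # `mixture` and evaluate it on held-out benchmarks for `domain`.
    # Returning the mixture size keeps this sketch runnable end to end.
    return len(mixture)

# 1) Ablate each domain independently (cheap and parallelizable).
best_per_domain = {
    domain: max(options, key=lambda m: score(m, domain))
    for domain, options in candidate_mixtures.items()
}

# 2) Additively combine the per-domain winners into the final training mixture.
final_mixture = sorted(set(chain.from_iterable(best_per_domain.values())))
print(final_mixture)
```

The appeal, as the tweet notes, is cost: each domain's ablations stay small and independent, and only one full-mixture run is needed at the end.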
wow phi-4-reasoning with its mere 14B parameters beats deepseek-R1 and its 671B parameters (on AIME25). So data quality matters, you tell me? 😁
I am thrilled to share our newest Phi models. This time we went all in on post-training to produce Phi-4-reasoning (SFT only) and Phi-4-reasoning-plus (SFT + a touch of RL) — both 14B models that pack a punch in a small size across reasoning and general purpose benchmarks🧵
2
12
89
Excited to share our latest Phi model, Phi-4-reasoning, a small but powerful model that matches the performance of much larger reasoning models, up to DeepSeek R1. Here is the report, with new insights into training reasoning models and evaluating them:
lnkd.in
Introducing Phi-4-reasoning, adding reasoning models to the Phi family of SLMs. The model is trained with both supervised finetuning (using a carefully curated dataset of reasoning demonstrations) and Reinforcement Learning. 📌 Competitive results on reasoning benchmarks with
7
18
66
Introducing Phi-4-reasoning, adding reasoning models to the Phi family of SLMs. The model is trained with both supervised finetuning (using a carefully curated dataset of reasoning demonstrations) and Reinforcement Learning. 📌 Competitive results on reasoning benchmarks with
4
34
141
In all, we SFT'ed on ~1.4M reasoning traces on select prompts and further RL'd on a small set of ~6k samples. Despite the relatively long SFT on select domains, we see broad generalization across domains and no degradation in general-purpose performance. On the contrary... 🔁📚
1
2
4
Joint work with: @Arindam1408, @sj_agrwl, Caio Mendes, @OlliSaarikivi, @marah_i_abdin, @suriyagnskr, @BehlHarkirat, @zzzzgq, @VaishShrivas, @DimitrisPapail5, @rosaguga, Piero Kauffmann, @sytelus, Yash Lara, @vidhisha_b, @ChenLingjiao, Neel Joshi, @VibhavVineet, @besanushi, @AhmedHAwadallah
0
1
1
Nice summary of more cool results for Phi-4-Reasoning by @DimitrisPapail
We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.
0
3
13
Tech Report: https://t.co/FMshGcaS8Z HuggingFace: https://t.co/gQ92JKG61H and https://t.co/by3qIb4ij5 Azure AI Foundry:
1
0
2
Phi-4-reasoning-plus is obtained via a short reinforcement learning run on Phi-4-reasoning, using a randomly selected subset of the SFT prompts. This short RL amplifies the reasoning style and unlocks nice improvements across benchmarks, with longer response lengths.
1
0
2
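A minimal sketch of the data-selection step described above, assuming the SFT prompt pool and the ~6k subset size mentioned earlier in the thread; the prompt list here is a stand-in, and the RL algorithm itself is not specified in this tweet.

```python
import random

random.seed(0)  # only for reproducibility of this illustration

# Stand-in for the pool of prompts used during supervised fine-tuning.
sft_prompts = [f"reasoning problem #{i}" for i in range(1_400_000)]

# Phi-4-reasoning-plus: a short RL phase on a randomly selected subset of the
# SFT prompts (~6k per the thread), starting from the Phi-4-reasoning checkpoint.
rl_prompts = random.sample(sft_prompts, k=6_000)
print(len(rl_prompts), rl_prompts[0])
```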
Phi-4-reasoning is supervised fine-tuned on top of Phi-4. The secret sauce? 1) High-quality prompts at the edge of model capability, to go beyond vanilla distillation, plus strong reasoning responses from a teacher. 2) An optimal data mixture of different sources for best overall performance.
1
0
2
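A hypothetical sketch of ingredient 1 above, "prompts at the edge of model capability": one plausible reading is to keep prompts the base model solves only some of the time, so the teacher's reasoning traces add signal beyond vanilla distillation. The helpers and thresholds below are assumptions, not the actual Phi-4-reasoning filtering pipeline.

```python
from typing import Callable, List

def filter_edge_prompts(
    prompts: List[str],
    sample_answer: Callable[[str], str],     # draws one base-model answer (assumed helper)
    is_correct: Callable[[str, str], bool],  # checks an answer against a reference (assumed helper)
    n_samples: int = 8,
    low: float = 0.1,
    high: float = 0.9,
) -> List[str]:
    """Keep prompts whose base-model pass rate is neither ~0 nor ~1."""
    kept = []
    for prompt in prompts:
        wins = sum(is_correct(prompt, sample_answer(prompt)) for _ in range(n_samples))
        pass_rate = wins / n_samples
        # Too easy (already solved) -> little to learn from the teacher.
        # Never solved -> likely too hard, or hard to verify, at this model size.
        if low <= pass_rate <= high:
            kept.append(prompt)
    return kept
```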
More interestingly, our models generalize well to out-of-distribution tasks like algorithmic problem solving, planning, and spatial reasoning. These skills were not targeted in our training data but Phi-4-reasoning performs quite well.
1
0
3
With 14B parameters, both models are competitive with, and often better than, (larger) frontier models: outperforming DeepSeek-R1-Distill-Llama-70B across the board (small gap in coding) and comparable with the original DeepSeek-R1 on AIME 2025, which came out after our data cutoff date.
1
0
3
Excited to release our first set of reasoning models, Phi-4-reasoning and Phi-4-reasoning-plus, available today on HuggingFace and Azure AI Foundry. Some interesting insights below and more deep dives in the following days!
1
10
42
Excited to see our SLM work, Phi, mentioned in MIT Technology Review as one of the top 10 breakthrough technologies! 😊 https://t.co/4hkKygdhYq
technologyreview.com
What will really matter in the long run? That’s the question we tackle each year as we compile this annual list.
0
0
2
Are you ready for an early Christmas present from our team at Microsoft Research? Introducing the most powerful smol model ever built in the world! Welcome to Phi-4! 👇
37
133
2K
Surprise #NeurIPS2024 drop for y'all: phi-4 available open weights and with amazing results!!! Tl;dr: phi-4 is in Llama 3.3-70B category (win some lose some) with 5x fewer parameters, and notably outperforms on pure reasoning like GPQA (56%) and MATH (80%).
19
68
413
🚀 Phi-4 is here! A small language model that performs as well as (and often better than) large models on certain types of complex reasoning tasks such as math. Useful for us in @MSFTResearch, and available now for all researchers on Azure AI Foundry! https://t.co/83vpjSOHaT
42
178
734
phi-3 is here, and it's ... good :-). I made a quick short demo to give you a feel of what phi-3-mini (3.8B) can do. Stay tuned for the open weights release and more announcements tomorrow morning! (And ofc this wouldn't be complete without the usual table of benchmarks!)
40
175
922
I'm excited for our NeurIPS LLM Efficiency Competition workshop tomorrow: 1LLM + 1GPU + 1Day! Stop by 1:30 CT to see Weiwei Yang (MSR), @marksaroufim, @jeremyphoward, @rasbt, Ao Liu, @Tim_Dettmers, @sourab_m, @KemingLu612, @mojan_jp, @LChoshen, Vicki Boykis, @cpuhrsch
0
6
29
Enjoy everyone! (And remember it's a base model, so you might have to play around with your prompts; if you want it to follow instructions you can try the format "Instruct: ... Output:") https://t.co/MCajRFDsK4
huggingface.co
26
182
1K
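A minimal usage sketch of the suggested base-model prompt format with Hugging Face transformers. The checkpoint id is an assumption (substitute the model from the linked Hugging Face page, e.g. microsoft/phi-2), and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-2"  # assumed checkpoint; use the model from the linked HF page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Base model: no chat template, so use the suggested "Instruct: ... Output:" format.
prompt = "Instruct: Explain in one sentence why the sky is blue.\nOutput:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```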