Haoran Xu
@fe1ixxu
Followers
539
Following
140
Media
9
Statuses
86
Senior Researcher @Microsoft | Ph.D. @JHU '24 | ex-intern @Microsoft | @Meta AI | @Amazon Alexa AI
Seattle, WA
Joined August 2017
Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
4
13
49
Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!
🌀Diversity Aware RL (DARLING)🌀 📝: https://t.co/MH0tui34Cb - Jointly optimizes for quality & diversity using a learned partition function - Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k - Works for both non-verifiable & verifiable tasks 🧵1/5
2
24
90
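To make the quality-times-diversity idea concrete, here is a minimal Python sketch of a DARLING-style reward. The function name, the clustering interface, and the multiplicative combination of quality with a cluster-size-based diversity bonus are illustrative assumptions, not the paper's exact formulation (which uses a learned partition function inside online RL).

```python
# A minimal sketch of a diversity-aware reward in the spirit of DARLING.
# Assumptions (not taken from the paper): `partition_fn` is any learned
# function mapping a response to a semantic-equivalence cluster id, and
# quality and diversity are combined multiplicatively.

from collections import Counter
from typing import Callable, List

def darling_style_rewards(
    responses: List[str],
    quality_scores: List[float],
    partition_fn: Callable[[str], int],
) -> List[float]:
    """Combine per-response quality with a diversity bonus.

    Responses that land in a crowded semantic cluster (near-duplicates of
    other samples for the same prompt) get a smaller diversity bonus than
    responses that occupy a cluster by themselves.
    """
    cluster_ids = [partition_fn(r) for r in responses]
    cluster_sizes = Counter(cluster_ids)
    n = len(responses)

    rewards = []
    for q, cid in zip(quality_scores, cluster_ids):
        # Diversity bonus in (0, 1]: 1.0 for a unique response,
        # shrinking as more samples fall into the same cluster.
        diversity = 1.0 - (cluster_sizes[cid] - 1) / n
        rewards.append(q * diversity)
    return rewards

# Toy usage: three paraphrases and one distinct answer.
if __name__ == "__main__":
    responses = ["Paris.", "It's Paris.", "The answer is Paris.", "Lyon."]
    quality = [1.0, 1.0, 1.0, 0.0]
    # Hypothetical partition: the three Paris answers share a cluster.
    fn = lambda r: 0 if "Paris" in r else 1
    print(darling_style_rewards(responses, quality, fn))
```

In an online RL loop, rewards shaped this way would stand in for the plain quality scores when computing advantages, which is what pushes the policy toward both higher pass@1 and more diverse samples for pass@k.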
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput
2
71
365
Fluent, fast, and fair—in collaboration with @MSFTResearch, Johns Hopkins computer scientists (including @fe1ixxu & @kentonmurray) have built a new machine translation model that achieves top-tier performance across 50 diverse languages. Learn more: https://t.co/4vBEEDWIgS
0
3
8
Another big step forward for our SLM Phi family, with new reasoning models that once again redefine what is possible with small and efficient AI.
One year ago, Microsoft introduced small language models with the release of Phi-3 on Azure AI Foundry. Now, we're introducing Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—pushing what's possible with AI. Read the blog:
63
103
834
Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24. arXiv: https://t.co/741JoHgK4m HF: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc
2
4
28
🚀 Phi-4-Mini-Reasoning is finally out! Two months ago, we introduced a reasoning-enhanced Phi-4-Mini. Since then, we've taken it further—a compact model with robust reasoning abilities that even surpasses models up to 2× its size. Paper: https://t.co/GcSxxwVZX4
3
6
32
I'll be presenting our work, VocADT, tomorrow at #ICLR2025✨ Check out our poster session: https://t.co/bVOYMDQBnz 🗓️Thu 24 Apr, 3:00–5:30 p.m. 📍Hall 3 + Hall 2B #250 So excited to be attending @iclr_conf in Singapore🇸🇬
🧐Which languages benefit the most from vocabulary adaptation? We introduce VocADT, a new vocabulary adaptation method using a vocabulary adapter, and explore the impact of various adaptation strategies on languages with diverse scripts and fragmentation to answer this question.
0
12
31
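For intuition, here is a minimal sketch of a vocabulary adapter along the lines VocADT describes: new-vocabulary embeddings expressed as a learned mixture over the frozen original embedding table. The class name, the softmax mixing, and the toy dimensions are assumptions for illustration; see the paper for the actual adapter design and training recipe.

```python
# A minimal sketch of a vocabulary adapter in the spirit of VocADT.
# Assumption (not from the paper): each new-token embedding is a learned
# linear mixture of the frozen original embeddings; sparsity and
# initialization details are omitted.

import torch
import torch.nn as nn

class VocabularyAdapter(nn.Module):
    def __init__(self, old_embeddings: torch.Tensor, new_vocab_size: int):
        super().__init__()
        old_vocab_size = old_embeddings.shape[0]
        # Frozen original embedding table (V_old x d).
        self.register_buffer("old_embeddings", old_embeddings)
        # Learned mixing weights (V_new x V_old): each new token embedding
        # is a weighted combination of old token embeddings.
        self.mixing = nn.Parameter(torch.zeros(new_vocab_size, old_vocab_size))

    def forward(self) -> torch.Tensor:
        weights = torch.softmax(self.mixing, dim=-1)
        return weights @ self.old_embeddings  # (V_new x d)

# Toy usage: adapt a 100-token vocabulary to 120 tokens.
old = torch.randn(100, 32)
adapter = VocabularyAdapter(old, new_vocab_size=120)
print(adapter().shape)  # torch.Size([120, 32])
```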
We also arXived the #Phi-4-Mini technical report, covering our innovations in building the strong lightweight multimodal model Phi-4-multimodal and the language model Phi-4-mini. We use a mixture-of-LoRAs technique to combine the text, image, and speech modalities without interference.
1
8
55
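As a rough illustration of the mixture-of-LoRAs idea (a frozen base projection plus one low-rank adapter per modality, so adding vision and speech does not disturb the text weights), here is a hedged sketch; the ranks, shapes, and route-by-modality-tag scheme are assumptions rather than the Phi-4-multimodal implementation.

```python
# A minimal sketch of mixture-of-LoRAs: one frozen base linear layer with a
# separate low-rank adapter per modality, applied only to inputs of that
# modality. Dimensions and routing are illustrative assumptions.

import torch
import torch.nn as nn

class MixtureOfLoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8,
                 modalities=("text", "vision", "speech")):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # frozen backbone weight
        self.base.bias.requires_grad_(False)
        # One (A, B) low-rank pair per modality.
        self.lora_A = nn.ParameterDict(
            {m: nn.Parameter(torch.randn(rank, d_in) * 0.01) for m in modalities})
        self.lora_B = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(d_out, rank)) for m in modalities})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        out = self.base(x)
        # Only the adapter matching the input modality is applied, so
        # training one modality's adapter leaves the others untouched.
        out = out + x @ self.lora_A[modality].T @ self.lora_B[modality].T
        return out

layer = MixtureOfLoRALinear(d_in=64, d_out=64)
speech_tokens = torch.randn(2, 10, 64)
print(layer(speech_tokens, modality="speech").shape)  # torch.Size([2, 10, 64])
```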
Excited to share that Phi-4-mini has been released! This was my first time rolling up my sleeves and experiencing the entire text training process. We also have a reasoning-enhanced Phi-4—outperforming many 7B reasoning models—which we plan to release very soon. Stay tuned!
We released Phi-4-mini (3.8B base in LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRA, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share with us your feedback. We also
0
0
16
Excited to share that X-ALMA got accepted at #ICLR2025! See you in Singapore!
Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
0
2
7
Glad to see CPO is in a lecture now!
We wrapped up the CS 8803 "Large Language Model" class at @GeorgiaTech for Fall 2024. Here is the reading list: • learning from human preferences (PPO, DPO, SimPO, CPO, RRHF, ORPO, KTO) • real-world LLMs (Llama-3, Aya, Arena's) • efficient LLMs (MoMa, LoRA, QLoRA, LESS)
0
1
8
Work done with my amazing co-workers: @kentonmurray, Philipp Koehn, @akikoe_, @MosesSMT (Hieu Hoang), and @HudaKhay !
0
0
1
After 3 years, I am officially Dr. Xu now!! Big thanks to my advisors: @kentonmurray and Philipp Koehn. I couldn't have achieved this without you!
14
8
147
"What if Phi meets MoE?" I am super excited to share our new Phi-3.5-MoE. Phi-3.5-MoE is a 16 x 3.8B MoE model that only activates 6.6B params with 2 experts. MMLU score of 78.9! It outperforms Llama-3.1 8B, Gemma-2-9B, and Gemini-1.5-Flash. And, close to GPT-4o-mini. MIT lic
3
8
66
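The "16 × 3.8B, but only 6.6B active" arithmetic follows from top-2 routing: each token is sent to just 2 of the 16 experts, so only those two experts' feed-forward weights (plus the shared attention and embedding parameters) participate in the forward pass. A minimal sketch of top-2 expert routing, with toy dimensions that are not Phi-3.5-MoE's actual configuration:

```python
# A minimal sketch of a top-2 mixture-of-experts layer: a router scores all
# experts per token, and only the two highest-scoring experts run for that
# token. Toy dimensions; illustrative assumptions only.

import torch
import torch.nn as nn

class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int = 64, d_ff: int = 128, n_experts: int = 16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)                # (tokens, n_experts)
        weights, idx = torch.topk(probs, k=2, dim=-1)         # keep the top-2 experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the 2
        out = torch.zeros_like(x)
        for slot in range(2):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    # Only the selected experts ever run on a given token.
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(5, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([5, 64])
```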
Here’s some better news: Combining CPO and SimPO can likely improve the model! Check out more details in our GitHub code: https://t.co/9l63ukeSTe
We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response
1
14
38
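For readers curious what "combining CPO and SimPO" could look like, here is a hedged sketch: a SimPO-style length-normalized preference margin plus CPO's NLL term on the chosen response. The weighting, hyperparameters, and function signature are illustrative assumptions; the linked GitHub repository has the actual implementation.

```python
# A minimal sketch of a combined CPO + SimPO objective.
# Assumptions: the SimPO part is a reference-free, length-normalized
# log-prob margin with target margin gamma; the CPO part is an NLL
# (behavior-cloning) regularizer on the chosen response.

import torch
import torch.nn.functional as F

def cpo_simpo_loss(
    chosen_logps: torch.Tensor,    # summed log-probs of chosen responses
    rejected_logps: torch.Tensor,  # summed log-probs of rejected responses
    chosen_lens: torch.Tensor,
    rejected_lens: torch.Tensor,
    beta: float = 2.0,
    gamma: float = 0.5,
    nll_weight: float = 1.0,
) -> torch.Tensor:
    # SimPO part: length-normalized margin, no reference model needed.
    chosen_avg = chosen_logps / chosen_lens
    rejected_avg = rejected_logps / rejected_lens
    simpo = -F.logsigmoid(beta * (chosen_avg - rejected_avg) - gamma)

    # CPO part: negative log-likelihood on the chosen response, keeping the
    # policy anchored to the preferred outputs.
    nll = -chosen_avg

    return (simpo + nll_weight * nll).mean()

# Toy usage with made-up statistics for two preference pairs.
loss = cpo_simpo_loss(
    chosen_logps=torch.tensor([-40.0, -35.0]),
    rejected_logps=torch.tensor([-80.0, -60.0]),
    chosen_lens=torch.tensor([20.0, 18.0]),
    rejected_lens=torch.tensor([25.0, 22.0]),
)
print(loss.item())
```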
We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response
0
4
20
Super excited that our work got picked for an #Oral presentation at #ICML this year! Had an awesome time collaborating with @aamixsh and @DanielKhashabi at @jhuclsp. Pity I can't make it to Vienna because of visa issues😅
Is In-Context Learning (ICL) equivalent to Gradient Descent (GD)? There is a common belief that applying ICL in #LLM functions like GD-based fine-tuning. But does this hold in real-world LLMs? 🤔 Find out in our latest paper: https://t.co/SPgtCgDucT
0
4
30