Haoran Xu

@fe1ixxu

Followers
539
Following
140
Media
9
Statuses
86

Senior Researcher @Microsoft | Ph.D. @JHU '24 | ex-intern @Microsoft | @Meta AI | @Amazon Alexa AI

Seattle, WA
Joined August 2017
@fe1ixxu
Haoran Xu
1 year
Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
4
13
49
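X-ALMA's core idea is that one shared base model cannot serve 50 languages equally well, so each language is handled by a language-group-specific module plugged into the shared base. A minimal sketch of that routing step, with made-up group assignments (the actual grouping is defined in the paper, not here):

```python
# Illustrative X-ALMA-style routing: each language code maps to the
# language-group adapter responsible for it. Group contents below are
# invented for illustration and do not match the paper's grouping.
LANG_GROUPS = {
    "group1": {"da", "nl", "de", "is", "no", "sv", "af"},  # e.g. Germanic
    "group2": {"ca", "ro", "gl", "it", "pt", "es"},        # e.g. Romance
}

def select_adapter(lang_code: str) -> str:
    """Return the adapter (group) that should translate this language."""
    for group, langs in LANG_GROUPS.items():
        if lang_code in langs:
            return group
    raise ValueError(f"unsupported language: {lang_code}")
```

Because only one group module is active per request, adding or improving a low-resource group cannot degrade the others.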
@tli104
Tianjian Li
3 months
Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!
@jaseweston
Jason Weston
3 months
🌀Diversity Aware RL (DARLING)🌀 📝: https://t.co/MH0tui34Cb
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
2
24
90
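The shape of the DARLING objective above can be sketched in a few lines: scale each sample's quality reward by how different it is from the policy's other samples for the same prompt. Note this sketch uses a crude lexical-overlap proxy for diversity; the actual method learns the diversity signal via a partition function, and the 0.5 floor is an arbitrary choice here:

```python
def lexical_diversity(response: str, others: list[str]) -> float:
    """Crude diversity proxy: fraction of this response's tokens that do
    not appear in any sibling response (the paper learns this instead)."""
    toks = set(response.split())
    if not toks:
        return 0.0
    seen = set().union(*(set(o.split()) for o in others)) if others else set()
    return len(toks - seen) / len(toks)

def darling_style_reward(quality: float, response: str, others: list[str]) -> float:
    # Multiply quality by a diversity weight so the policy is rewarded for
    # answers that are both good AND different from its other samples.
    return quality * (0.5 + 0.5 * lexical_diversity(response, others))
```

A fully novel response keeps its full quality reward, while a near-duplicate of a sibling sample has its reward halved, pushing online RL away from mode collapse.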
@liliang_ren
Liliang Ren
5 months
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput.
2
71
365
@JHUCompSci
JHU Computer Science
7 months
Fluent, fast, and fair—in collaboration with @MSFTResearch, Johns Hopkins computer scientists (including @fe1ixxu & @kentonmurray) have built a new machine translation model that achieves top-tier performance across 50 diverse languages. Learn more: https://t.co/4vBEEDWIgS
0
3
8
@satyanadella
Satya Nadella
7 months
Another big step forward for our SLM Phi family, with new reasoning models that once again redefine what is possible with small and efficient AI.
@Azure
Microsoft Azure
7 months
One year ago, Microsoft introduced small language models with the release of Phi-3 on Azure AI Foundry. Now, we're introducing Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—pushing what's possible with AI. Read the blog:
63
103
834
@WeizhuChen
Weizhu Chen
7 months
Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 in Math-500 and 57.5 in AIME-24. arxiv: https://t.co/741JoHgK4m hf: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc
2
4
28
@fe1ixxu
Haoran Xu
7 months
Model:
huggingface.co
0
0
1
@fe1ixxu
Haoran Xu
7 months
🚀 Phi-4-Mini-Reasoning is finally out! Two months ago, we introduced a reasoning-enhanced Phi-4-Mini. Since then, we've taken it further—a compact model with robust reasoning abilities that even surpasses models up to 2× its size. Paper: https://t.co/GcSxxwVZX4
3
6
32
@h__j___han
HyoJung Han
7 months
I'll be presenting our work, VocADT, tomorrow at #ICLR2025✨
Check out our poster session: https://t.co/bVOYMDQBnz
🗓️ Thu 24 Apr, 3 p.m.–5:30 p.m.
📍 Hall 3 + Hall 2B, #250
So excited to be attending @iclr_conf in Singapore🇸🇬
@h__j___han
HyoJung Han
1 year
🧐Which languages benefit the most from vocabulary adaptation? We introduce VocADT, a new vocabulary adaptation method using a vocabulary adapter, and explore the impact of various adaptation strategies on languages with diverse scripts and fragmentation to answer this question.
0
12
31
@yjkim362
Young
9 months
We also arXived the #Phi-4-Mini technical report to cover our innovations in building a strong lightweight multimodal model, Phi-4-multimodal, and language model, Phi-4-mini. We use a mixture-of-LoRAs technique to combine the text, image, and speech modalities without interference.
@_akhaliq
AK
9 months
Phi-4-Mini Technical Report Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
1
8
55
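The mixture-of-LoRAs idea mentioned above can be sketched with a toy linear layer: the base weight is shared and frozen, and each modality gets its own low-rank (A, B) adapter, so training one modality's adapter cannot interfere with the others. Dimensions and initialization here are illustrative, not Phi-4-multimodal's:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and LoRA rank

W = rng.normal(size=(d, d))  # frozen, shared base weight

# One low-rank adapter (A, B) per modality. Following standard LoRA,
# B starts at zero so each adapter initially leaves the base untouched.
adapters = {m: (rng.normal(size=(r, d)), np.zeros((d, r)))
            for m in ("text", "vision", "speech")}

def forward(x: np.ndarray, modality: str) -> np.ndarray:
    """Apply the shared weight plus only the active modality's delta."""
    A, B = adapters[modality]
    return x @ (W + B @ A).T
```

Only the adapter selected by the input's modality contributes a delta `B @ A` to the shared weight, which is what lets three modalities coexist in one model without stepping on each other.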
@fe1ixxu
Haoran Xu
9 months
Excited to share that Phi-4-mini has been released! This was my first time rolling up my sleeves and experiencing the entire text training process. We also have a reasoning-enhanced Phi-4—outperforming many 7B reasoning models—which we plan to release very soon. Stay tuned!
@WeizhuChen
Weizhu Chen
9 months
We released Phi-4-mini (a 3.8B base LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRAs, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share your feedback with us. We also
0
0
16
@fe1ixxu
Haoran Xu
10 months
Excited to share that X-ALMA got accepted at #ICLR2025! See you in Singapore!
@fe1ixxu
Haoran Xu
1 year
Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
0
2
7
@fe1ixxu
Haoran Xu
1 year
Glad to see CPO is in a lecture now!
@cocoweixu
Wei Xu
1 year
We wrapped up CS 8803 "Large Language Model" class at @GeorgiaTech for Fall 2024. Here is the reading list:
• learning from human preferences (PPO, DPO, SimPO, CPO, RRHF, ORPO, KTO)
• real-world LLM (Llama-3, Aya, Arena's)
• efficient LLM (MoMa, LoRA, QLoRA, LESS)
0
1
8
@fe1ixxu
Haoran Xu
1 year
Work done with my amazing co-workers: @kentonmurray, Philipp Koehn, @akikoe_, @MosesSMT (Hieu Hoang), and @HudaKhay !
0
0
1
@fe1ixxu
Haoran Xu
1 year
GitHub: https://t.co/V1PlDYqyBB Hugging Face Models and Datasets 🤗:
huggingface.co
2
0
1
@fe1ixxu
Haoran Xu
1 year
After 3 years, I am officially Dr. Xu now!! Big thanks to my advisors: @kentonmurray and Philipp Koehn. I couldn't have achieved this without you!
14
8
147
@yjkim362
Young
1 year
"What if Phi meets MoE?" I am super excited to share our new Phi-3.5-MoE. Phi-3.5-MoE is a 16 x 3.8B MoE model that only activates 6.6B params with 2 experts. MMLU score of 78.9! It outperforms Llama-3.1 8B, Gemma-2-9B, and Gemini-1.5-Flash. And, close to GPT-4o-mini. MIT license.
3
8
66
@fe1ixxu
Haoran Xu
1 year
Here’s some better news: Combining CPO and SimPO can likely improve the model! Check out more details in our GitHub code: https://t.co/9l63ukeSTe
@fe1ixxu
Haoran Xu
1 year
We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response
1
14
38
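A toy sketch of what "combining CPO and SimPO" could look like: SimPO's length-normalized preference margin plus CPO's behavior-cloning (NLL) term on the preferred response. The exact combination, hyperparameters, and normalization here are illustrative assumptions, not the repository's implementation:

```python
import math

def cpo_simpo_loss(logp_w: float, logp_l: float, len_w: int, len_l: int,
                   beta: float = 2.0, gamma: float = 0.5, lam: float = 1.0) -> float:
    """Toy CPO+SimPO objective for one preference pair.

    logp_w / logp_l: summed token log-probs of the chosen ("winner") and
    rejected ("loser") responses under the policy; len_* are their lengths.
    """
    # SimPO-style: length-normalized log-prob margin with target margin gamma.
    margin = beta * (logp_w / len_w - logp_l / len_l) - gamma
    pref = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    # CPO-style: NLL regularizer pulling the policy toward the winner.
    nll = -logp_w / len_w
    return pref + lam * nll
```

The loss falls as the chosen response becomes more likely relative to the rejected one, while the NLL term keeps the policy anchored to the preferred outputs.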
@fe1ixxu
Haoran Xu
1 year
We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response
0
4
20
@Lingfeng_nlp
Lingfeng Shen
1 year
Super excited that our work got picked for an #Oral presentation at #ICML this year! Had an awesome time collaborating with @aamixsh and @DanielKhashabi at @jhuclsp. Pity I can't make it to Vienna because of visa issues😅
@Lingfeng_nlp
Lingfeng Shen
2 years
Is In-Context Learning (ICL) equivalent to Gradient Descent (GD)? There is a common belief that applying ICL in #LLM functions like GD-based fine-tuning. But does this hold in real-world LLMs? 🤔 Find out in our latest paper: https://t.co/SPgtCgDucT
0
4
30