Haoran Xu
@fe1ixxu
Followers
539
Following
140
Media
9
Statuses
86
Senior Researcher @Microsoft | Ph.D. @JHU '24 | ex-intern @Microsoft | @Meta AI | @Amazon Alexa AI
Seattle, WA
Joined August 2017
Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
4
13
49
Language models often produce repetitive responses, and this issue is further amplified by post-training. In this work, we introduce DARLING, a method that explicitly optimizes for both response diversity and quality within online reinforcement learning!
🌀Diversity Aware RL (DARLING)🌀 📝: https://t.co/MH0tui34Cb - Jointly optimizes for quality & diversity using a learned partition function - Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k - Works for both non-verifiable & verifiable tasks 🧵1/5
2
24
90
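To make the quality-times-diversity idea concrete, here is a minimal Python sketch of a DARLING-style reward. The function name, the clustering interface, and the multiplicative combination of quality with a cluster-size-based diversity bonus are illustrative assumptions, not the paper's exact formulation (which uses a learned partition function inside online RL).

```python
# A minimal sketch of a diversity-aware reward in the spirit of DARLING.
# Assumptions (not taken from the paper): `partition_fn` is any learned
# function mapping a response to a semantic-equivalence cluster id, and
# quality and diversity are combined multiplicatively.

from collections import Counter
from typing import Callable, List

def darling_style_rewards(
    responses: List[str],
    quality_scores: List[float],
    partition_fn: Callable[[str], int],
) -> List[float]:
    """Combine per-response quality with a diversity bonus.

    Responses that land in a crowded semantic cluster (near-duplicates of
    other samples for the same prompt) get a smaller diversity bonus than
    responses that occupy a cluster by themselves.
    """
    cluster_ids = [partition_fn(r) for r in responses]
    cluster_sizes = Counter(cluster_ids)
    n = len(responses)

    rewards = []
    for q, cid in zip(quality_scores, cluster_ids):
        # Diversity bonus in (0, 1]: 1.0 for a unique response,
        # shrinking as more samples fall into the same cluster.
        diversity = 1.0 - (cluster_sizes[cid] - 1) / n
        rewards.append(q * diversity)
    return rewards

# Toy usage: three paraphrases and one distinct answer.
if __name__ == "__main__":
    responses = ["Paris.", "It's Paris.", "The answer is Paris.", "Lyon."]
    quality = [1.0, 1.0, 1.0, 0.0]
    # Hypothetical partition: the three Paris answers share a cluster.
    fn = lambda r: 0 if "Paris" in r else 1
    print(darling_style_rewards(responses, quality, fn))
```

In an online RL loop, rewards shaped this way would stand in for the plain quality scores when computing advantages, which is what pushes the policy toward both higher pass@1 and more diverse samples for pass@k.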
Reasoning can be made much, much faster—with fundamental changes in neural architecture. 😮 Introducing Phi4-mini-Flash-Reasoning: a 3.8B model that surpasses Phi4-mini-Reasoning on major reasoning tasks (AIME24/25, MATH500, GPQA-D), while delivering up to 10× higher throughput
2
71
365
Fluent, fast, and fair—in collaboration with @MSFTResearch, Johns Hopkins computer scientists (including @fe1ixxu & @kentonmurray) have built a new machine translation model that achieves top-tier performance across 50 diverse languages. Learn more: https://t.co/4vBEEDWIgS
0
3
8
Another big step forward for our SLM Phi family, with new reasoning models that once again redefine what is possible with small and efficient AI.
One year ago, Microsoft introduced small language models with the release of Phi-3 on Azure AI Foundry. Now, we're introducing Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning—pushing what's possible with AI. Read the blog:
63
103
834
Glad to see the team used a 3.8B model (Phi-4-mini-reasoning) to achieve 94.6 on MATH-500 and 57.5 on AIME-24. arXiv: https://t.co/741JoHgK4m HF: https://t.co/PVbW4jyJTu Azure: https://t.co/V2QusWIAgc
2
4
28
🚀 Phi-4-Mini-Reasoning is finally out! Two months ago, we introduced a reasoning-enhanced Phi-4-Mini. Since then, we've taken it further—a compact model with robust reasoning abilities that even surpasses models up to 2× its size. Paper: https://t.co/GcSxxwVZX4
3
6
32
I'll be presenting our work, VocADT, tomorrow at #ICLR2025✨ Check out our poster session: https://t.co/bVOYMDQBnz 🗓️Thu 24 Apr, 3:00–5:30 p.m. 📍Hall 3 + Hall 2B #250 So excited to be attending @iclr_conf in Singapore🇸🇬
🧐Which languages benefit the most from vocabulary adaptation? We introduce VocADT, a new vocabulary adaptation method using a vocabulary adapter, and explore the impact of various adaptation strategies on languages with diverse scripts and fragmentation to answer this question.
0
12
31
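For intuition, here is a minimal sketch of a vocabulary adapter along the lines VocADT describes: new-vocabulary embeddings expressed as a learned mixture over the frozen original embedding table. The class name, the softmax mixing, and the toy dimensions are assumptions for illustration; see the paper for the actual adapter design and training recipe.

```python
# A minimal sketch of a vocabulary adapter in the spirit of VocADT.
# Assumption (not from the paper): each new-token embedding is a learned
# linear mixture of the frozen original embeddings; sparsity and
# initialization details are omitted.

import torch
import torch.nn as nn

class VocabularyAdapter(nn.Module):
    def __init__(self, old_embeddings: torch.Tensor, new_vocab_size: int):
        super().__init__()
        old_vocab_size = old_embeddings.shape[0]
        # Frozen original embedding table (V_old x d).
        self.register_buffer("old_embeddings", old_embeddings)
        # Learned mixing weights (V_new x V_old): each new token embedding
        # is a weighted combination of old token embeddings.
        self.mixing = nn.Parameter(torch.zeros(new_vocab_size, old_vocab_size))

    def forward(self) -> torch.Tensor:
        weights = torch.softmax(self.mixing, dim=-1)
        return weights @ self.old_embeddings  # (V_new x d)

# Toy usage: adapt a 100-token vocabulary to 120 tokens.
old = torch.randn(100, 32)
adapter = VocabularyAdapter(old, new_vocab_size=120)
print(adapter().shape)  # torch.Size([120, 32])
```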
We also arXived the #Phi-4-Mini technical report, covering our innovations in building the strong lightweight multimodal model Phi-4-multimodal and the language model Phi-4-mini. We use a mixture-of-LoRAs technique to combine the text, image, and speech modalities without interference.
1
8
55
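As a rough illustration of the mixture-of-LoRAs idea (a frozen base projection plus one low-rank adapter per modality, so adding vision and speech does not disturb the text weights), here is a hedged sketch; the ranks, shapes, and route-by-modality-tag scheme are assumptions rather than the Phi-4-multimodal implementation.

```python
# A minimal sketch of mixture-of-LoRAs: one frozen base linear layer with a
# separate low-rank adapter per modality, applied only to inputs of that
# modality. Dimensions and routing are illustrative assumptions.

import torch
import torch.nn as nn

class MixtureOfLoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, rank: int = 8,
                 modalities=("text", "vision", "speech")):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)   # frozen backbone weight
        self.base.bias.requires_grad_(False)
        # One (A, B) low-rank pair per modality.
        self.lora_A = nn.ParameterDict(
            {m: nn.Parameter(torch.randn(rank, d_in) * 0.01) for m in modalities})
        self.lora_B = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(d_out, rank)) for m in modalities})

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        out = self.base(x)
        # Only the adapter matching the input modality is applied, so
        # training one modality's adapter leaves the others untouched.
        out = out + x @ self.lora_A[modality].T @ self.lora_B[modality].T
        return out

layer = MixtureOfLoRALinear(d_in=64, d_out=64)
speech_tokens = torch.randn(2, 10, 64)
print(layer(speech_tokens, modality="speech").shape)  # torch.Size([2, 10, 64])
```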
Excited to share that Phi-4-mini has been released! This was my first time rolling up my sleeves and experiencing the entire text training process. We also have a reasoning-enhanced Phi-4—outperforming many 7B reasoning models—which we plan to release very soon. Stay tuned!
We released Phi-4-mini (3.8B base in LLM), a new SLM excelling in language, vision, and audio through a mixture-of-LoRA, uniting three modalities in one model. I am so impressed with its new audio capability. I hope you can play with it and share with us your feedback. We also
0
0
16
Excited to share that X-ALMA got accepted at #ICLR2025! See you in Singapore!
Multilingual models are usually heavily skewed in favor of high-resource languages. We change this with X-ALMA: an LLM-based translator committed to ensuring top-tier performance across 50 diverse languages, regardless of their resource levels! Paper: https://t.co/O4M5LDGdAB
0
2
7
Glad to see CPO is in a lecture now!
We wrapped up the CS 8803 "Large Language Model" class at @GeorgiaTech for Fall 2024. Here is the reading list: • learning from human preferences (PPO, DPO, SimPO, CPO, RRHF, ORPO, KTO) • real-world LLMs (Llama-3, Aya, Arena's) • efficient LLMs (MoMa, LoRA, QLoRA, LESS)
0
1
8
Work done with my amazing co-workers: @kentonmurray, Philipp Koehn, @akikoe_, @MosesSMT (Hieu Hoang), and @HudaKhay !
0
0
1
After 3 years, I am officially Dr. Xu now!! Big thanks to my advisors: @kentonmurray and Philipp Koehn. I couldn't have achieved this without you!
14
8
147
"What if Phi meets MoE?" I am super excited to share our new Phi-3.5-MoE. Phi-3.5-MoE is a 16 x 3.8B MoE model that only activates 6.6B params with 2 experts. MMLU score of 78.9! It outperforms Llama-3.1 8B, Gemma-2-9B, and Gemini-1.5-Flash. And, close to GPT-4o-mini. MIT lic
3
8
66
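The "16 × 3.8B, but only 6.6B active" arithmetic follows from top-2 routing: each token is sent to just 2 of the 16 experts, so only those two experts' feed-forward weights (plus the shared attention and embedding parameters) participate in the forward pass. A minimal sketch of top-2 expert routing, with toy dimensions that are not Phi-3.5-MoE's actual configuration:

```python
# A minimal sketch of a top-2 mixture-of-experts layer: a router scores all
# experts per token, and only the two highest-scoring experts run for that
# token. Toy dimensions; illustrative assumptions only.

import torch
import torch.nn as nn

class Top2MoELayer(nn.Module):
    def __init__(self, d_model: int = 64, d_ff: int = 128, n_experts: int = 16):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        probs = self.router(x).softmax(dim=-1)                # (tokens, n_experts)
        weights, idx = torch.topk(probs, k=2, dim=-1)         # keep the top-2 experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the 2
        out = torch.zeros_like(x)
        for slot in range(2):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    # Only the selected experts ever run on a given token.
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(5, 64)
print(Top2MoELayer()(tokens).shape)  # torch.Size([5, 64])
```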
Here’s some better news: Combining CPO and SimPO can likely improve the model! Check out more details in our GitHub code: https://t.co/9l63ukeSTe
We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response
1
14
38
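For readers curious what "combining CPO and SimPO" could look like, here is a hedged sketch: a SimPO-style length-normalized preference margin plus CPO's NLL term on the chosen response. The weighting, hyperparameters, and function signature are illustrative assumptions; the linked GitHub repository has the actual implementation.

```python
# A minimal sketch of a combined CPO + SimPO objective.
# Assumptions: the SimPO part is a reference-free, length-normalized
# log-prob margin with target margin gamma; the CPO part is an NLL
# (behavior-cloning) regularizer on the chosen response.

import torch
import torch.nn.functional as F

def cpo_simpo_loss(
    chosen_logps: torch.Tensor,    # summed log-probs of chosen responses
    rejected_logps: torch.Tensor,  # summed log-probs of rejected responses
    chosen_lens: torch.Tensor,
    rejected_lens: torch.Tensor,
    beta: float = 2.0,
    gamma: float = 0.5,
    nll_weight: float = 1.0,
) -> torch.Tensor:
    # SimPO part: length-normalized margin, no reference model needed.
    chosen_avg = chosen_logps / chosen_lens
    rejected_avg = rejected_logps / rejected_lens
    simpo = -F.logsigmoid(beta * (chosen_avg - rejected_avg) - gamma)

    # CPO part: negative log-likelihood on the chosen response, keeping the
    # policy anchored to the preferred outputs.
    nll = -chosen_avg

    return (simpo + nll_weight * nll).mean()

# Toy usage with made-up statistics for two preference pairs.
loss = cpo_simpo_loss(
    chosen_logps=torch.tensor([-40.0, -35.0]),
    rejected_logps=torch.tensor([-80.0, -60.0]),
    chosen_lens=torch.tensor([20.0, 18.0]),
    rejected_lens=torch.tensor([25.0, 22.0]),
)
print(loss.item())
```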
We recently had multiple rounds of discussions with the SimPO authors regarding the lack of comparison to CPO in their main paper. We both agree that it was an unintentional oversight, and they will update the paper to address it. We appreciate their positive and prompt response
0
4
20
Super excited that our work got picked for an #Oral presentation at #ICML this year! Had an awesome time collaborating with @aamixsh and @DanielKhashabi at @jhuclsp. Pity I can't make it to Vienna because of visa issues😅
Is In-Context Learning (ICL) equivalent to Gradient Descent (GD)? There is a common belief that applying ICL in #LLM functions like GD-based fine-tuning. But does this hold in real-world LLMs? 🤔 Find out in our latest paper: https://t.co/SPgtCgDucT
0
4
30