Zechun Liu
@zechunliu
503 Followers · 42 Following · 10 Media · 42 Statuses
Research Scientist @Meta, SpinQuant, MobileLLM
Joined June 2023
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training…
Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B and ~2× vs. SmolLM2-1.7B, using just 1/10 the…
6 · 16 · 117
🔥 MobileLLM-R1 paper and code are now available! 🚀 Can small models reason? We believe the answer is yes! 🌟 With fewer than 1B parameters and trained on just 4.2T tokens (only 1/10 of Qwen’s), MobileLLM-R1 challenges the belief that reasoning only appears in huge models trained…
3 · 4 · 23
🎉 Two NeurIPS papers accepted! 🌟 ParetoQ: a unified quantization framework across 1–4 bits. Our optimized training & quantization surpass prior SoTA by a large margin: even a 600M ternary model beats the BitNet 3B ternary model with 1/5 the parameters. 🌟 RDD (robotics) finds…
0 · 2 · 16
Glad to see MobileLLM-R1 has attracted broad attention across the community! In fact, MobileLLM-R1 uses only ~2T tokens of high-quality OSS data and is trained for a total of 4T tokens. Let’s see if MobileLLM can help push things toward the “less is more” direction! 🚀
2025: The year that 4 trillion tokens became a small amount of training data! 🤯 (But great work on producing strong reasoning performance with trained-from-scratch tiny models!)
0 · 0 · 5
@_akhaliq also built a great app for MobileLLM-R1: https://t.co/WOvyhLzMob. Anycoder is incredibly fast at building apps!
0 · 2 · 8
Quantization of large language models aims to cut compute and memory needs while keeping performance. 𝐏𝐚𝐫𝐞𝐭𝐨𝐐 delivers SOTA results across bit-widths, showing 1.58-, 2-, and 3-bit quantization offer better size-accuracy trade-offs than 4-bit. 💡 Read more:
0 · 13 · 128
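A minimal sketch of the quantities being traded off in the tweet above: packed model size shrinks linearly with bit-width while round-off error grows. This is illustrative symmetric per-channel fake-quantization, not the ParetoQ method (which trains the quantized network rather than rounding post hoc); all names are made up for the example.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-output-channel fake-quantization to signed `bits`-bit levels."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)   # stand-in weight matrix

for bits in (2, 3, 4):
    mse = float(np.mean((w - fake_quantize(w, bits)) ** 2))
    size_mb = w.size * bits / 8 / 1e6                # packed weights only
    print(f"{bits}-bit: {size_mb:.3f} MB, MSE {mse:.5f}")
```

Whether the extra error at 2–3 bits is worth the size savings is exactly what ParetoQ measures on real models, with quantization-aware training closing much of the accuracy gap.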
🚀 We’re releasing ParetoQ, a family of quantized MobileLLMs: ultra-efficient, performance-retaining models for edge devices. 🧠 Smallest model: 1-bit, 125M → only 16MB on disk 📈 1.58-bit 600M even beats the 1.58-bit 3B from BitNet (1-bit era paper) 🔥 👉 Models:
0 · 1 · 14
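A back-of-the-envelope check of the “1-bit, 125M → only 16MB” figure above, counting packed weights only (a real checkpoint adds quantization scales, any higher-precision embeddings, and file overhead):

```python
# Packed weight size for a 125M-parameter model at several bit-widths.
params = 125e6
for bits in (1, 1.58, 2, 4, 16):
    print(f"{bits:>5} bits: {params * bits / 8 / 1e6:6.1f} MB")
# 1 bit -> ~15.6 MB, matching the ~16MB on-disk number in the tweet.
```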
🚀 We're thrilled to announce that the SoTA low-bit quantization ParetoQ code is now open-source! 🌟 https://t.co/jyXuAFGAkA 🔍 What does this repo support? 🌟 State-of-the-art sub-4-bit quantization: a significant upgrade from our previous LLM-QAT repo, outperforming all…
0 · 6 · 17
⏰📢 After years of working on long-context efficiency, I’ve started to doubt whether it’s truly necessary (many of you have probably noticed the declining interest in long-context LLMs). Despite strong models like Gemini, short-context + retrieval often does the trick: faster, cheaper, and…
🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯 Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡 So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple –…
20 · 91 · 456
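A minimal sketch of the “short-context + retrieval” recipe being argued for above: pick the best-matching chunk for a query and put only that into the prompt, instead of feeding an entire corpus to a long-context model. The bag-of-words embedding and the chunks below are toy stand-ins; a real system would use a learned embedder and top-k retrieval.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / ((na * nb) or 1.0)

chunks = [  # hypothetical document chunks
    "SpinQuant rotates activations to tame outliers before 4-bit quantization.",
    "MobileLLM uses deep-and-thin architectures for sub-billion-parameter models.",
    "LongVU compresses video tokens spatially and temporally.",
]
query = "how does 4-bit quantization handle outliers?"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
print(f"Context: {best}\n\nQuestion: {query}")  # short prompt for a short-context LLM
```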
Our ParetoQ is substantially better than previous work on ternary LLMs, such as the 1-bit era paper.
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training followed by QAT. In addition, we discover that the representation changes substantially after low-bit…
0 · 7 · 24
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training followed by QAT. In addition, we discover that the representation changes substantially after low-bit…
arxiv.org · The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose...
2 · 14 · 75
🚀 Excited to share our Efficient Track Anything. It is small but mighty: >2× faster than SAM2 on an A100, and it runs at >10 FPS on iPhone 15 Pro Max. How’d we do it? EfficientSAM + Efficient Memory Attention! Paper: https://t.co/FN7NMuEO9R Project (demo): https://t.co/KSLPj5rM1v with:
4 · 37 · 113
Thanks @ylecun for promoting our work. 🎉 MobileLLM models at sizes 125M, 350M, and 600M are now available on HuggingFace! 🚀
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices. TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks; Paper: https://t.co/TDWQWdZeIy
0 · 1 · 10
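A minimal sketch (in PyTorch, not the released MobileLLM code) of two of the tricks in the TL;DR above: tying the token→embedding and embedding→token matrices, and running each transformer block twice to get depth without extra weights. The sizes loosely follow MobileLLM-125M (30 effective layers, width 576, 9 heads); the block itself is a generic stand-in, with causal masking omitted for brevity.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab=32000, dim=576, eff_layers=30, repeats=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # eff_layers // repeats physical blocks, each executed `repeats` times.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=9, batch_first=True)
            for _ in range(eff_layers // repeats)
        )
        self.repeats = repeats
        self.head = nn.Linear(dim, vocab, bias=False)
        self.head.weight = self.embed.weight   # shared in/out embedding matrix

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            for _ in range(self.repeats):      # "more depth, not width"
                x = block(x)
        return self.head(x)

logits = TinyLM()(torch.randint(0, 32000, (1, 8)))
print(logits.shape)   # torch.Size([1, 8, 32000])
```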
🚀 We're thrilled to announce the MobileLLM weights are available on HuggingFace: https://t.co/C5zcQPT6VO 📱 MobileLLM is a state-of-the-art language model designed for mobile devices: https://t.co/zQIuwDBEYT 🔥 Explore the pretraining code on GitHub: https://t.co/aIW4rQV2Cw
1 · 8 · 25
🚨 VideoLLM from Meta! 🚨 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding 📝 Paper: https://t.co/mmGiIYicB2 🧑🏻‍💻 Code: https://t.co/mLR11itc27 🚀 Project (Demo): https://t.co/nAOZo7eJi8 We propose LongVU, a video LLM with a spatiotemporal adaptive…
5 · 73 · 253
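A minimal sketch of the temporal side of “spatiotemporal adaptive compression”: drop frames whose features barely differ from the last kept frame. LongVU’s actual pipeline (DINOv2 features, cross-modal queries, spatial token reduction) is more involved; this only illustrates the redundancy-pruning idea, with simulated features.

```python
import numpy as np

def prune_frames(feats: np.ndarray, thresh: float = 0.99) -> list[int]:
    """Keep frame 0, then any frame whose cosine similarity to the most
    recently kept frame drops below `thresh` (content changed enough)."""
    keep = [0]
    for i in range(1, len(feats)):
        a, b = feats[keep[-1]], feats[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim < thresh:
            keep.append(i)
    return keep

# Simulate 64 slowly drifting frame features: a random walk around a base vector.
rng = np.random.default_rng(0)
feats = rng.normal(size=768) + np.cumsum(rng.normal(size=(64, 768)) * 0.05, axis=0)
print(prune_frames(feats))   # only a handful of frame indices survive
```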
🎉 I'm excited to share that SpinQuant powered the live demo at Meta Connect! We just made our 4-bit quantized LLaMA SpinQuant model publicly available. Check it out if you're interested: https://t.co/u7YsFDY1ap
1 · 1 · 17
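A minimal sketch of the idea behind SpinQuant: multiplying activations by an orthogonal matrix R leaves the layer’s output unchanged once R is folded into the weights (xW = (xR)(RᵀW), since RRᵀ = I), but it spreads outlier channels across all dimensions, so a 4-bit quantizer needs a much smaller scale. A random orthogonal matrix stands in here for SpinQuant’s learned rotations, and the outlier is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 64))
x[:, 3] *= 50                    # one outlier channel, as often seen in LLM activations

R, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # random orthogonal matrix

def quant_mse(v: np.ndarray) -> float:
    """4-bit symmetric fake-quantization error with a single per-tensor scale."""
    scale = np.abs(v).max() / 7
    return float(np.mean((v - np.round(v / scale).clip(-7, 7) * scale) ** 2))

print("plain  :", quant_mse(x))       # outlier inflates the scale for every channel
print("rotated:", quant_mse(x @ R))   # error drops sharply after rotation
```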