Zechun Liu Profile
Zechun Liu

@zechunliu

Followers 503 · Following 42 · Media 10 · Statuses 42

Research Scientist @Meta, SpinQuant, MobileLLM

Joined June 2023
@zechunliu
Zechun Liu
2 months
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training,
@_akhaliq
AK
2 months
Meta just dropped MobileLLM-R1 on Hugging Face, an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B, and ~2× vs. SmolLM2-1.7B. Uses just 1/10 the
6
16
117
@zechunliu
Zechun Liu
1 month
🔥 MobileLLM-R1 paper and code are now available! 🚀 Can small models reason? We believe the answer is yes! 🌟 With fewer than 1B parameters and trained on just 4.2T tokens—only 1/10 of Qwen’s—MobileLLM-R1 challenges the belief that reasoning only appears in huge models trained
3
4
23
@zechunliu
Zechun Liu
1 month
🎉 Two NeurIPS papers accepted! 🌟 ParetoQ: a unified quantization framework across 1–4 bits. Our optimized training & quantization surpass prior SoTA by a large margin—even a 600M ternary model beats the BitNet 3B ternary model with 1/5 the parameters. 🌟 RDD (robotics) finds
0
2
16
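For readers wondering what a ternary ("1.58-bit") model looks like in practice, here is a minimal quantization-aware-training sketch in PyTorch. It uses a generic per-tensor mean-of-|w| scale and a straight-through estimator; ParetoQ's actual scaling rule, granularity, and training schedule differ, so treat this purely as an illustration of the mechanism, not the paper's recipe.

```python
import torch

def ternary_quantize(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1} times a per-tensor scale.

    Generic ternary scheme (mean-of-|w| scale), not the exact ParetoQ recipe;
    it only illustrates what "1.58-bit" weights look like during QAT.
    """
    scale = w.abs().mean().clamp(min=1e-8)   # per-tensor scale
    q = (w / scale).round().clamp(-1, 1)     # ternary levels
    # Straight-through estimator: forward uses q*scale, backward sees identity.
    return w + (q * scale - w).detach()


class TernaryLinear(torch.nn.Linear):
    """Linear layer whose weights are ternarized on the fly during QAT."""
    def forward(self, x):
        return torch.nn.functional.linear(x, ternary_quantize(self.weight), self.bias)


if __name__ == "__main__":
    layer = TernaryLinear(64, 64)
    y = layer(torch.randn(2, 64))
    y.sum().backward()                       # gradients flow to full-precision weights via STE
    print(layer.weight.grad.shape)           # torch.Size([64, 64])
```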
@zechunliu
Zechun Liu
2 months
Glad to see MobileLLM-R1 has attracted broad attention across the community! In fact, MobileLLM-R1 uses only ~2T tokens of high-quality OSS data and is trained for a total of 4T tokens. Let’s see if MobileLLM can help push things toward the “less is more” direction! 🚀
@chrmanning
Christopher Manning
2 months
2025: The year that 4 trillion tokens became a small amount of training data! 🤯 (But great work on producing strong reasoning performance with trained-from-scratch tiny models!)
0
0
5
@zechunliu
Zechun Liu
2 months
@_akhaliq also built a great app for MobileLLM-R1: https://t.co/WOvyhLzMob. Anycoder is incredibly fast at building apps!
0
2
8
@fiandola
Forrest Iandola
4 months
Efficient Track Anything is accepted to ICCV 2025! See you in Hawaii!
@fiandola
Forrest Iandola
11 months
[1/n] 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸 𝗔𝗻𝘆𝘁𝗵𝗶𝗻𝗴 from @Meta: interactive video segmentation and tracking on an iPhone!
0
5
16
@PyTorch
PyTorch
5 months
Quantization of large language models aims to cut compute and memory needs while keeping performance. 𝐏𝐚𝐫𝐞𝐭𝐨𝐐 delivers SOTA results across bit-widths, showing 1.58-, 2-, and 3-bit quantization offer better size-accuracy trade-offs than 4-bit. 💡 Read more:
0
13
128
@zechunliu
Zechun Liu
5 months
🚀 We’re releasing ParetoQ, a family of quantized MobileLLMs — ultra-efficient, performance-retaining models for edge devices. 🧠 Smallest model: 1-bit, 125M → only 16MB on disk 📈 1.58-bit 600M even beats 1.58-bit 3B from BitNet (1-bit Era paper) 🔥 👉 Models:
0
1
14
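As a sanity check on the "1-bit, 125M → only 16MB" figure: weight-only storage is roughly parameters × bits / 8 bytes. The snippet below does that arithmetic for several bit-widths, assuming MB = 10^6 bytes and ignoring quantization scales, higher-precision embeddings, and file-format overhead, so real checkpoints are slightly larger.

```python
# Back-of-the-envelope on-disk size for weight-only quantization.
def model_size_mb(num_params: float, bits_per_weight: float) -> float:
    return num_params * bits_per_weight / 8 / 1e6   # bits -> bytes -> MB (10^6 bytes)

for bits in (16, 4, 3, 2, 1.58, 1):
    print(f"125M params @ {bits:>4} bits ≈ {model_size_mb(125e6, bits):6.1f} MB")

# 1-bit weights on 125M parameters come out to ~15.6 MB, matching the "~16MB on disk" figure.
```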
@zechunliu
Zechun Liu
8 months
🚀 We're thrilled to announce that the SoTA low-bit quantization ParetoQ code is now open-source! 🌟 https://t.co/jyXuAFGAkA 🔍 What does this repo support? 🌟State-of-the-art sub-4-bit quantization: It is a significant upgrade from our previous LLM-QAT repo. Outperforming all
0
6
17
@BeidiChen
Beidi Chen
9 months
⏰📢After years of working on long-context efficiency, I’ve started to doubt if it’s truly necessary (Many of you have probably noticed the decline of interest in long LLMs). Despite strong models like Gemini, short-context + retrieval often do the trick—faster, cheaper, and
@InfiniAILab
Infini-AI-Lab
9 months
🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple –
20
91
456
@zechunliu
Zechun Liu
9 months
Our ParetoQ is substantially better than previous work on ternary LLMs, such as the 1-bit era paper.
@tydsh
Yuandong Tian
9 months
We introduce ParetoQ, a series of pre-trained models that show SoTA in ternary (1.58-bit), 2/3/4-bit quantization for SLMs (up to 3B parameters) using initial full pre-training + QAT later. In addition, we also discover that the representation changes substantially after low-bit
0
7
24
@tydsh
Yuandong Tian
9 months
We introduce ParetoQ, a series of pre-trained models that show SoTA in ternary (1.58-bit), 2/3/4-bit quantization for SLMs (up to 3B parameters) using initial full pre-training + QAT later. In addition, we also discover that the representation changes substantially after low-bit
arxiv.org
The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose...
2
14
75
@fiandola
Forrest Iandola
11 months
[1/n] 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸 𝗔𝗻𝘆𝘁𝗵𝗶𝗻𝗴 from @Meta: interactive video segmentation and tracking on an iPhone!
13
112
520
@YoungXiong1
Yunyang Xiong
11 months
🚀Excited to share our Efficient Track Anything. It is small but mighty, >2x faster than SAM2 on A100 and runs > 10 FPS on iPhone 15 Pro Max. How’d we do it? EfficientSAM + Efficient Memory Attention! Paper: https://t.co/FN7NMuEO9R Project (demo): https://t.co/KSLPj5rM1v with:
4
37
113
@zechunliu
Zechun Liu
1 year
Thanks @ylecun for promoting our work. 🎉 MobileLLM models at sizes 125M, 350M, and 600M are now available on HuggingFace! 🚀
@ylecun
Yann LeCun
1 year
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices. TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks; Paper: https://t.co/TDWQWdZeIy
0
1
10
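The quoted summary names three architectural ingredients: depth over width, one shared matrix for token->embedding and embedding->token, and weights shared across multiple transformer blocks. Below is a minimal PyTorch sketch of just those three ideas; the vocabulary size, dimensions, block count, and repeat factor are illustrative placeholders rather than MobileLLM's configurations, and positional encoding and causal masking are omitted.

```python
import torch
import torch.nn as nn

class TinyMobileLLMSketch(nn.Module):
    """Illustrative sketch: deep-and-thin stack, tied input/output embeddings,
    and transformer blocks whose weights are reused several times per forward.
    Sizes are made up; positional encoding and causal masking are omitted."""
    def __init__(self, vocab=32000, dim=576, n_blocks=15, repeats=2, n_heads=9):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, n_heads, 4 * dim, batch_first=True)
            for _ in range(n_blocks)
        )
        self.repeats = repeats
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.embed.weight  # one matrix for token->emb and emb->token

    def forward(self, tokens):
        h = self.embed(tokens)
        for blk in self.blocks:                  # each block is applied `repeats` times,
            for _ in range(self.repeats):        # sharing its weights across those passes
                h = blk(h)
        return self.lm_head(h)

model = TinyMobileLLMSketch()
logits = model(torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 32000])
```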
@zechunliu
Zechun Liu
1 year
🚀We're thrilled to announce that the MobileLLM weights are available on HuggingFace: https://t.co/C5zcQPT6VO 📱MobileLLM is a state-of-the-art language model designed for mobile devices: https://t.co/zQIuwDBEYT 🔥Explore the pretraining code on GitHub: https://t.co/aIW4rQV2Cw
1
8
25
@YoungXiong1
Yunyang Xiong
1 year
🚨VideoLLM from Meta!🚨 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding 📝Paper: https://t.co/mmGiIYicB2 🧑🏻‍💻Code: https://t.co/mLR11itc27 🚀Project (Demo): https://t.co/nAOZo7eJi8 We propose LongVU, a video LLM with a spatiotemporal adaptive
5
73
253
@zechunliu
Zechun Liu
1 year
🎉I'm excited to share the news that SpinQuant supported the live demo at Meta Connect! We just made our 4-bit quantized LLaMA SpinQuant model publicly available. Check it out if you're interested: https://t.co/u7YsFDY1ap
1
1
17
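SpinQuant's core idea is to rotate weights and activations with orthogonal matrices before quantization so that outliers get spread out, while the full-precision function stays unchanged. The sketch below uses a random orthogonal rotation and a toy symmetric 4-bit quantizer purely to illustrate that mechanism; SpinQuant itself learns its rotations, and the released LLaMA model uses a production quantization scheme, not this toy one.

```python
import torch

torch.manual_seed(0)

def quantize_4bit(w):
    """Toy symmetric per-tensor 4-bit quantize/dequantize (illustration only)."""
    scale = w.abs().max() / 7
    return (w / scale).round().clamp(-8, 7) * scale

d = 64
W = torch.randn(d, d)
W[:, 0] *= 20                           # inject an outlier channel, the usual pain point
x = torch.randn(8, d)

# Random orthogonal rotation R (SpinQuant learns its rotations; random only shows the idea).
R, _ = torch.linalg.qr(torch.randn(d, d))

y_float = x @ W.T
y_rot   = (x @ R) @ (W @ R).T           # R cancels exactly, so the float output is unchanged
print(torch.allclose(y_float, y_rot, atol=1e-3))

# Quantizing the rotated weights spreads the outlier column, shrinking the layer error.
err_plain = (x @ quantize_4bit(W).T - y_float).abs().mean()
err_rot   = ((x @ R) @ quantize_4bit(W @ R).T - y_float).abs().mean()
print(f"4-bit output error  plain: {err_plain:.3f}   rotated: {err_rot:.3f}")
```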