Zechun Liu
@zechunliu
503 Followers · 42 Following · 10 Media · 42 Statuses
Research Scientist @Meta, SpinQuant, MobileLLM
Joined June 2023
Thanks @_akhaliq for sharing our work! MobileLLM-R1 marks a paradigm shift. Conventional wisdom suggests that reasoning only emerges after training on massive amounts of data, but we prove otherwise. With just 4.2T pre-training tokens and a small amount of post-training…
Meta just dropped MobileLLM-R1 on Hugging Face: an edge reasoning model with fewer than 1B parameters. 2×–5× performance boost over other fully open-source models: MobileLLM-R1 achieves ~5× higher MATH accuracy vs. Olmo-1.24B and ~2× vs. SmolLM2-1.7B, using just 1/10 the…
6 · 16 · 117
🔥 MobileLLM-R1 paper and code are now available! 🚀 Can small models reason? We believe the answer is yes! 🌟 With fewer than 1B parameters and trained on just 4.2T tokens (only 1/10 of Qwen’s), MobileLLM-R1 challenges the belief that reasoning only appears in huge models trained…
3 · 4 · 23
🎉 Two NeurIPS papers accepted! 🌟 ParetoQ: a unified quantization framework across 1–4 bits. Our optimized training & quantization surpass prior SoTA by a large margin: even a 600M ternary model beats the BitNet 3B ternary model with 1/5 the parameters. 🌟 RDD (robotics) finds…
0 · 2 · 16
Glad to see MobileLLM-R1 has attracted broad attention across the community! In fact, MobileLLM-R1 uses only ~2T tokens of high-quality OSS data and is trained for a total of 4T tokens. Let’s see if MobileLLM can help push things toward the “less is more” direction! 🚀
2025: The year that 4 trillion tokens became a small amount of training data! 🤯 (But great work on producing strong reasoning performance with trained-from-scratch tiny models!)
0 · 0 · 5
@_akhaliq also built a great app for MobileLLM-R1: https://t.co/WOvyhLzMob. Anycoder is incredibly fast at building apps!
0 · 2 · 8
Quantization of large language models aims to cut compute and memory needs while keeping performance. 𝐏𝐚𝐫𝐞𝐭𝐨𝐐 delivers SOTA results across bit-widths, showing 1.58-, 2-, and 3-bit quantization offer better size-accuracy trade-offs than 4-bit. 💡 Read more:
0 · 13 · 128
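A minimal sketch of the quantities being traded off in the tweet above: packed model size shrinks linearly with bit-width while round-off error grows. This is illustrative symmetric per-channel fake-quantization, not the ParetoQ method (which trains the quantized network rather than rounding post hoc); all names are made up for the example.

```python
import numpy as np

def fake_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-output-channel fake-quantization to signed `bits`-bit levels."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)   # stand-in weight matrix

for bits in (2, 3, 4):
    mse = float(np.mean((w - fake_quantize(w, bits)) ** 2))
    size_mb = w.size * bits / 8 / 1e6                # packed weights only
    print(f"{bits}-bit: {size_mb:.3f} MB, MSE {mse:.5f}")
```

Whether the extra error at 2–3 bits is worth the size savings is exactly what ParetoQ measures on real models, with quantization-aware training closing much of the accuracy gap.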
🚀 We’re releasing ParetoQ, a family of quantized MobileLLMs: ultra-efficient, performance-retaining models for edge devices. 🧠 Smallest model: 1-bit, 125M → only 16MB on disk 📈 1.58-bit 600M even beats the 1.58-bit 3B from BitNet (1-bit era paper) 🔥 👉 Models:
0 · 1 · 14
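A back-of-the-envelope check of the “1-bit, 125M → only 16MB” figure above, counting packed weights only (a real checkpoint adds quantization scales, any higher-precision embeddings, and file overhead):

```python
# Packed weight size for a 125M-parameter model at several bit-widths.
params = 125e6
for bits in (1, 1.58, 2, 4, 16):
    print(f"{bits:>5} bits: {params * bits / 8 / 1e6:6.1f} MB")
# 1 bit -> ~15.6 MB, matching the ~16MB on-disk number in the tweet.
```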
🚀 We're thrilled to announce that the SoTA low-bit quantization ParetoQ code is now open-source! 🌟 https://t.co/jyXuAFGAkA 🔍 What does this repo support? 🌟 State-of-the-art sub-4-bit quantization: a significant upgrade from our previous LLM-QAT repo, outperforming all…
0 · 6 · 17
⏰📢 After years of working on long-context efficiency, I’ve started to doubt whether it’s truly necessary (many of you have probably noticed the declining interest in long-context LLMs). Despite strong models like Gemini, short-context + retrieval often does the trick: faster, cheaper, and…
🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯 Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡 So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple –…
20 · 91 · 456
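A minimal sketch of the “short-context + retrieval” recipe being argued for above: pick the best-matching chunk for a query and put only that into the prompt, instead of feeding an entire corpus to a long-context model. The bag-of-words embedding and the chunks below are toy stand-ins; a real system would use a learned embedder and top-k retrieval.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / ((na * nb) or 1.0)

chunks = [  # hypothetical document chunks
    "SpinQuant rotates activations to tame outliers before 4-bit quantization.",
    "MobileLLM uses deep-and-thin architectures for sub-billion-parameter models.",
    "LongVU compresses video tokens spatially and temporally.",
]
query = "how does 4-bit quantization handle outliers?"
best = max(chunks, key=lambda c: cosine(embed(query), embed(c)))
print(f"Context: {best}\n\nQuestion: {query}")  # short prompt for a short-context LLM
```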
Our ParetoQ is substantially better than previous work on ternary LLMs, such as the 1-bit era paper.
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training followed by QAT. In addition, we discover that the representation changes substantially after low-bit…
0 · 7 · 24
We introduce ParetoQ, a series of pre-trained models that achieve SoTA in ternary (1.58-bit) and 2/3/4-bit quantization for SLMs (up to 3B parameters), using full pre-training followed by QAT. In addition, we discover that the representation changes substantially after low-bit…
arxiv.org · The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose...
2 · 14 · 75
🚀 Excited to share our Efficient Track Anything. It is small but mighty: >2× faster than SAM2 on an A100, and it runs at >10 FPS on iPhone 15 Pro Max. How’d we do it? EfficientSAM + Efficient Memory Attention! Paper: https://t.co/FN7NMuEO9R Project (demo): https://t.co/KSLPj5rM1v with:
4 · 37 · 113
Thanks @ylecun for promoting our work. 🎉 MobileLLM models at sizes 125M, 350M, and 600M are now available on HuggingFace! 🚀
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices. TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks; Paper: https://t.co/TDWQWdZeIy
0 · 1 · 10
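A minimal sketch (in PyTorch, not the released MobileLLM code) of two of the tricks in the TL;DR above: tying the token→embedding and embedding→token matrices, and running each transformer block twice to get depth without extra weights. The sizes loosely follow MobileLLM-125M (30 effective layers, width 576, 9 heads); the block itself is a generic stand-in, with causal masking omitted for brevity.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab=32000, dim=576, eff_layers=30, repeats=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # eff_layers // repeats physical blocks, each executed `repeats` times.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=9, batch_first=True)
            for _ in range(eff_layers // repeats)
        )
        self.repeats = repeats
        self.head = nn.Linear(dim, vocab, bias=False)
        self.head.weight = self.embed.weight   # shared in/out embedding matrix

    def forward(self, tokens):
        x = self.embed(tokens)
        for block in self.blocks:
            for _ in range(self.repeats):      # "more depth, not width"
                x = block(x)
        return self.head(x)

logits = TinyLM()(torch.randint(0, 32000, (1, 8)))
print(logits.shape)   # torch.Size([1, 8, 32000])
```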
🚀 We're thrilled to announce the MobileLLM weights are available on HuggingFace: https://t.co/C5zcQPT6VO 📱 MobileLLM is a state-of-the-art language model designed for mobile devices: https://t.co/zQIuwDBEYT 🔥 Explore the pretraining code on GitHub: https://t.co/aIW4rQV2Cw
1 · 8 · 25
🚨 VideoLLM from Meta! 🚨 LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding 📝 Paper: https://t.co/mmGiIYicB2 🧑🏻‍💻 Code: https://t.co/mLR11itc27 🚀 Project (Demo): https://t.co/nAOZo7eJi8 We propose LongVU, a video LLM with a spatiotemporal adaptive…
5 · 73 · 253
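A minimal sketch of the temporal side of “spatiotemporal adaptive compression”: drop frames whose features barely differ from the last kept frame. LongVU’s actual pipeline (DINOv2 features, cross-modal queries, spatial token reduction) is more involved; this only illustrates the redundancy-pruning idea, with simulated features.

```python
import numpy as np

def prune_frames(feats: np.ndarray, thresh: float = 0.99) -> list[int]:
    """Keep frame 0, then any frame whose cosine similarity to the most
    recently kept frame drops below `thresh` (content changed enough)."""
    keep = [0]
    for i in range(1, len(feats)):
        a, b = feats[keep[-1]], feats[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim < thresh:
            keep.append(i)
    return keep

# Simulate 64 slowly drifting frame features: a random walk around a base vector.
rng = np.random.default_rng(0)
feats = rng.normal(size=768) + np.cumsum(rng.normal(size=(64, 768)) * 0.05, axis=0)
print(prune_frames(feats))   # only a handful of frame indices survive
```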
🎉 I'm excited to share that SpinQuant powered the live demo at Meta Connect! We just made our 4-bit quantized LLaMA SpinQuant model publicly available. Check it out if you're interested: https://t.co/u7YsFDY1ap
1 · 1 · 17
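A minimal sketch of the idea behind SpinQuant: multiplying activations by an orthogonal matrix R leaves the layer’s output unchanged once R is folded into the weights (xW = (xR)(RᵀW), since RRᵀ = I), but it spreads outlier channels across all dimensions, so a 4-bit quantizer needs a much smaller scale. A random orthogonal matrix stands in here for SpinQuant’s learned rotations, and the outlier is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(512, 64))
x[:, 3] *= 50                    # one outlier channel, as often seen in LLM activations

R, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # random orthogonal matrix

def quant_mse(v: np.ndarray) -> float:
    """4-bit symmetric fake-quantization error with a single per-tensor scale."""
    scale = np.abs(v).max() / 7
    return float(np.mean((v - np.round(v / scale).clip(-7, 7) * scale) ** 2))

print("plain  :", quant_mse(x))       # outlier inflates the scale for every channel
print("rotated:", quant_mse(x @ R))   # error drops sharply after rotation
```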