
Zhuang Liu
@liuzhuang1234
11K Followers · 3K Following · 75 Media · 457 Statuses
Assistant Professor @PrincetonCS. Researcher in deep learning, vision, models. Previously @MetaAI, @UCBerkeley, @Tsinghua_Uni
Princeton, NJ
Joined April 2016
RT @MengdiWang10: 🚀 Introducing LabOS: The AI-XR Co-Scientist, a system that sees, understands, and works with humans in real-world labs. 👁️…
0 replies · 1 repost · 0 likes
How can an AI model learn the underlying dynamics of a visual scene? We're introducing Trajectory Fields, a new way to represent video in 4D! It models the path of each pixel as a continuous 3D trajectory, which is parameterized by a B-spline function of time. This unlocks…
Excited to share our latest work from the ByteDance Seed Depth Anything team — Trace Anything: Representing Any Video in 4D via Trajectory Fields 💻 Project Page: https://t.co/Q390WcWwG4 📄 Paper: https://t.co/NfxT260QWy 📦 Code: https://t.co/r2VbOHyRwL 🤖 Model: …
1 reply · 17 reposts · 76 likes
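The tweet above describes each pixel's path as a continuous 3D trajectory parameterized by a B-spline of time. A minimal sketch of that idea, assuming a clamped uniform cubic B-spline with a handful of 3D control points per pixel (the knot layout, control-point count, and shapes are illustrative choices, not the Trace Anything implementation):

```python
# Minimal sketch: evaluate a pixel's 3D trajectory modeled as a cubic B-spline of time.
# The clamped uniform knot vector and per-pixel control points are illustrative
# assumptions, not the Trace Anything parameterization.
import numpy as np
from scipy.interpolate import BSpline

degree = 3                      # cubic B-spline
n_ctrl = 6                      # control points per pixel trajectory (assumed)
# Clamped knot vector on [0, 1] so the curve starts/ends at the first/last control point.
inner = np.linspace(0.0, 1.0, n_ctrl - degree + 1)
knots = np.concatenate([[0.0] * degree, inner, [1.0] * degree])

# One pixel's control points in 3D (x, y, z); in practice a model would predict
# an (H, W, n_ctrl, 3) tensor, i.e. one trajectory per pixel.
ctrl_pts = np.array([
    [0.0, 0.0, 2.0],
    [0.1, 0.0, 2.1],
    [0.25, 0.05, 2.3],
    [0.4, 0.1, 2.6],
    [0.5, 0.1, 2.8],
    [0.6, 0.15, 3.0],
])

trajectory = BSpline(knots, ctrl_pts, degree)   # maps normalized time -> 3D point
times = np.linspace(0.0, 1.0, 8)                # e.g. 8 video frames
points_3d = trajectory(times)                   # shape (8, 3): pixel position per frame
print(points_3d)
```

Evaluating the spline at the frame timestamps gives the pixel's 3D position in every frame, which is what makes the representation continuous in time and queryable at arbitrary timestamps.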
Our Goedel-Prover V1 will be presented at COLM 2025 in Montreal this Wednesday afternoon! I won’t be there in person, but my amazing and renowned colleague @danqi_chen will be around to help with the poster — feel free to stop by!
2 replies · 8 reposts · 72 likes
Excited to share that I’ve been promoted to Associate Professor with tenure at Princeton! 🎉 6 years may not be long, but AI research has evolved significantly during this period. Grateful to all my students, collaborators, and colleagues for being with me on this remarkable journey!
150 replies · 63 reposts · 3K likes
See our work on adapting VLMs to the pixel-level depth estimation task! Congrats to the team!
Thrilled to release DepthLM! We show 𝗳𝗼𝗿 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘁𝗶𝗺𝗲 that VLMs—without task-specific architecture or loss—can have comparable accuracy to pure vision models for metric depth estimation. [Code]: https://t.co/EFjvzm1yWN
1 reply · 5 reposts · 121 likes
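The DepthLM claim above, plain VLMs doing metric depth without task-specific heads or losses, boils down to asking the model for a distance in meters at a marked pixel and parsing the text it returns. A hypothetical sketch of that interaction (the vlm_generate helper, the prompt wording, and the marker convention are assumptions for illustration, not the DepthLM interface):

```python
# Hypothetical sketch: querying a vision-language model for metric depth at one pixel.
# `vlm_generate` stands in for any image+text -> text VLM call; it is an assumed helper,
# not part of the DepthLM codebase.
import re

def query_pixel_depth(vlm_generate, image, u, v):
    """Ask the VLM for the metric depth (in meters) of pixel (u, v) and parse the answer."""
    prompt = (
        f"The image has a marker at pixel ({u}, {v}). "
        "How far is that point from the camera, in meters? Answer with a single number."
    )
    answer = vlm_generate(image=image, prompt=prompt)   # e.g. "about 2.4 meters"
    match = re.search(r"[-+]?\d*\.?\d+", answer)
    return float(match.group()) if match else None

# Usage with a dummy backend, just to show the flow:
fake_vlm = lambda image, prompt: "2.4"
print(query_pixel_depth(fake_vlm, image=None, u=320, v=240))   # -> 2.4
```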
I’ll be giving a guest lecture at Princeton tomorrow (Thursday, 9/25), sharing our recent work on LLM Reasoning and Efficiency. Time and location: 2:55–4:15pm, CS Building 402. Thanks to @liuzhuang1234 for organizing this!
2 replies · 1 repost · 28 likes
How do we navigate a growing collection of post-trained LLMs? In Delta Activations: A Representation for Finetuned LLMs, we propose a compact embedding that encodes the post-training signal. Try the interactive model navigator 👉 https://t.co/I7mKccXfzr
3 replies · 19 reposts · 47 likes
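As I read the Delta Activations tweet, the embedding of a finetuned model comes from how its internal activations shift relative to the base model on a fixed set of probe inputs. A minimal sketch of that idea with Hugging Face transformers (the probe prompts, layer choice, and mean pooling are assumptions, not necessarily the paper's exact recipe):

```python
# Minimal sketch: embed a finetuned LLM by the shift of its activations vs. the base model
# on fixed probe prompts. Probe set, layer choice, and mean pooling are illustrative
# assumptions, not necessarily the Delta Activations recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PROBES = ["Explain photosynthesis briefly.", "Write a short haiku about rain."]

@torch.no_grad()
def mean_last_hidden(model_name: str, tokenizer) -> torch.Tensor:
    model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    feats = []
    for text in PROBES:
        inputs = tokenizer(text, return_tensors="pt")
        out = model(**inputs)
        # Mean-pool the final layer's hidden states over tokens -> one vector per probe.
        feats.append(out.hidden_states[-1].mean(dim=1).squeeze(0))
    return torch.stack(feats).mean(dim=0)

base_name = "base-model-name"            # placeholder identifiers
finetuned_name = "finetuned-model-name"
tokenizer = AutoTokenizer.from_pretrained(base_name)

delta = mean_last_hidden(finetuned_name, tokenizer) - mean_last_hidden(base_name, tokenizer)
# `delta` is the compact embedding; cosine similarity between deltas can then be used to
# group models finetuned on similar data.
```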
Large-scale 3D Scene Generation (all scenes are real-time rendered)!! Physically-grounded generative data without hallucinations is the missing link for robot learning and testing at scale. We introduce a method that directly generates large-scale 3D driving scenes with…
14 replies · 65 reposts · 373 likes
🚀 ~4 months ago, we introduced OpenVision — a fully open, cost-effective family of vision encoders that rival OpenAI’s CLIP and Google’s SigLIP. Today, we’re back with a major update: OpenVision 2 🎉 A thread 🧵 (1/n)
Still relying on OpenAI’s CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧 We’re excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI’s CLIP and…
4 replies · 84 reposts · 438 likes
Robotic manipulation sim2real made easy with accurate geometry perception. While it seems intuitive to use geometric information for robotic manipulation—since geometry provides a generalizable representation—this approach hasn’t been widely adopted. Through this project, we…
🚀 Want to build a 3D-aware manipulation policy, but troubled by the noisy depth perception? Want to train your manipulation policy in simulation, but tired of bridging the sim2real gap by degenerating geometric perception, like adding noise? Now these notorious problems are gone
2 replies · 8 reposts · 51 likes
Kinda cool that FAIR still releases ConvNeXt versions of their models
0 replies · 1 repost · 5 likes
@jbhuang0604 For VLMs, this is a good survey paper to give a rundown on different types of architectures: https://t.co/DKTNpPdXg9. Also, highly recommend some background reading such as prefix tuning https://t.co/byUEQAgzCV and early foundation VLMs (https://t.co/F8POEtQd3c, BLIP, etc.). This is…
arxiv.org
This work uniquely identifies and characterizes four prevalent multimodal model architectural patterns in the contemporary multimodal landscape. Systematically categorizing models by architecture...
1 reply · 8 reposts · 46 likes
After a year of severe injury and complex surgery, I wrote about my journey—physical, emotional, and everything in between—hoping it might help others feel less alone in their own recovery. https://t.co/s8LSW8chGz
tender-aster-768.notion.site
While the world was marveling at the birth of new intelligence, I was learning the primitive language of my own body’s collapse. The past year, celebrated by my peers as a triumphant one for artifi...
16 replies · 2 reposts · 100 likes
The Goedel-Prover-V2 report is now on arXiv: https://t.co/yROjbJMVgP. Check out the details on self-correction, the large-scale scaffolded data synthesis framework, and the magical model averaging.
9 replies · 118 reposts · 323 likes
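Of the ingredients listed above, model averaging is the easiest to show in isolation: in its generic form it just averages the weights of several checkpoints of the same model. A minimal sketch of plain weight averaging (file names are placeholders, and this is not necessarily the exact scheme used in Goedel-Prover-V2):

```python
# Minimal sketch of checkpoint (weight) averaging: average parameters of several
# checkpoints that share the same architecture and state-dict keys. Generic technique,
# not necessarily the Goedel-Prover-V2 averaging scheme.
import torch

def average_state_dicts(paths):
    """Load several state dicts with identical keys/shapes and average them elementwise."""
    state_dicts = [torch.load(p, map_location="cpu") for p in paths]
    avg = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        avg[key] = stacked.mean(dim=0)
    return avg

# merged = average_state_dicts(["ckpt_a.pt", "ckpt_b.pt", "ckpt_c.pt"])  # placeholder paths
# model.load_state_dict(merged)   # assumes all checkpoints share the architecture
```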
Scaling CLIP on English-only data is outdated now… 🌍 We built a CLIP data curation pipeline for 300+ languages. 🇬🇧 We train MetaCLIP 2 without compromising English-task performance (it actually improves!) 🥳 It’s time to drop the language filter! 📝 https://t.co/pQuwzH053M [1/5] 🧵
3 replies · 89 reposts · 310 likes
Vision-language representations trained from scratch for 300+ languages. Strong performance on both EN and non-EN benchmarks, without compromising either. Definitely a new guideline for training universal representation models.
🌿Introducing MetaCLIP 2 🌿 📝: https://t.co/RyytqxRAw3 code, model: https://t.co/P0POS9E2EC After four years of advancements in English-centric CLIP development, MetaCLIP 2 is now taking the next step: scaling CLIP to worldwide data. The effort addresses long-standing…
1 reply · 8 reposts · 53 likes
From GPT to MoE: I reviewed & compared the main LLMs of 2025 in terms of their architectural design, from DeepSeek-V3 to Kimi K2. Multi-head Latent Attention, sliding-window attention, new Post- & Pre-Norm placements, NoPE, shared-expert MoEs, and more... https://t.co/oEt8XzNxik
magazine.sebastianraschka.com
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
43 replies · 480 reposts · 2K likes
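Of the architectural pieces listed in that post, sliding-window attention is simple to show concretely: each token attends only to itself and a fixed window of preceding tokens rather than the full causal prefix. A small PyTorch sketch of such a mask (window size and sequence length are arbitrary examples):

```python
# Sliding-window causal attention mask: token i may attend to tokens in
# [max(0, i - window + 1), i]. Window size here is an arbitrary example.
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len); True means attention is allowed."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j > i - window)

mask = sliding_window_causal_mask(seq_len=8, window=3)
print(mask.int())
# Each row i has ones only for the last 3 positions up to and including i, which is what
# keeps the KV cache and attention cost bounded compared with full causal attention.
```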
Thrilled to be part of such an incredible and talented team! It has been one month since I joined, and I’m inspired every day by our shared mission and commitment. Excited for what’s ahead!
Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're…
2 replies · 2 reposts · 31 likes
Congrats to @parastooabtahi, @tri_dao and Alex Lombardi on being named 2025 Google Research Scholars. 🎉 The @googleresearch scholars program funds world-class research conducted by early-career professors. https://t.co/KUZ0Qo2EpO
0 replies · 6 reposts · 79 likes