Zhiqiu Lin Profile
Zhiqiu Lin

@ZhiqiuLin

Followers
520
Following
156
Media
22
Statuses
66

PhD Student at Carnegie Mellon University | Computer Vision and Language | Generative AI

Pittsburgh
Joined July 2017
@ZhiqiuLin
Zhiqiu Lin
3 months
📷 Can AI understand camera motion like a cinematographer? Meet CameraBench: a large-scale, expert-annotated dataset for understanding camera motion geometry (e.g., trajectories) and semantics (e.g., scene contexts) in any video – films, games, drone shots, vlogs, etc. Links
10
35
187
@ZhiqiuLin
Zhiqiu Lin
7 days
RT @tarashakhurana: Excited to share recent work with @kaihuac5 and @RamananDeva where we learn to do novel view synthesis for dynamic scen….
0
27
0
@ZhiqiuLin
Zhiqiu Lin
28 days
RT @roeiherzig: 🚀 Excited to share that our latest work on Sparse Attention Vectors (SAVs) has been accepted to @ICCVConference — see you a….
0
5
0
@ZhiqiuLin
Zhiqiu Lin
28 days
RT @chancharikm: Congrats to the rest of the team on this accomplishment!
0
1
0
@ZhiqiuLin
Zhiqiu Lin
1 month
RT @Haoyu_Xiong_: Your bimanual manipulators might need a Robot Neck 🤖🦒. Introducing Vision in Action: Learning Active Perception from Huma….
0
86
0
@ZhiqiuLin
Zhiqiu Lin
1 month
RT @qsh_zh: 🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly im….
0
61
0
@ZhiqiuLin
Zhiqiu Lin
1 month
RT @JacobYeung: 1/6 🚀 Excited to share that BrainNRDS has been accepted as an oral at #CVPR2025! We decode motion from fMRI activity and u….
0
13
0
@ZhiqiuLin
Zhiqiu Lin
2 months
RT @jasonyzhang2: Delighted to share what our team has been working on at Google! After working for so long on sparse-view 3D, it's exciti….
0
33
0
@ZhiqiuLin
Zhiqiu Lin
3 months
CameraBench can also evaluate video QA tasks involving complex camera motion and visual reasoning, making it a practical video-language benchmark.
0
0
5
@ZhiqiuLin
Zhiqiu Lin
3 months
Our fine-tuned generative VLM outperforms GPT-4o and Gemini-2.5-Pro in describing camera movements — producing more accurate motion captions for real-world videos.
1
0
7
@ZhiqiuLin
Zhiqiu Lin
3 months
Therefore, we fine-tune a generative VLM (Qwen2.5-VL) on ~1,400 expert-annotated videos, boosting performance 1–2× and matching SOTA MegaSAM in classifying geometric motion primitives (e.g., camera moving forward/backward).
1
0
3
@ZhiqiuLin
Zhiqiu Lin
3 months
Using CameraBench, we benchmark 20+ leading SfMs (MegaSAM, CUT3R, COLMAP) and VLMs (Gemini-2.5, GPT-4o, Qwen-2.5) — none yet achieve human-level perception of camera motion in dynamic videos. Even top SfMs like MegaSAM still struggle with dynamic and low-parallax scenes ⚠️
1
0
4
@ZhiqiuLin
Zhiqiu Lin
3 months
We collect CameraBench, a large-scale dataset with 150K+ motion labels 🏷️ and captions 📝 over ~3,000 videos 🎞️, spanning diverse types, genres, POVs, capturing devices 📷, and post-production effects 🎨.
1
0
5
@ZhiqiuLin
Zhiqiu Lin
3 months
To annotate complex motion in real-world videos, we spent months refining our taxonomy with film experts and designed a robust label-then-caption framework. To scale high-quality annotations, we ran a human study (100+ people) to quantify human annotation performance, finding
1
0
3
@ZhiqiuLin
Zhiqiu Lin
3 months
“We must perceive to move, but we must also move to perceive.” – J.J. Gibson. Humans perceive the world through movement (e.g., depth from motion parallax). Likewise, camera motion is key to understanding a video’s perspective and the intent of the “invisible” camera operator.
1
0
6
@ZhiqiuLin
Zhiqiu Lin
3 months
RT @anishmadan23: 🚨 The 2nd iteration of our @CVPR Foundational Few-Shot Object Detection Challenge is LIVE! Can your model think like an….
0
16
0
@ZhiqiuLin
Zhiqiu Lin
3 months
RT @i_ikhatri: Just over a month left to submit to this year's Argoverse 2 challenges! Returning from previous years, are our motion foreca….
0
9
0
@ZhiqiuLin
Zhiqiu Lin
3 months
Leaderboard update: GPT‑4o (40%) < Gemini‑2.5 (44%) < GPT‑o3 (55%). Hope to see future models push the limits on #NaturalBench!
6
5
70
@ZhiqiuLin
Zhiqiu Lin
3 months
Fresh GPT‑o3 results on our vision‑centric #NaturalBench (NeurIPS’24) benchmark! 🎯 Its new visual chain‑of‑thought—by “zooming in” on details—cracks questions that still stump GPT‑4o. Yet vision reasoning isn’t solved: o3 can still hallucinate even after a full minute of
@ZhiqiuLin
Zhiqiu Lin
9 months
🚀 Make Vision Matter in Visual-Question-Answering (VQA)! Introducing NaturalBench, a vision-centric VQA benchmark (NeurIPS'24) that challenges vision-language models with pairs of simple questions about natural imagery 🌍📸. Here’s what we found after testing 53 models
3
23
109
@ZhiqiuLin
Zhiqiu Lin
6 months
RT @chancharikm: 🎯 Introducing Sparse Attention Vectors (SAVs): A breakthrough method for extracting powerful multimodal features from Larg….
0
39
0