Ziwei Liu
@liuziwei7
Followers
11K
Following
4K
Media
239
Statuses
2K
Associate Professor @ NTU - Vision, Learning and Graphics.
Singapore
Joined January 2018
EASI: https://t.co/SE16ooTgLg | EASI-Leaderboard: https://t.co/LXkuvj5eWW | SenseSI: https://t.co/Laj3IeApaj (3/3)
0
2
5
Evaluating MLLMs on Spatial Intelligence is now made EASI! We introduce EASI, an easy-to-use framework and leaderboard for holistic evaluation of multimodal LLMs on spatial intelligence, a key yet underexplored capability. (1/3)
1
5
6
Discover IGGT: the Instance-Grounded Geometry Transformer unifying 3D reconstruction & instance understanding. It introduces InsScene-15K, a 200M image dataset for training. Paper: https://t.co/Ux93FTs3q8 Dataset:
0
1
8
StepFun and partners release IGGT on Hugging Face: a unified transformer for semantic 3D reconstruction and instance-level understanding, enabling coherent 3D scene comprehension from 2D inputs.
2
7
36
3EED: Ground Everything Everywhere in 3D. Excited to share that our new dataset has been accepted to #NeurIPS2025 DB Track! 3EED establishes the first multi-platform, multi-modal 3D grounding benchmark for
0
2
25
3EED: Ground Everything Everywhere in 3D. Excited to share that our new dataset has been accepted to the #NeurIPS2025 DB Track! 3EED establishes the first multi-platform, multi-modal 3D grounding
1
24
127
Thrilled to share our work: FALCON: From Spatial to Actions
- Strong 3D understanding
- Flexible input: works with RGB-only, optionally fuses depth/pose for higher fidelity
- Robust to clutter, spatial prompts, and object scale/height variations
https://t.co/mSSBNqwFc8
0
4
19
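The FALCON post above highlights optional depth/pose fusion on top of an RGB-only path. As a rough illustration of that pattern (hypothetical module names, not FALCON's actual code), a backbone can encode RGB unconditionally and append depth/pose tokens only when they are supplied:

```python
# Hypothetical sketch (not FALCON's actual code) of optional depth/pose fusion:
# the RGB-only path always works, and extra modalities are fused only when provided.
import torch
import torch.nn as nn

class OptionalFusionBackbone(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, dim, 16, 16), nn.Flatten(2))    # RGB -> patch tokens
        self.depth_enc = nn.Sequential(nn.Conv2d(1, dim, 16, 16), nn.Flatten(2))  # depth -> patch tokens
        self.pose_enc = nn.Linear(7, dim)                                          # pose (xyz + quat) -> one token
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=2
        )

    def forward(self, rgb, depth=None, pose=None):
        tokens = [self.rgb_enc(rgb).transpose(1, 2)]             # (B, N, dim)
        if depth is not None:                                     # fuse depth only if given
            tokens.append(self.depth_enc(depth).transpose(1, 2))
        if pose is not None:                                      # fuse pose only if given
            tokens.append(self.pose_enc(pose).unsqueeze(1))
        return self.fuse(torch.cat(tokens, dim=1))                # joint spatial tokens

model = OptionalFusionBackbone()
rgb = torch.randn(2, 3, 224, 224)
out_rgb_only = model(rgb)                                         # works with RGB only
out_fused = model(rgb, depth=torch.randn(2, 1, 224, 224), pose=torch.randn(2, 7))
```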
Developed by NWPU, NTU, StepFun, THU, & CUHK, IGGT offers instance-grounded scene understanding with a plug-and-play design. It introduces InsScene-15K, a 200M image dataset. Paper: https://t.co/Ux93FTsBfG Dataset:
1
2
9
IGGT: a unified transformer for semantic 3D reconstruction
IGGT is an end-to-end unified transformer that marries geometry with instance-level semantics. It achieves SOTA 3D reconstruction & understanding from 2D images, powering spatial tracking & open-vocabulary segmentation.
2
21
95
9. The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
Keywords: 3D human motion generation, generalization capability, ViGen, ViMoGen, MBench
Category: Generative Models
Research Objective: The research aims to enhance the generalization
1
2
2
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
2
30
134
Instance-Grounded Geometry Transformer: #IGGT is an end-to-end geometry transformer that unifies spatial reconstruction and instance-level semantic understanding.
- Page: https://t.co/CUeVGRdKeZ
- Paper @HuggingPapers: https://t.co/T7aGsyFeqE
- Code: github.com/lifuguan/IGGT_official
Thrilled to share our work, IGGT: Instance-Grounded Geometry Transformer!
- End-to-End Unified Model
- Large-Scale Dataset InsScene-15K
- Instance-Grounded Scene Understanding
- Supports Multiple Applications (tracking, segmentation, grounding)
https://t.co/ob8FM0JrPM
1
20
140
9. IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Keywords: Instance-Grounded Geometry Transformer, 3D reconstruction, 3D-Consistent Contrastive Learning, instance-level contextual understanding, InsScene-15K
Category: Computer Vision
1
2
7
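The digest above lists "3D-Consistent Contrastive Learning" among IGGT's keywords. A minimal sketch of the general idea, assuming per-pixel features labeled with 3D instance IDs (illustrative only, not IGGT's actual loss): features from different views that map to the same 3D instance are treated as positives in an InfoNCE-style objective.

```python
# Illustrative sketch only (not IGGT's actual implementation): an instance-level
# contrastive loss across two views. Features projecting to the same 3D instance ID
# are positives; everything else is a negative.
import torch
import torch.nn.functional as F

def cross_view_instance_contrastive(feat_a, feat_b, ids_a, ids_b, tau=0.07):
    """feat_*: (N, D) per-pixel/per-point features; ids_*: (N,) 3D instance IDs."""
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    logits = feat_a @ feat_b.T / tau                        # (N, N) cross-view similarities
    pos = ids_a.unsqueeze(1) == ids_b.unsqueeze(0)          # positives share an instance ID
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    has_pos = pos.any(dim=1)                                # anchors with at least one match
    loss = -(log_prob[has_pos] * pos[has_pos]).sum(1) / pos[has_pos].sum(1)
    return loss.mean()

# Toy usage with random features and shared instance labels across two views.
f_a, f_b = torch.randn(128, 64), torch.randn(128, 64)
ids = torch.randint(0, 10, (128,))
print(cross_view_instance_contrastive(f_a, f_b, ids, ids))
```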
Thrilled to share our work, IGGT: Instance-Grounded Geometry Transformer!
- End-to-End Unified Model
- Large-Scale Dataset InsScene-15K
- Instance-Grounded Scene Understanding
- Supports Multiple Applications (tracking, segmentation, grounding)
https://t.co/ob8FM0JrPM
0
27
164
Our team has three #ICCV talks thanks to all the amazing workshop organizers @ICCVConference. We now share the talk slides covering:
- Native Multimodal Model: https://t.co/dcdWk6007m
- Reasoning in Generation: https://t.co/Uy7FxUQvd1
- Ego Intelligence: https://t.co/Z0xnv9dcDe
1
23
137
Back in 2024, LMMs-Eval built a complete evaluation ecosystem for the MLLM/LMM community, with countless researchers contributing their models and benchmarks to raise the whole edifice. I was fortunate to be one of them: our series of video-LMM works (MovieChat, AuroraCap, VDC)
Throughout my journey in developing multimodal models, I've always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly, with full
2
3
29
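The quoted post describes plugging modality encoders/decoders on top of an auto-regressive LLM. A minimal sketch of that plug-and-play pattern, with hypothetical class and encoder names (this is not the LMMs-Engine API): each encoder's output is projected into the LLM embedding space and concatenated with the text embeddings before the causal forward pass.

```python
# Hypothetical sketch of the plug-and-play pattern described above, not LMMs-Engine code.
import torch
import torch.nn as nn

class PluggableMultimodalLM(nn.Module):
    """Plug arbitrary modality encoders onto an auto-regressive LLM backbone."""
    def __init__(self, llm, encoders, llm_dim):
        super().__init__()
        self.llm = llm                                   # any auto-regressive backbone
        self.encoders = nn.ModuleDict(encoders)          # e.g. {"image": vision_tower}
        self.projectors = nn.ModuleDict(
            {name: nn.LazyLinear(llm_dim) for name in encoders}  # encoder dim -> LLM dim
        )

    def forward(self, text_embeds, modal_inputs):
        # Encode each provided modality and project it into the LLM embedding space.
        extra = [self.projectors[m](self.encoders[m](x)) for m, x in modal_inputs.items()]
        return self.llm(torch.cat(extra + [text_embeds], dim=1))  # modality tokens first

class ToyVisionEncoder(nn.Module):
    """Stand-in patch encoder so the sketch runs end to end."""
    def __init__(self, dim=384):
        super().__init__()
        self.patch = nn.Conv2d(3, dim, kernel_size=32, stride=32)
    def forward(self, img):
        return self.patch(img).flatten(2).transpose(1, 2)        # (B, num_patches, dim)

toy_llm = nn.TransformerEncoder(nn.TransformerEncoderLayer(512, 8, batch_first=True), 2)
model = PluggableMultimodalLM(toy_llm, {"image": ToyVisionEncoder()}, llm_dim=512)
text_embeds = torch.randn(2, 16, 512)                             # pretend text token embeddings
out = model(text_embeds, {"image": torch.randn(2, 3, 224, 224)})
print(out.shape)                                                   # torch.Size([2, 65, 512])
```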
One-Stop Training Engine for Unified Models: LMMs-Engine is a lean and flexible unified model training engine built for hacking at scale.
- Supports multimodal inputs and outputs, from AR, diffusion, and linear models to unified models like BAGEL
https://t.co/x2CW8XZlRu
Throughout my journey in developing multimodal models, I've always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly, with full
6
35
192
Join us for The AI Talks S5E2 (SGT Oct 28 10 AM / PDT Oct 27 7 PM / EDT Oct 27 10 PM) with @thwiedemer to explore Veo 3's zero-shot reasoning, and join the conversation on how video generation begins to mirror LLMs' emerging reasoning abilities. #AI #Reasoning #VideoGeneration
The AI Talks | S5E2 "Video Models Are Zero-Shot Learners and Reasoners" by Thaddäus Wiedemer (IMPRS-IS PhD, Google DeepMind). Can video models reason like LLMs? Join us to explore Veo 3's zero-shot visual intelligence. Oct 28 10 AM SG / Oct 27 10 PM Toronto #TheAITalks
0
6
20
The AI Talks | S5E2 "Video Models Are Zero-Shot Learners and Reasoners" by Thaddäus Wiedemer (IMPRS-IS PhD, Google DeepMind). Can video models reason like LLMs? Join us to explore Veo 3's zero-shot visual intelligence. Oct 28 10 AM SG / Oct 27 10 PM Toronto #TheAITalks
1
12
18
Releasing LMMs Engine by EvolvingLMMs-Lab: a lean, flexible framework for any-to-any modality pretraining & fine-tuning. Built with cutting-edge optimizations: FSDP2, Ulysses Sequence Parallel, Flash Attention 2. Dive in:
github.com/EvolvingLMMs-Lab/lmms-engine: A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
1
10
73
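The release post mentions FSDP2, Ulysses Sequence Parallel, and Flash Attention 2. As a small, generic illustration of one such optimization (standard PyTorch, not LMMs Engine's actual code path, and assuming a CUDA device with bfloat16 support), the flash-attention kernel can be requested through scaled_dot_product_attention:

```python
# Generic PyTorch example of using the flash-attention SDPA backend; it illustrates
# the kind of optimization listed above, not LMMs Engine's own implementation.
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)

# Restrict SDPA to the flash-attention backend for this region; it raises an error
# if the inputs or hardware are unsupported rather than silently falling back.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal attention
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```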