Ziwei Liu
@liuziwei7
Followers
11K
Following
4K
Media
239
Statuses
2K
Associate Professor @ NTU - Vision, Learning and Graphics.
Singapore
Joined January 2018
EASI: https://t.co/SE16ooTgLg | EASI-Leaderboard: https://t.co/LXkuvj5eWW | SenseSI: https://t.co/Laj3IeApaj (3/3)
0
2
5
Evaluating MLLMs on Spatial Intelligence is now made EASI! We introduce EASI, an easy-to-use framework and leaderboard for holistic evaluation of multimodal LLMs on spatial intelligence, a key yet underexplored capability. (1/3)
1
5
6
Discover IGGT: the Instance-Grounded Geometry Transformer unifying 3D reconstruction & instance understanding. It introduces InsScene-15K, a 200M image dataset for training. Paper: https://t.co/Ux93FTs3q8 Dataset:
0
1
8
StepFun and partners release IGGT on Hugging Face: a unified transformer for semantic 3D reconstruction and instance-level understanding, enabling coherent 3D scene comprehension from 2D inputs.
2
7
36
3EED: Ground Everything Everywhere in 3D. Excited to share that our new dataset has been accepted to #NeurIPS2025 DB Track! 3EED establishes the first multi-platform, multi-modal 3D grounding benchmark for
0
2
25
3EED: Ground Everything Everywhere in 3D. Excited to share that our new dataset has been accepted to the #NeurIPS2025 DB Track! 3EED establishes the first multi-platform, multi-modal 3D grounding
1
24
127
Thrilled to share our work: FALCON: From Spatial to Actions
- Strong 3D understanding
- Flexible input: works with RGB-only, optionally fuses depth/pose for higher fidelity
- Robust to clutter, spatial prompts, and object scale/height variations
https://t.co/mSSBNqwFc8
0
4
19
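The FALCON post above highlights optional depth/pose fusion on top of an RGB-only path. As a rough illustration of that pattern (hypothetical module names, not FALCON's actual code), a backbone can encode RGB unconditionally and append depth/pose tokens only when they are supplied:

```python
# Hypothetical sketch (not FALCON's actual code) of optional depth/pose fusion:
# the RGB-only path always works, and extra modalities are fused only when provided.
import torch
import torch.nn as nn

class OptionalFusionBackbone(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, dim, 16, 16), nn.Flatten(2))    # RGB -> patch tokens
        self.depth_enc = nn.Sequential(nn.Conv2d(1, dim, 16, 16), nn.Flatten(2))  # depth -> patch tokens
        self.pose_enc = nn.Linear(7, dim)                                          # pose (xyz + quat) -> one token
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True), num_layers=2
        )

    def forward(self, rgb, depth=None, pose=None):
        tokens = [self.rgb_enc(rgb).transpose(1, 2)]             # (B, N, dim)
        if depth is not None:                                     # fuse depth only if given
            tokens.append(self.depth_enc(depth).transpose(1, 2))
        if pose is not None:                                      # fuse pose only if given
            tokens.append(self.pose_enc(pose).unsqueeze(1))
        return self.fuse(torch.cat(tokens, dim=1))                # joint spatial tokens

model = OptionalFusionBackbone()
rgb = torch.randn(2, 3, 224, 224)
out_rgb_only = model(rgb)                                         # works with RGB only
out_fused = model(rgb, depth=torch.randn(2, 1, 224, 224), pose=torch.randn(2, 7))
```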
Developed by NWPU, NTU, StepFun, THU, & CUHK, IGGT offers instance-grounded scene understanding with a plug-and-play design. It introduces InsScene-15K, a 200M image dataset. Paper: https://t.co/Ux93FTsBfG Dataset:
1
2
9
IGGT: a unified transformer for semantic 3D reconstruction
IGGT is an end-to-end unified transformer that marries geometry with instance-level semantics. It achieves SOTA 3D reconstruction & understanding from 2D images, powering spatial tracking & open-vocabulary segmentation.
2
21
95
9. The Quest for Generalizable Motion Generation: Data, Model, and Evaluation
Keywords: 3D human motion generation, generalization capability, ViGen, ViMoGen, MBench
Category: Generative Models
Research Objective: The research aims to enhance the generalization
1
2
2
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
2
30
134
Instance-Grounded Geometry Transformer: #IGGT is an end-to-end geometry transformer that unifies spatial reconstruction and instance-level semantic understanding.
- Page: https://t.co/CUeVGRdKeZ
- Paper @HuggingPapers: https://t.co/T7aGsyFeqE
- Code: github.com/lifuguan/IGGT_official
Thrilled to share our work, IGGT: Instance-Grounded Geometry Transformer!
- End-to-End Unified Model
- Large-Scale Dataset InsScene-15K
- Instance-Grounded Scene Understanding
- Supports Multiple Applications (tracking, segmentation, grounding)
https://t.co/ob8FM0JrPM
1
20
140
9. IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction
Keywords: Instance-Grounded Geometry Transformer, 3D reconstruction, 3D-Consistent Contrastive Learning, instance-level contextual understanding, InsScene-15K
Category: Computer Vision
1
2
7
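The digest above lists "3D-Consistent Contrastive Learning" among IGGT's keywords. A minimal sketch of the general idea, assuming per-pixel features labeled with 3D instance IDs (illustrative only, not IGGT's actual loss): features from different views that map to the same 3D instance are treated as positives in an InfoNCE-style objective.

```python
# Illustrative sketch only (not IGGT's actual implementation): an instance-level
# contrastive loss across two views. Features projecting to the same 3D instance ID
# are positives; everything else is a negative.
import torch
import torch.nn.functional as F

def cross_view_instance_contrastive(feat_a, feat_b, ids_a, ids_b, tau=0.07):
    """feat_*: (N, D) per-pixel/per-point features; ids_*: (N,) 3D instance IDs."""
    feat_a = F.normalize(feat_a, dim=-1)
    feat_b = F.normalize(feat_b, dim=-1)
    logits = feat_a @ feat_b.T / tau                        # (N, N) cross-view similarities
    pos = ids_a.unsqueeze(1) == ids_b.unsqueeze(0)          # positives share an instance ID
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    has_pos = pos.any(dim=1)                                # anchors with at least one match
    loss = -(log_prob[has_pos] * pos[has_pos]).sum(1) / pos[has_pos].sum(1)
    return loss.mean()

# Toy usage with random features and shared instance labels across two views.
f_a, f_b = torch.randn(128, 64), torch.randn(128, 64)
ids = torch.randint(0, 10, (128,))
print(cross_view_instance_contrastive(f_a, f_b, ids, ids))
```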
Thrilled to share our work, IGGT: Instance-Grounded Geometry Transformer!
- End-to-End Unified Model
- Large-Scale Dataset InsScene-15K
- Instance-Grounded Scene Understanding
- Supports Multiple Applications (tracking, segmentation, grounding)
https://t.co/ob8FM0JrPM
0
27
164
Our team has three #ICCV talks thanks to all the amazing workshop organizers @ICCVConference. We now share the talk slides covering:
- Native Multimodal Model: https://t.co/dcdWk6007m
- Reasoning in Generation: https://t.co/Uy7FxUQvd1
- Ego Intelligence: https://t.co/Z0xnv9dcDe
1
23
137
Back in 2024, LMMs-Eval built a complete evaluation ecosystem for the MLLM/LMM community, with countless researchers contributing their models and benchmarks to raise the whole edifice. I was fortunate to be one of them: our series of video-LMM works (MovieChat, AuroraCap, VDC)
Throughout my journey in developing multimodal models, I've always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly, with full
2
3
29
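The quoted post describes plugging modality encoders/decoders on top of an auto-regressive LLM. A minimal sketch of that plug-and-play pattern, with hypothetical class and encoder names (this is not the LMMs-Engine API): each encoder's output is projected into the LLM embedding space and concatenated with the text embeddings before the causal forward pass.

```python
# Hypothetical sketch of the plug-and-play pattern described above, not LMMs-Engine code.
import torch
import torch.nn as nn

class PluggableMultimodalLM(nn.Module):
    """Plug arbitrary modality encoders onto an auto-regressive LLM backbone."""
    def __init__(self, llm, encoders, llm_dim):
        super().__init__()
        self.llm = llm                                   # any auto-regressive backbone
        self.encoders = nn.ModuleDict(encoders)          # e.g. {"image": vision_tower}
        self.projectors = nn.ModuleDict(
            {name: nn.LazyLinear(llm_dim) for name in encoders}  # encoder dim -> LLM dim
        )

    def forward(self, text_embeds, modal_inputs):
        # Encode each provided modality and project it into the LLM embedding space.
        extra = [self.projectors[m](self.encoders[m](x)) for m, x in modal_inputs.items()]
        return self.llm(torch.cat(extra + [text_embeds], dim=1))  # modality tokens first

class ToyVisionEncoder(nn.Module):
    """Stand-in patch encoder so the sketch runs end to end."""
    def __init__(self, dim=384):
        super().__init__()
        self.patch = nn.Conv2d(3, dim, kernel_size=32, stride=32)
    def forward(self, img):
        return self.patch(img).flatten(2).transpose(1, 2)        # (B, num_patches, dim)

toy_llm = nn.TransformerEncoder(nn.TransformerEncoderLayer(512, 8, batch_first=True), 2)
model = PluggableMultimodalLM(toy_llm, {"image": ToyVisionEncoder()}, llm_dim=512)
text_embeds = torch.randn(2, 16, 512)                             # pretend text token embeddings
out = model(text_embeds, {"image": torch.randn(2, 3, 224, 224)})
print(out.shape)                                                   # torch.Size([2, 65, 512])
```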
One-Stop Training Engine for Unified Models: LMMs-Engine is a lean and flexible unified model training engine built for hacking at scale.
- Supports multimodal inputs and outputs, from AR, diffusion, and linear models to unified models like BAGEL
https://t.co/x2CW8XZlRu
Throughout my journey in developing multimodal models, I've always wanted a framework that lets me plug & play modality encoders/decoders on top of an auto-regressive LLM. I want to prototype fast, try new architectures, and have my demo files scale effortlessly, with full
6
35
192
Join us for The AI Talks S5E2 (SGT Oct 28 10 AM / PDT Oct 27 7 PM / EDT Oct 27 10 PM) with @thwiedemer to explore Veo 3's zero-shot reasoning, and join the conversation on how video generation begins to mirror LLMs' emerging reasoning abilities. #AI #Reasoning #VideoGeneration
The AI Talks | S5E2 "Video Models Are Zero-Shot Learners and Reasoners" by Thaddäus Wiedemer (IMPRS-IS PhD, Google DeepMind). Can video models reason like LLMs? Join us to explore Veo 3's zero-shot visual intelligence. Oct 28 10 AM SG / Oct 27 10 PM Toronto #TheAITalks
0
6
20
The AI Talks | S5E2 "Video Models Are Zero-Shot Learners and Reasoners" by Thaddäus Wiedemer (IMPRS-IS PhD, Google DeepMind). Can video models reason like LLMs? Join us to explore Veo 3's zero-shot visual intelligence. Oct 28 10 AM SG / Oct 27 10 PM Toronto #TheAITalks
1
12
18
Releasing LMMs Engine by EvolvingLMMs-Lab: a lean, flexible framework for any-to-any modality pretraining & fine-tuning. Built with cutting-edge optimizations: FSDP2, Ulysses Sequence Parallel, Flash Attention 2. Dive in:
github.com/EvolvingLMMs-Lab/lmms-engine: A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.
1
10
73
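The release post mentions FSDP2, Ulysses Sequence Parallel, and Flash Attention 2. As a small, generic illustration of one such optimization (standard PyTorch, not LMMs Engine's actual code path, and assuming a CUDA device with bfloat16 support), the flash-attention kernel can be requested through scaled_dot_product_attention:

```python
# Generic PyTorch example of using the flash-attention SDPA backend; it illustrates
# the kind of optimization listed above, not LMMs Engine's own implementation.
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)

# Restrict SDPA to the flash-attention backend for this region; it raises an error
# if the inputs or hardware are unsupported rather than silently falling back.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal attention
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```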