Boyi Li Profile
Boyi Li

@Boyiliee

Followers: 2K · Following: 679 · Media: 25 · Statuses: 128

Joined March 2020
@Boyiliee
Boyi Li
6 months
I’ve dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it’s a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃 3DHM can generate human videos from a single real or synthetic human
4 replies · 36 retweets · 197 likes
@Boyiliee
Boyi Li
22 hours
RT @LongTonyLian: Excited to share that Describe Anything has been accepted at ICCV 2025! 🎉 Describe Anything Model (DAM) is a powerful Mu….
0 replies · 17 retweets · 0 likes
@Boyiliee
Boyi Li
8 days
RT @iamborisi: Happy to share our latest work on efficient sensor tokenization for end-to-end driving architectures!
0 replies · 13 retweets · 0 likes
@Boyiliee
Boyi Li
13 days
RT @AntheaYLi: How to equip robots with superhuman sensory capabilities? Come join us at RSS 2025 workshop, June 21, on Multimodal Robotics….
0 replies · 3 retweets · 0 likes
@Boyiliee
Boyi Li
2 months
RT @_akhaliq: Nvidia just dropped Describe Anything on Hugging Face. Detailed Localized Image and Video Captioning
0 replies · 159 retweets · 0 likes
@Boyiliee
Boyi Li
2 months
RT @YinCuiCV: Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specif….
0 replies · 76 retweets · 0 likes
@Boyiliee
Boyi Li
3 months
👉🏻 We have released our code and benchmark data. At #GTC 2025, we evaluated the safety and comfort of autonomous driving using Wolf:
@Boyiliee
Boyi Li
11 months
🚀 Introducing 𝐖𝐨𝐥𝐟 🐺: a mixture-of-experts video captioning framework that outperforms GPT-4V and Gemini-Pro-1.5 in general scenes 🖼️, autonomous driving 🚗, and robotics videos 🤖. 👑:
1 reply · 6 retweets · 64 likes
@Boyiliee
Boyi Li
3 months
Hallucination is a big challenge in video understanding for any single model. To address this, we introduce Wolf 🐺: a mixture-of-experts framework designed for accurate video understanding by distilling knowledge from various Vision-Language Models.
1 reply · 5 retweets · 24 likes
@Boyiliee
Boyi Li
3 months
4K Resolution! Vision is a critical part in building powerful multimodal foundation models. Super excited about this work.
@baifeng_shi
Baifeng
3 months
Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we
2 replies · 2 retweets · 48 likes
@Boyiliee
Boyi Li
3 months
RT @baifeng_shi: Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today….
0 replies · 153 retweets · 0 likes
@Boyiliee
Boyi Li
3 months
RT @drmapavone: At #GTC2025, Jensen unveiled Halos, a comprehensive safety system for AVs and Physical AI. Halos integrates numerous techno….
0 replies · 12 retweets · 0 likes
@Boyiliee
Boyi Li
3 months
RT @_dmchan: 🚀 New Paper Alert! 🚀 Introducing TULIP 🌷 – a multimodal framework for richer vision-language understanding! A drop-in replacem….
0 replies · 12 retweets · 0 likes
@Boyiliee
Boyi Li
4 months
RT @drmapavone: For the first time ever, @nvidia is hosting an AV Safety Day at GTC - a multi-session workshop on AV safety. We will shar….
0 replies · 14 retweets · 0 likes
@Boyiliee
Boyi Li
4 months
Nice to see the progress in interactive task planning. It reminds me of our previous work, ITP, which incorporates both high-level planning and low-level function execution via language.
@chelseabfinn
Chelsea Finn
4 months
Can we prompt robots, just like we prompt language models? With a hierarchy of VLA models + LLM-generated data, robots can:
- reason through long-horizon tasks
- respond to a variety of prompts
- handle situated corrections
Blog post & paper:
0 replies · 1 retweet · 35 likes
@Boyiliee
Boyi Li
5 months
RT @PointsCoder: Can Vision-Language Models (VLMs) truly understand the physical world? 🌍🔬 Introducing PhysBench – the first benchmark to….
0 replies · 74 retweets · 0 likes
@Boyiliee
Boyi Li
5 months
Our group at #NVIDIA has a few internship positions available. We welcome talented interns to join our efforts in autonomous driving and VLMs. If you're interested, please email me your CV.
8 replies · 28 retweets · 385 likes
@Boyiliee
Boyi Li
6 months
RT @_amirbar: @Boyiliee Wow, this is super cool, Boyi! Such a flashback to @carolinemchan and @shiryginosar's 'Everybody Dance Now' 😀.
0 replies · 1 retweet · 0 likes
@Boyiliee
Boyi Li
6 months
RT @drmapavone: Complementing DreamDrive, I am thrilled to introduce STORM, which enables fast scene reconstruction with a single feed-forw….
0 replies · 31 retweets · 0 likes
@Boyiliee
Boyi Li
6 months
RT @JitendraMalikCV: I'm happy to post course materials for my class at UC Berkeley "Robots that Learn", taught with the outstanding assist….
0 replies · 248 retweets · 0 likes
@Boyiliee
Boyi Li
6 months
RT @drmapavone: Introducing DreamDrive, which combines the complementary strengths of generative AI (video diffusion) and neural reconstruc….
0 replies · 44 retweets · 0 likes
@Boyiliee
Boyi Li
6 months
RT @JitendraMalikCV: Happy to share these exciting new results on video synthesis of humans in movement. Arguably, these establish the powe….
0 replies · 7 retweets · 0 likes