Jiajun Wu Profile
Jiajun Wu

@jiajunwu_cs

Followers
11K
Following
288
Media
0
Statuses
171

Assistant Professor at @Stanford CS

Stanford, CA
Joined March 2009
@drfeifei
Fei-Fei Li
14 days
(1/N) How close are we to enabling robots to solve the long-horizon, complex tasks that matter in everyday life? 🚨 We are thrilled to invite you to join the 1st BEHAVIOR Challenge @NeurIPS 2025, submission deadline: 11/15. 🏆 Prizes: 🥇 $1,000 🥈 $500 🥉 $300
39
263
1K
@wayfarerlabs
OWL
2 months
🌟Got multiple expert models and want them to steer your image/video generation? We’ve re-implemented the Product of Experts for Visual Generation paper on a toy example, and broken it down step by step in our new blog post! Includes: - Github repo: Annealed Importance
2
10
21
@ManlingLi_
Manling Li
3 months
Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐 https://t.co/ER5UX284Vo 📰 https://t.co/hGZerOb0TM
5
60
281
@KyleSargentAI
Kyle Sargent
3 months
FlowMo, our paper on diffusion autoencoders for image tokenization, has been accepted to #ICCV2025! See you in Hawaii! 🏄‍♂️
@KyleSargentAI
Kyle Sargent
6 months
Modern generative models of images and videos rely on tokenizers. Can we build a state-of-the-art discrete image tokenizer with a diffusion autoencoder? Yes! I’m excited to share FlowMo, with @kylehkhsu, @jcjohnss, @drfeifei, @jiajunwu_cs. A thread 🧵:
1
15
92
@Koven_Yu
Hong-Xing "Koven" Yu
3 months
#ICCV2025 🤩3D world generation is cool, but it is cooler to play with the worlds using 3D actions 👆💨, and see what happens! — Introducing *WonderPlay*: Now you can create dynamic 3D scenes that respond to your 3D actions from a single image! Web: https://t.co/uFOzA8t0P8 🧵1/7
6
40
182
@sanjana__z
Sanjana Srivastava
3 months
🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to generate reward for such preferences cheaply. 🧵⬇️
4
33
134
@zhang_yunzhi
Yunzhi Zhang
3 months
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
5
65
303
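The inference-time composition described above can be illustrated on a toy discrete example. This is a generic product-of-experts sketch (multiply expert densities, renormalize), not the paper's actual framework; the expert distributions below are hypothetical stand-ins:

```python
import numpy as np

def product_of_experts(expert_probs):
    """Combine expert distributions defined over the same discrete support:
    multiply densities pointwise, then renormalize."""
    combined = np.prod(expert_probs, axis=0)
    return combined / combined.sum()

# Two toy experts over 4 candidate outputs: one prefers candidates 0-1,
# the other prefers candidates 1-2. Their product concentrates mass on
# candidate 1, the one both experts accept.
expert_a = np.array([0.4, 0.4, 0.1, 0.1])
expert_b = np.array([0.1, 0.5, 0.3, 0.1])
poe = product_of_experts(np.stack([expert_a, expert_b]))
print(poe.argmax())  # candidate 1 scores highest under both experts
```

The key property is that a candidate must be plausible under *every* expert to survive the product, which is what lets heterogeneous models (generators, VLMs, simulators) each veto outputs they find implausible.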
@flycooler_zd
Zhao Dong
3 months
🚀 Excited to announce our CVPR 2025 Workshop: 3D Digital Twin: Progress, Challenges, and Future Directions 🗓 June 12, 2025 · 9:00 AM–5:00 PM 📢 Incredible lineup: @rapideRobot, Andrea Vedaldi @Oxford_VGG,@richardzhangsfu,@QianqianWang5,Dr. Xiaoshuai Zhang @Hillbot_AI,
2
22
57
@ndea
Ndea
4 months
Neuro-symbolic concepts (object, action, relation) represented by a hybrid of neural nets & symbolic programs. Composable, grounded, and typed, agents recombine them to solve tasks like robotic manipulation. J. Tenenbaum @maojiayuan @jiajunwu_cs @MIT https://t.co/2r0cEvvqZx
1
9
59
@AlexHe00880585
Guangzhao (Alex) He
3 months
💫 Animating 4D objects is complex: traditional methods rely on handcrafted, category-specific rigging representations. 💡 What if we could learn unified, category-agnostic, and scalable 4D motion representations — from raw, unlabeled data? 🚀 Introducing CANOR at #CVPR2025: a
2
22
96
@wenlong_huang
Wenlong Huang
4 months
How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this
8
109
438
@joycjhsu
Joy Hsu
5 months
We'll be presenting Deep Schema Grounding at @iclr_conf 🇸🇬 on Thursday (session 1 #98). Come chat about abstract visual concepts, structured decomposition, & what makes a maze a maze! & test your models on our challenging Visual Abstractions Benchmark:
@joycjhsu
Joy Hsu
1 year
What makes a maze look like a maze? Humans can reason about infinitely many instantiations of mazes—made of candy canes, sticks, icing, yarn, etc. But VLMs often struggle to make sense of such visual abstractions. We improve VLMs' ability to interpret these abstract concepts.
1
3
39
@StanfordAILab
Stanford AI Lab
5 months
Stanford AI Lab (SAIL) is excited to announce new SAIL Postdoctoral Fellowships! We are looking for outstanding candidates excited to advance the frontiers of AI with our professors and vibrant community. Applications received by the end of April 30 will receive full
6
83
208
@Koven_Yu
Hong-Xing "Koven" Yu
5 months
🔥Spatial intelligence requires world generation, and now we have the first comprehensive evaluation benchmark📏 for it! Introducing WorldScore: Unifying evaluation for 3D, 4D, and video models on world generation! 🧵1/7 Web: https://t.co/WnKPf8uarw arxiv: https://t.co/EPLM1xTLwP
6
91
244
@dyamins
Daniel Yamins
6 months
New paper on self-supervised optical flow and occlusion estimation from video foundation models. @sstj389 @jiajunwu_cs @SeKim1112 @Rahul_Venkatesh https://t.co/5j4zjLIxNZ
3
18
111
@ManlingLi_
Manling Li
5 months
Introducing T* and LV-Haystack -- targeting needle-in-the-haystack for long videos! 🤗 LV-Haystack annotated 400+ hours of videos and 15,000+ samples. 🧩 Lightweight plugin for any proprietary and open-source VLMs: T* boosting LLaVA-OV-72B [56→62%] and GPT-4o [50→53%] within
4
19
89
@Koven_Yu
Hong-Xing "Koven" Yu
5 months
🤩 FluidNexus has been selected as CVPR'25 *Oral* paper 🎺! See you at Nashville!
@Koven_Yu
Hong-Xing "Koven" Yu
6 months
🔥Want to capture 3D dancing fluids♨️🌫️🌪️💦? No specialized equipment, just one video! Introducing FluidNexus: Now you only need one camera to reconstruct 3D fluid dynamics and predict future evolution! 🧵1/4 Web: https://t.co/DsxWBo8pgX Arxiv: https://t.co/U1O8qpXycH
0
4
45
@KyleSargentAI
Kyle Sargent
6 months
Modern generative models of images and videos rely on tokenizers. Can we build a state-of-the-art discrete image tokenizer with a diffusion autoencoder? Yes! I’m excited to share FlowMo, with @kylehkhsu, @jcjohnss, @drfeifei, @jiajunwu_cs. A thread 🧵:
13
140
599
@sunfanyun
Fan-Yun Sun
6 months
Spatial reasoning is a major challenge for foundation models today, even in simple tasks like arranging objects in 3D space. #CVPR2025 Introducing LayoutVLM, a differentiable optimization framework that uses VLM to spatially reason about diverse scene layouts from unlabeled
4
60
240
@3DVconf
International Conference on 3D Vision
6 months
#3DV2025 is happening in 10 days in Singapore, but we can't wait to give some spoilers for the award!! 8 papers were selected as award candidates, congrats 🥳. The final awards will be announced during the main conference. https://t.co/e8e546AH7B
Tweet media one
2
6
47