Fan-Yun Sun Profile
Fan-Yun Sun

@sunfanyun

Followers
1K
Following
167
Media
43
Statuses
138

cs phd @StanfordAILab @stanfordsvl @NVIDIAAI embodied AI, code generation, 3D

Stanford, CA
Joined October 2018
@sunfanyun
Fan-Yun Sun
5 months
Spatial reasoning is a major challenge for today's foundation models, even in simple tasks like arranging objects in 3D space. #CVPR2025 Introducing LayoutVLM, a differentiable optimization framework that uses a VLM to spatially reason about diverse scene layouts from unlabeled
4
59
240
@sunfanyun
Fan-Yun Sun
2 months
RT @0xSigil: Hiring a founding principal engineer for @extraordinary. $25,000 referral
0
13
0
@sunfanyun
Fan-Yun Sun
2 months
RT @sanjana__z: 🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, uncons…
0
32
0
@sunfanyun
Fan-Yun Sun
3 months
RT @wenlong_huang: How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Int…
0
108
0
@sunfanyun
Fan-Yun Sun
4 months
RT @silasalberti: we trained Kevin-32B = K(ernel D)evin using GRPO on KernelBench. it's to our knowledge the first open model trained using…
0
28
0
@sunfanyun
Fan-Yun Sun
4 months
RT @sidahuj: 🧑‍🎨 The future of creative tools will look very different. 🧠 Imagine an AI control-centre for orchestrating complex tasks usi…
0
39
0
@sunfanyun
Fan-Yun Sun
4 months
RT @sidahuj: 🧩 Built an MCP that lets Claude talk directly to Blender. It helps you create beautiful 3D scenes using just prompts! Here’s…
0
1K
0
@sunfanyun
Fan-Yun Sun
5 months
Huge thanks to the amazing team: @Weiyu_Liu_ (co-lead), Siyi Gu, @dill_pkl, Goutam Bhat, @fedassa, @ManlingLi_, @nickhaber, @jiajunwu_cs. 🌐 Project site: 💻 Code (we plan to open-source everything): n/n
github.com
Official code for "LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models" (CVPR 2025) - sunfanyunn/LayoutVLM
2
2
10
@sunfanyun
Fan-Yun Sun
5 months
Beyond research, consider this: Rockstar Games spends $100M+ and countless human hours meticulously placing 3D assets to create immersive game worlds like GTA. When combined with asset generation models, a model that can spatially reason could automate content creation for
1
1
9
@sunfanyun
Fan-Yun Sun
5 months
Automated 3D layout generation unlocks richer simulation environments for robotics and embodied AI, enabling: 🔹 More realistic scenes and layouts during training 🔹 Improved generalization for real-world deployment. Consider scene_synthesizer by @clembow, which shares a similar
1
1
5
@sunfanyun
Fan-Yun Sun
5 months
LayoutVLM outperforms existing methods on our benchmark, where models arrange up to *80* 3D assets given a language instruction and a floor plan. 5/n
1
1
5
@sunfanyun
Fan-Yun Sun
5 months
The 3D layout optimization landscape is full of local minima. How can we escape them? 🔹 We refine the optimization objectives by validating them against the predicted numerical initialization (code is verifiable!). 🔹 We further finetune our VLM on human-designed 3D scene
1
1
3
@sunfanyun
Fan-Yun Sun
5 months
Our key idea: Use a VLM to produce two complementary representations and enforce mutual consistency for better spatial reasoning. 🔹 Initialization: predict numerical poses from visually marked multi-view images. 🔹 Optimization: generate spatial relations as differentiable
1
1
3
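The thread does not spell out LayoutVLM's actual objective functions, so the following is only a toy sketch of the general idea in the tweet above: start from initial numeric poses, then minimize differentiable spatial-relation losses by gradient descent. The "near" relation, the circle-based collision penalty, the function names, and every number here are illustrative assumptions, not the paper's formulation or the repository's API.

```python
import math


def layout_loss_and_grad(pos, radii, near_pairs):
    """Return (loss, grad) for 2D object centers under two toy relations.

    pos        -- list of [x, y] centers (hypothetical stand-in for object poses)
    radii      -- list of circle radii used as a crude footprint per object
    near_pairs -- list of (i, j, target_distance) tuples, an assumed "near" relation
    """
    n = len(pos)
    grad = [[0.0, 0.0] for _ in range(n)]
    loss = 0.0

    # Assumed "near" relation: squared error between actual and target distance.
    for i, j, target in near_pairs:
        dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
        dist = math.hypot(dx, dy) + 1e-9  # epsilon avoids division by zero
        err = dist - target
        loss += err * err
        gx, gy = 2.0 * err * dx / dist, 2.0 * err * dy / dist
        grad[i][0] += gx
        grad[i][1] += gy
        grad[j][0] -= gx
        grad[j][1] -= gy

    # Assumed collision penalty: squared overlap depth between every pair of circles.
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
            dist = math.hypot(dx, dy) + 1e-9
            overlap = radii[i] + radii[j] - dist
            if overlap > 0.0:
                loss += overlap * overlap
                # Overlap shrinks as distance grows, hence the negative sign.
                gx, gy = -2.0 * overlap * dx / dist, -2.0 * overlap * dy / dist
                grad[i][0] += gx
                grad[i][1] += gy
                grad[j][0] -= gx
                grad[j][1] -= gy

    return loss, grad


def optimize_layout(init_pos, radii, near_pairs, lr=0.05, steps=500):
    """Plain gradient descent starting from the given initial poses."""
    pos = [list(p) for p in init_pos]
    for _ in range(steps):
        _, grad = layout_loss_and_grad(pos, radii, near_pairs)
        for p, g in zip(pos, grad):
            p[0] -= lr * g[0]
            p[1] -= lr * g[1]
    return pos


if __name__ == "__main__":
    # Made-up scene: objects 0 and 1 start overlapping; object 0 should sit 1.5 from object 2.
    init = [(0.0, 0.0), (0.2, 0.3), (3.0, 0.0)]
    radii = [0.5, 0.5, 0.5]
    near = [(0, 2, 1.5)]
    final = optimize_layout(init, radii, near)
    print(layout_loss_and_grad(final, radii, near)[0], final)
```

In this sketch the initial poses play the role of the predicted numerical initialization, and the two loss terms stand in for whatever differentiable relations the VLM would generate.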
@sunfanyun
Fan-Yun Sun
5 months
Due to the lack of 3D and dimensional awareness in LLMs, existing methods struggle to generate scenes that are 🔹 physically plausible (i.e., no collision) 🔹 semantically aligned (i.e., objects are placed meaningfully according to the language instruction). 2/n
1
1
6
@sunfanyun
Fan-Yun Sun
6 months
Claude 3.7 one-shotted this 3D room: everything from layout to geometry and texture. This is always a fun way to test a model’s spatial reasoning ability
7
11
122
@sunfanyun
Fan-Yun Sun
6 months
RT @YueYangAI: We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images,…
0
47
0
@sunfanyun
Fan-Yun Sun
6 months
6. Discount the projection by additional risk factors, such as AI not getting there or regulatory pushback
1
0
1
@sunfanyun
Fan-Yun Sun
6 months
5. Payback analysis to justify adoption. For example, in fast food, an annual FTE cost is $167,328, including wages ($131,040 at $15/hr), benefits/taxes (20%), and training costs, suggesting humanoids could be cost-competitive with a rental model. "Estimated cost savings of.
1
0
1
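The figures in step 5 can be cross-checked with a small sketch. The tweet gives only the total ($167,328), the wage line ($131,040 at $15/hr), and the 20% benefits/taxes rate; the 8,736 hours per year (131,040 / 15, i.e. 24 hours × 364 days of coverage) and the $10,080 training cost (the remainder, assuming the 20% applies to wages only) are inferred here, not stated in the source. The function names and the robot rental price are hypothetical placeholders.

```python
def annual_labor_cost(hourly_wage, hours_per_year, benefits_rate, training_cost):
    """Wages plus benefits/taxes (as a fraction of wages) plus training."""
    wages = hourly_wage * hours_per_year
    benefits = wages * benefits_rate
    return wages + benefits + training_cost


def annual_savings(labor_cost, annual_robot_cost):
    """Yearly savings from replacing the labor cost with a robot rental cost."""
    return labor_cost - annual_robot_cost


def payback_years(upfront_cost, savings_per_year):
    """Years needed for cumulative savings to cover an upfront cost."""
    if savings_per_year <= 0:
        raise ValueError("no payback: savings must be positive")
    return upfront_cost / savings_per_year


if __name__ == "__main__":
    # Hours and training cost are inferred from the tweet's totals (see lead-in).
    fte = annual_labor_cost(15.0, 8736, 0.20, 10_080)
    print(f"annual FTE cost: ${fte:,.0f}")

    # HYPOTHETICAL numbers, not from the source: a rental price and an integration fee.
    rental_per_year = 100_000
    integration_fee = 50_000
    saved = annual_savings(fte, rental_per_year)
    print(f"payback on integration fee: {payback_years(integration_fee, saved):.2f} years")
```

Step 6's risk discount would then scale the resulting savings down; the tweets do not say by how much, so no factor is assumed here.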