Tianyuan Dai
@RogerDai1217
Followers: 280 · Following: 15 · Media: 5 · Statuses: 33
PhD @UTCompSci advised by @yukez | Research Intern @NVIDIA GEAR Lab | Prev: CS MS @Stanford advised by @drfeifei | 🤖 Robot Learning, Sim2Real, Real2Sim
Stanford, CA
Joined September 2024
Excited to see the great work done. Congrats! @yingke_wang18 @RuohanZhang76
1/N 🎨🤖Given only a static image of an oil painting by an expert artist, can a robot infer the corresponding control actions, such as trajectory, orientation, and applied force, to accurately reproduce the painting? 🖌️Introducing IMPASTO: a robotic oil-painting system that
1 · 0 · 2
Intelligent humanoids should have the ability to quickly adapt to new tasks by observing humans.

Why is such adaptability important?
🌍 Real-world diversity is hard to fully capture in advance
🧠 Adaptability is central to natural intelligence

We present MimicDroid 👇 🌐
7 · 40 · 121
I’m at #CoRL2025 in Seoul this week! I’m looking for students to join my lab next year, and also for folks excited to build robotic foundation models at a startup. If you’re into generalization, planning and reasoning, or robots that use language, let's chat!
2 · 8 · 50
Looking for a PhD position? Apply to the @ELLISforEurope PhD program and get the unique opportunity to work with two different research teams across Europe! Apply by 31 Oct:
🎓 Interested in a #PhD in machine learning or #AI? The ELLIS PhD Program connects top students with leading researchers across Europe. The application portal opens on Oct 1st. Curious? Join our info session on the same day. Get all the info 👉 https://t.co/0Tq58uexHk
#ELLISPhD
0 · 3 · 35
10 years ago, deep learning was in its infancy. PyTorch didn't exist. Language models were recurrent, and not large. But it felt important: a new technology that would change everything. That's why @drfeifei , @karpathy, and I started @cs231n back in 2015 - to teach the world's
youtube.com: Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving car...
41 · 217 · 2K
🔥 Deadline extended! The non-archival track is now open until Aug 17. Have research related to digital twins? Consider submitting it to our workshop at @ICCVConference 2025. Previously accepted or published papers are welcome as well. #ICCV2025
📢 Call for Papers - We are organizing @ICCVConference Workshop on Generating Digital Twins from Images and Videos (gDT-IV) at #ICCV2025! We welcome submissions in two tracks: 📅 Deadline for Archival Paper Track: June 27 ⏰ Deadline for Non-Archival Paper Track: July 31 🌐
5 · 10 · 34
Enabling robots to improve autonomously via RL will be powerful, and dense shaping rewards can greatly facilitate RL. Our #IROS2025 paper presents a method leveraging VLMs to derive dense rewards for efficient autonomous RL. ⚡🦾 #Robotics #ReinforcementLearning 🧵1/5
4 · 12 · 120
Not all demos are created equal🧐. CUPID uses influence functions to curate only the data that truly drives policy success. If you are interested in data curation methods, this paper is definitely worth reading. Congratulations to the team! @agiachris @leto__jean
What makes data “good” for robot learning? We argue: it’s the data that drives closed-loop policy success! Introducing CUPID 💘, a method that curates demonstrations not by "quality" or appearance, but by how they influence policy behavior, using influence functions. (1/6)
0 · 0 · 2
I've been a bit quiet on X recently. The past year has been a transformational experience. Grok-4 and Kimi K2 are awesome, but the world of robotics is a wondrous wild west. It feels like NLP in 2018 when GPT-1 was published, along with BERT and a thousand other flowers that
188 · 328 · 4K
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the
46 · 259 · 2K
🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to generate rewards for such preferences cheaply. 🧵⬇️
4 · 34 · 137
Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust
18 · 96 · 431
Intriguing work on scaling up visual affordance prediction. 🥳 Definitely check out UAD if you are interested! 🤩
How to scale visual affordance learning that is fine-grained, task-conditioned, works in-the-wild, in dynamic envs? Introducing Unsupervised Affordance Distillation (UAD): distills affordances from off-the-shelf foundation models, *all without manual labels*. Very excited this
0 · 1 · 5
Very interesting work on reconstructing digital twins from in-the-wild video. It has great potential for robotics applications, possibly enhancing sim-to-real transfer with "controllable randomization".
I've been wanting to make 3D reconstructions not just realistic, but also **interactable** and **actionable** for years. Thanks to @XHongchi97338, we're now a step closer! Introducing DRAWER — a framework for the automatic construction of realistic, interactive digital twins.
0 · 0 · 1
How to use simulation data for real-world robot manipulation? We present sim-and-real co-training, a simple recipe for manipulation. We demonstrate that sim data can significantly enhance real-world performance, even with notable differences between the sim and the real. (1/n)
3 · 54 · 245
These days, it feels like a new robotic hand is developed every week. But how can we give them all vision-based general grasping ability? Meet AnyDexGrasp: it enables robust grasping across different robot hands—with just 40 training objects and 4–8 hours of trial and error!
8 · 40 · 217
🤖 Ever wondered what robots need to truly help humans around the house? 🏡 Introducing 𝗕𝗘𝗛𝗔𝗩𝗜𝗢𝗥 𝗥𝗼𝗯𝗼𝘁 𝗦𝘂𝗶𝘁𝗲 (𝗕𝗥𝗦)—a comprehensive framework for mastering mobile whole-body manipulation across diverse household tasks! 🧹🫧 From taking out the trash to
18 · 139 · 419
In the past, we extended the convolution operator to go from low-level image processing to high-level visual reasoning. Can we also extend physical operators for more high-level physical reasoning? Introducing the Denoising Hamiltonian Network (DHN): https://t.co/GY76QreRge
6 · 59 · 315
Ever wondered how roses grow and wither in your backyard?🌹 Our latest work on generating 4D temporal object intrinsics lets you explore a rose's entire lifecycle—from birth to death—under any environment light, from any viewpoint, at any moment. Project page:
4 · 38 · 190
1/ [NeurIPS D&B] Introducing HourVideo: A benchmark for hour-long video-language understanding!🚀 500 egocentric videos, 18 total tasks & ~13k questions! Performance: GPT-4➡️25.7% Gemini 1.5 Pro➡️37.3% Humans➡️85.0% We highlight a significant gap in multimodal capabilities🧵👇
3 · 54 · 185