Ayush Jain
@ayushjain1144
Followers
418
Following
3K
Media
20
Statuses
214
Robotics PhD Student, CMU | MS in Robotics, CMU | B.E. CS, BITS Pilani | 🇮🇳
Pittsburgh, PA
Joined May 2018
1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMs—due to the lack of large-scale 3D data and pre-trained 3D encoders. We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding. https://t.co/DGGtYYPaQi
1
28
136
It looks like @CVPR has implemented a new mandatory "Compute Reporting Form" that must be submitted alongside any paper submission. Though I am sympathetic to the motivations for this change, I am opposed to it for a variety of reasons:
3
32
225
Happy to be on this list! 🙂
There’s no conference without the efforts of our reviewers. Special shoutout to our #ICCV2025 outstanding reviewers 🫡 https://t.co/WYAcXLRXla
0
0
11
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art
29
129
722
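A purely illustrative sketch (not the MapAnything API or code) of what a "factored" metric scene output, as described in the tweet above, could look like: per-view rays and up-to-scale depth, per-view camera poses, and a single metric scale that compose into metric 3D points. All class and field names here are hypothetical assumptions for illustration.

```python
# Hypothetical illustration of a factored metric 3D scene representation.
# Not MapAnything's actual interface; names and layout are assumptions.
from dataclasses import dataclass
from typing import List, Optional
import numpy as np

@dataclass
class ViewInputs:
    image: np.ndarray                         # H x W x 3 RGB
    intrinsics: Optional[np.ndarray] = None   # 3 x 3 calibration, if known
    pose: Optional[np.ndarray] = None         # 4 x 4 world-from-camera, if known
    depth: Optional[np.ndarray] = None        # H x W metric depth, if known

@dataclass
class FactoredScene:
    # Factored outputs: geometry and cameras predicted separately,
    # tied together by one global metric scale.
    ray_dirs: List[np.ndarray]            # per-view H x W x 3 unit ray directions
    depth_up_to_scale: List[np.ndarray]   # per-view H x W relative depth
    poses: List[np.ndarray]               # per-view 4 x 4 world-from-camera
    metric_scale: float                   # scalar lifting everything to metres

    def metric_points(self, view: int) -> np.ndarray:
        """Compose the factors into metric 3D world points for one view."""
        d = self.depth_up_to_scale[view] * self.metric_scale
        pts_cam = self.ray_dirs[view] * d[..., None]       # camera frame
        R, t = self.poses[view][:3, :3], self.poses[view][:3, 3]
        return pts_cam @ R.T + t                           # world frame
```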
Check out this amazing new work from @yehonation!
#ICCV2025 Introducing 💡LightSwitch💡 - A multi-view material-relighting diffusion pipeline that directly and efficiently relights any number of input images to a target lighting & does 3D asset relighting with gaussian splatting! 🧵
1
0
3
In RENT, we showed LLMs can improve without access to answers - by maximizing confidence. In this work, we go further: LLMs can improve without even having the questions. Using self-play, one LLM learns to ask challenging questions, while the other LLM uses confidence to solve them.
Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL. There is no external training data – the only input is a single prompt specifying the topic.
0
5
21
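A minimal, hypothetical sketch of the asymmetric self-play loop the two tweets above describe: a proposer model generates questions on a topic, a solver model answers them, and the solver's reward comes from its own confidence rather than external answers. The stub functions, the agreement-based confidence proxy, and the proposer reward shaping are all illustrative assumptions, not the paper's implementation.

```python
# Illustrative self-play sketch; the LLM calls are stand-in stubs.
import random

TOPIC_PROMPT = "Write a challenging arithmetic word problem."  # the only external input

def propose_question(topic: str) -> str:
    # Stub for the proposer LLM; in practice this would be a sampled generation.
    a, b = random.randint(2, 99), random.randint(2, 99)
    return f"{topic} What is {a} * {b}?"

def solve_with_confidence(question: str, n_samples: int = 8) -> float:
    # Stub for the solver LLM: agreement across sampled answers serves here
    # as a proxy for the confidence reward (no ground-truth answer is used).
    answers = [random.choice(["A", "B"]) for _ in range(n_samples)]
    return max(answers.count(a) for a in set(answers)) / n_samples

def self_play_round() -> dict:
    question = propose_question(TOPIC_PROMPT)
    solver_reward = solve_with_confidence(question)
    # One plausible shaping (an assumption): reward the proposer for questions
    # that are challenging but still solvable, i.e. intermediate solver confidence.
    proposer_reward = 1.0 - abs(solver_reward - 0.5) * 2
    return {"question": question,
            "solver_reward": solver_reward,
            "proposer_reward": proposer_reward}

if __name__ == "__main__":
    for _ in range(3):
        print(self_play_round())
```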
Couldn’t be at #ACL2025NLP, but check out our ACL paper from @MSFTResearch! We study how implicit cues in video demos (eye gaze & speech) impact personalized assistance in VLMs. TL;DR: - RGB + gaze > RGB alone - Gaze vs. speech impact is task-specific 📄 https://t.co/r9WMVidmaC
7
9
67
I'm observing a mini Moravec's paradox within robotics: gymnastics that are difficult for humans are much easier for robots than "unsexy" tasks like cooking, cleaning, and assembling. It leads to a cognitive dissonance for people outside the field, "so, robots can parkour &
145
615
3K
Great work from Mihir with lots of nice insights in the thread!
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️ Compute-constrained? Train Autoregressive models ▶️ Data-constrained? Train Diffusion models Get ready for 🤿 1/n
0
1
7
We are HALFWAY there! Thanks to all those who've kindly contributed 🙏🙏 With Indaba <4 weeks away, let's send all the 25 African researchers to their dream conference! Donate what you can: https://t.co/ryCItIoxNs
The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.
1
54
76
@svlevine Good article. I have three comments: 1. With any hard optimization problem, if you can get into the right ballpark, you save a lot of time searching around. I think that's where human demonstration really helps. 2. When a human watches Roger Federer, they get the gist of what
2
4
47
Happening now!
At #ICML2025, 16 Jul, 11 AM: we present Meta Locate 3D, a model for accurate object localization in 3D environments. Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. Demo, model, paper: https://t.co/8ZhV21TDxq
0
0
6
Happening right now!!
Can we train a 3D-language multimodality Transformer using 2D VLMs and rendering loss? @iamsashasax will present our new #icml25 paper on Wednesday 2pm at Hall B2-B3 W200. Please come and check! Project Page: https://t.co/MVX6EvS4t4
0
0
5
Can we train a 3D-language multimodality Transformer using 2D VLMs and rendering loss? @iamsashasax will present our new #icml25 paper on Wednesday 2pm at Hall B2-B3 W200. Please come and check! Project Page: https://t.co/MVX6EvS4t4
0
21
133
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s
127
347
3K
At #ICML2025, 16 Jul, 11 AM: we present Meta Locate 3D, a model for accurate object localization in 3D environments. Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. Demo, model, paper: https://t.co/8ZhV21TDxq
5
15
54
I'll be at #ICML2025 to present UniVLG! Excited to meet old friends and make new ones, especially people working in the Indian research ecosystem. Feel free to reach out if you would like to chat!
1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMs—due to the lack of large-scale 3D data and pre-trained 3D encoders. We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding. https://t.co/DGGtYYPaQi
0
1
15
The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.
16
120
236
We have an open position at Apple MLR to work on scalable and efficient generative models that perform across diverse data domains—including images, 3D, video, graphs, etc. We care deeply about simplifying modeling pipelines and developing powerful and scalable training recipes.
2
14
65
AllTracker: Efficient Dense Point Tracking at High Resolution. If you're using any point tracker in any project, this is likely a drop-in upgrade—improving speed, accuracy, and density, all at once.
2
38
240