Ayush Jain

@ayushjain1144

Followers: 418
Following: 3K
Media: 20
Statuses: 214

Robotics PhD Student, CMU | MS in Robotics, CMU | B.E. CS, BITS Pilani | 🇮🇳

Pittsburgh, PA
Joined May 2018
@ayushjain1144
Ayush Jain
7 months
1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMs—due to the lack of large-scale 3D data and pre-trained 3D encoders. We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding. https://t.co/DGGtYYPaQi
1
28
136
@jon_barron
Jon Barron
25 days
It looks like @CVPR has implemented a new mandatory "Compute Reporting Form" that must be submitted alongside any paper submission. Though I am sympathetic to the motivations for this change, I am opposed to it for a variety of reasons:
@CVPR
#CVPR2026
26 days
#AI research has an invisible cost: compute. Starting with #CVPR2026, authors will report their compute usage. Aggregated data will help the community understand who can participate, what is sustainable, and how resources are used, promoting more transparent & equitable research.
3
32
225
@ayushjain1144
Ayush Jain
1 month
Happy to be on this list! 🙂
@ICCVConference
#ICCV2025
1 month
There’s no conference without the efforts of our reviewers. Special shoutout to our #ICCV2025 outstanding reviewers 🫡 https://t.co/WYAcXLRXla
0
0
11
@Nik__V__
Nikhil Keetha
2 months
Meet MapAnything – a transformer that directly regresses factored metric 3D scene geometry (from images, calibration, poses, or depth) in an end-to-end way. No pipelines, no extra stages. Just 3D geometry & cameras, straight from any type of input, delivering new state-of-the-art
29
129
722
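For intuition on what "factored" geometry means here, the sketch below is my own rough mental model, not the MapAnything interface (`FactoredView` and `to_metric_points` are hypothetical names): geometry is split into per-view depth and camera pose plus one global metric scale, which compose into world-frame points.

```python
# A minimal sketch (my own illustration, not the MapAnything API) of a
# "factored" metric scene representation: per-view depth and camera pose
# plus a single global metric scale, composed into world-frame points.
from dataclasses import dataclass
import numpy as np

@dataclass
class FactoredView:
    depth: np.ndarray       # (H, W) up-to-scale depth for this view
    pose: np.ndarray        # (4, 4) camera-to-world transform
    intrinsics: np.ndarray  # (3, 3) calibration matrix

def to_metric_points(view: FactoredView, metric_scale: float) -> np.ndarray:
    """Compose the factors into a metric, world-frame point map (H, W, 3)."""
    H, W = view.depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float64)
    rays = pixels @ np.linalg.inv(view.intrinsics).T          # camera-frame rays
    cam_pts = rays * (view.depth * metric_scale)[..., None]   # lift to metres
    cam_h = np.concatenate([cam_pts, np.ones((H, W, 1))], axis=-1)
    return (cam_h @ view.pose.T)[..., :3]                     # world frame

# Example: a flat 2x2 depth map seen by an identity camera.
view = FactoredView(depth=np.ones((2, 2)), pose=np.eye(4), intrinsics=np.eye(3))
print(to_metric_points(view, metric_scale=2.0))
```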
@ayushjain1144
Ayush Jain
3 months
Check out this amazing new work from @yehonation!
@yehonation
Yehonathan Litman
3 months
#ICCV2025 Introducing 💡LightSwitch💡 - a multi-view material-relighting diffusion pipeline that directly and efficiently relights any number of input images to a target lighting & does 3D asset relighting with Gaussian splatting! 🧵
1
0
3
@mihirp98
Mihir Prabhudesai
3 months
In RENT, we showed LLMs can improve without access to answers - by maximizing confidence. In this work, we go further: LLMs can improve without even having the questions. Using self-play, one LLM learns to ask challenging questions, while the other LLM uses confidence to solve them.
@lchen915
Lili
3 months
Self-Questioning Language Models: LLMs that learn to generate their own questions and answers via asymmetric self-play RL. There is no external training data – the only input is a single prompt specifying the topic.
0
5
21
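As a rough schematic of the asymmetric self-play loop described above (my own toy, not the paper's training code; `proposer`, `solver`, and the confidence estimate are hypothetical stand-ins): one model proposes questions, the other is rewarded by the self-consistency of its own answers, and the proposer is rewarded when the solver is unsure.

```python
# Toy schematic of asymmetric self-play with a confidence reward
# (an illustration of the idea in the thread, not the paper's method).
import random

TOPIC_PROMPT = "Multiplication word problems."

def proposer(topic: str) -> str:
    # Stand-in for an LLM policy that generates a question on the topic.
    a, b = random.randint(10, 99), random.randint(10, 99)
    return f"[{topic}] What is {a} * {b}?"

def solver(question: str, n_samples: int = 8) -> tuple[int, float]:
    # Stand-in for an LLM: draw several noisy answers; "confidence" is the
    # fraction of samples that agree with the majority (self-consistency).
    a_str, b_str = question.split("What is ")[1].rstrip("?").split(" * ")
    truth = int(a_str) * int(b_str)
    samples = [truth if random.random() < 0.7 else truth + random.randint(1, 9)
               for _ in range(n_samples)]
    majority = max(set(samples), key=samples.count)
    return majority, samples.count(majority) / n_samples

for step in range(3):
    question = proposer(TOPIC_PROMPT)
    answer, confidence = solver(question)
    solver_reward = confidence            # solver: maximize its own confidence
    proposer_reward = 1.0 - confidence    # proposer: prefer questions the solver finds hard
    # In a real setup both rewards would drive RL updates (e.g. policy gradients)
    # on the two LLMs; no external questions or answers are ever needed.
    print(step, question, answer, f"confidence={confidence:.2f}")
```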
@GabrielSarch
Gabriel Sarch
4 months
Couldn’t be at #ACL2025NLP, but check out our ACL paper from @MSFTResearch! We study how implicit cues in video demos (eye gaze & speech) impact personalized assistance in VLMs. TL;DR:
- RGB + gaze > RGB alone
- Gaze vs. speech impact is task-specific
📄 https://t.co/r9WMVidmaC
7
9
67
@DrJimFan
Jim Fan
4 months
I'm observing a mini Moravec's paradox within robotics: gymnastics that are difficult for humans are much easier for robots than "unsexy" tasks like cooking, cleaning, and assembling. It leads to a cognitive dissonance for people outside the field, "so, robots can parkour &
145
615
3K
@ayushjain1144
Ayush Jain
4 months
Great work from Mihir with lots of nice insights in the thread!
@mihirp98
Mihir Prabhudesai
4 months
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR:
▶️ Compute-constrained? Train autoregressive models.
▶️ Data-constrained? Train diffusion models.
Get ready for 🤿 1/n
0
1
7
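As a rough way to see the two regimes in the TL;DR above (my own back-of-the-envelope, not from the thread), compare what a fixed FLOP budget buys when unique data is abundant versus fixed:

```python
# Toy illustration of compute- vs data-constrained training, using the common
# rule of thumb that training FLOPs ~ 6 * params * tokens_seen. The numbers
# are made up for the example.
def training_flops(params: float, tokens_seen: float) -> float:
    return 6 * params * tokens_seen

params = 1e9            # 1B-parameter model
unique_tokens = 1e10    # fixed dataset: 10B unique tokens
flop_budget = 1e21

# Compute-constrained: data is abundant, so the budget sets how many tokens fit in one pass.
tokens_one_pass = flop_budget / (6 * params)
print(f"compute-constrained: ~{tokens_one_pass:.2e} tokens in a single epoch")

# Data-constrained: unique tokens are fixed, so extra compute means repeating epochs.
epochs = flop_budget / training_flops(params, unique_tokens)
print(f"data-constrained: same budget = ~{epochs:.1f} epochs over the 10B tokens")
```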
@savvyRL
Rosanne Liu
4 months
We are HALFWAY there! Thanks to all those who've kindly contributed 🙏🙏 With Indaba <4 weeks away, let's send all 25 African researchers to their dream conference! Donate what you can: https://t.co/ryCItIoxNs
@savvyRL
Rosanne Liu
4 months
The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.
1
54
76
@Michael_J_Black
Michael Black
4 months
@svlevine Good article. I have three comments:
1. With any hard optimization problem, if you can get into the right ballpark, you save a lot of time searching around. I think that's where human demonstration really helps.
2. When a human watches Roger Federer, they get the gist of what
2
4
47
@ayushjain1144
Ayush Jain
4 months
Happening now!
@o_maksymets
Oleksandr Maksymets
4 months
At #ICML2025, 16 Jul, 11 AM, we present Meta Locate 3D: a model for accurate object localization in 3D environments. Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. Demo, model, paper: https://t.co/8ZhV21TDxq
0
0
6
@ayushjain1144
Ayush Jain
4 months
Happening right now!!
@AngCao3
Ang Cao
4 months
Can we train a 3D-language multimodal Transformer using 2D VLMs and a rendering loss? @iamsashasax will present our new #icml25 paper on Wednesday at 2 pm at Hall B2-B3 W200. Please come and check it out! Project Page: https://t.co/MVX6EvS4t4
0
0
5
@AngCao3
Ang Cao
4 months
Can we train a 3D-language multimodal Transformer using 2D VLMs and a rendering loss? @iamsashasax will present our new #icml25 paper on Wednesday at 2 pm at Hall B2-B3 W200. Please come and check it out! Project Page: https://t.co/MVX6EvS4t4
0
21
133
@_jasonwei
Jason Wei
4 months
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s
127
347
3K
@o_maksymets
Oleksandr Maksymets
4 months
At #ICML2025, 16 Jul, 11 AM, we present Meta Locate 3D: a model for accurate object localization in 3D environments. Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. Demo, model, paper: https://t.co/8ZhV21TDxq
5
15
54
@ayushjain1144
Ayush Jain
4 months
I'll be at #ICML2025 to present UniVLG! Excited to meet old friends and make new ones, especially people working in the Indian research ecosystem. Feel free to reach out if you would like to chat!
@ayushjain1144
Ayush Jain
7 months
1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMs—due to the lack of large-scale 3D data and pre-trained 3D encoders. We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding. https://t.co/DGGtYYPaQi
0
1
15
@savvyRL
Rosanne Liu
4 months
The opportunity gap in AI is more striking than ever. We talk way too much about those receiving $100M or whatever for their jobs, but not enough about those asking for <$1k to present their work. For the 3rd year in a row, @ml_collective is raising funds to support @DeepIndaba attendees.
16
120
236
@itsbautistam
Miguel Angel Bautista
5 months
We have an open position at Apple MLR to work on scalable and efficient generative models that perform across diverse data domains—including images, 3D, video, graphs, etc. We care deeply about simplifying modeling pipelines and developing powerful, scalable training recipes.
2
14
65
@AdamWHarley
Adam W. Harley
5 months
AllTracker: Efficient Dense Point Tracking at High Resolution
If you’re using any point tracker in any project, this is likely a drop-in upgrade—improving speed, accuracy, and density, all at once.
2
38
240