Tanmay Gupta

@tanmay2099

Followers: 2K · Following: 773 · Media: 69 · Statuses: 260

Senior Research Scientist @allen_ai (Ai2) | Developing the science and art of multimodal AI agents | Prev. CS PhD, UIUC and EE UG, IIT Kanpur

Seattle, WA
Joined January 2014
@tanmay2099
Tanmay Gupta
6 months
Love the evolution of this research thread: 2015 - Neural Module Networks (NMN) by @jacobandreas et al. was my introduction to neuro-symbolic reasoning in grad school. Super exciting approach, but program synthesis and neural modules were both brittle back then. 2022 - GPT3 and
@codezakh
Zaid Khan
6 months
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of
0
8
30
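The quoted tweet is truncated, but the core idea (an LLM-guided tree search over natural-language plans whose steps are grounded to symbols in the target repo, with no code execution) can be sketched roughly. Everything below, from the keyword grounding to the stubbed mutation step, is an illustrative assumption rather than MutaGReP's actual algorithm or code.

```python
# Rough, hypothetical sketch of mutation-guided plan search over a code repo.
# Names, scoring, and the mutation step are assumptions, not MutaGReP's design.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Plan:
    score: float                               # lower = better grounded
    steps: list[str] = field(compare=False)    # natural-language plan steps
    symbols: list[str] = field(compare=False)  # repo symbols the steps ground to

def ground(step: str, repo_symbols: list[str]) -> list[str]:
    # Placeholder retriever: keep repo symbols sharing a token with the step.
    toks = set(step.lower().split())
    return [s for s in repo_symbols if set(s.lower().split("_")) & toks]

def mutate(plan: Plan) -> list[list[str]]:
    # In the real system an LLM would propose edited/extended step lists;
    # here we append a stub step so the sketch runs end to end.
    return [plan.steps + [f"refine step {len(plan.steps) + 1}"]]

def search(query: str, repo_symbols: list[str], iters: int = 5) -> Plan:
    root_symbols = ground(query, repo_symbols)
    root = Plan(score=-len(set(root_symbols)), steps=[query], symbols=root_symbols)
    frontier, best = [root], root
    for _ in range(iters):
        plan = heapq.heappop(frontier)         # expand the most promising plan
        for steps in mutate(plan):
            symbols = [s for step in steps for s in ground(step, repo_symbols)]
            child = Plan(score=-len(set(symbols)), steps=steps, symbols=symbols)
            heapq.heappush(frontier, child)
            best = min(best, child)            # more distinct grounded symbols wins
    return best

if __name__ == "__main__":
    result = search("load dataset and train model",
                    ["load_dataset", "train_model", "eval_model"])
    print(result.steps, result.symbols)
```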
@tanmay2099
Tanmay Gupta
1 month
If you are a near-graduation PhD student in computer vision, consider applying to the ICCV 2025 Doctoral Consortium (DC). It is a chance to be mentored by an experienced researcher in the vision community to help you transition to your post-PhD career in academia or industry.
@ICCVConference
#ICCV2025
1 month
Finishing your PhD or just defended? Apply to the #ICCV2025 Doctoral Consortium. Get feedback and mentorship from leading researchers in computer vision.
Tweet media one
0
6
33
@tanmay2099
Tanmay Gupta
3 months
This morning took a scenic walk from Ai2’s (@allen_ai) past to its future! Reminds me of this wonderful feeling of day 1 as an intern at a new and shiny office - no idea where anything is anymore! 🤩
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
22
@tanmay2099
Tanmay Gupta
5 months
Context for that 3rd point:
@tanmay2099
Tanmay Gupta
6 months
Love the evolution of this research thread: 2015 - Neural Module Networks (NMN) by @jacobandreas et al. was my introduction to neuro-symbolic reasoning in grad school. Super exciting approach, but program synthesis and neural modules were both brittle back then. 2022 - GPT3 and
0
0
0
@tanmay2099
Tanmay Gupta
5 months
Some reflections on neuro-modular approaches like VisProg/ViperGPT (and their various agentic incarnations) in the context of today's models: 1. End-to-end (E2E) vs neuro-modular (NM) is a false dichotomy. We don't need to replace E2E models that work well with NM systems, but we can
@sainingxie
Saining Xie
5 months
Some further thoughts on the idea of "thinking with images": 1) zero-shot tool use is limited -- you can’t just call an object detector to do visual search. That’s why approaches like VisProg/ViperGPT/Visual-sketchpad will not generalize or scale well. 2) visual search needs to
Tweet media one
2
6
26
@tanmay2099
Tanmay Gupta
5 months
Great initiative by #CVPR2025! Kudos to Alyosha and Antonio for volunteering to run these practice sessions 👏👏
Tweet media one
1
0
18
@LucaWeihs
Luca Weihs
6 months
Loved working with Zaid as he led this exciting project at Ai2! LLM-based coding agents are remarkably capable when given well-grounded plans, but generating such plans efficiently from arbitrarily large codebases is extremely challenging. The solution: MutaGReP
@codezakh
Zaid Khan
6 months
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of
0
5
21
@_akhaliq
AK
6 months
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
2
11
44
@YueYangAI
Yue Yang
7 months
We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: https://t.co/9IQ4CgeKMF Dataset: https://t.co/yiERrZup8X Paper:
Tweet media one
6
47
196
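A minimal sketch of the recipe described above, under the assumption that the pipeline works roughly like this: generate the underlying data programmatically, render a text-rich image from it with code (a fixed matplotlib bar chart here stands in for LLM-written rendering code), and derive exact question-answer labels from the same data. This illustrates the idea, not the project's actual pipeline.

```python
# Illustrative sketch of code-guided synthetic data for text-rich images.
# In the real pipeline an LLM writes the rendering code; here the chart code
# is fixed so the example runs without any model calls.
import json
import random
import matplotlib
matplotlib.use("Agg")                      # render off-screen, no display needed
import matplotlib.pyplot as plt

def make_chart_sample(idx: int) -> dict:
    categories = ["north", "south", "east", "west"]
    values = [random.randint(10, 100) for _ in categories]

    fig, ax = plt.subplots()
    ax.bar(categories, values)
    ax.set_title(f"Regional sales, sample {idx}")
    ax.set_ylabel("Units sold")
    image_path = f"chart_{idx}.png"
    fig.savefig(image_path)
    plt.close(fig)

    # Because we control the underlying data, QA labels are exact by construction.
    best = categories[values.index(max(values))]
    return {
        "image": image_path,
        "question": "Which region sold the most units?",
        "answer": best,
    }

if __name__ == "__main__":
    dataset = [make_chart_sample(i) for i in range(3)]
    print(json.dumps(dataset, indent=2))
```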
@tanmay2099
Tanmay Gupta
9 months
@anand_bhattad @SkyLi0n @jon_barron @anikembhavi (Thanks for the shoutout @anand_bhattad!) or CodeNav, which generalizes tool-use to code-use! Some key improvements upon VisProg / ViperGPT style tool-use systems: ✅ It's way more flexible in how tools are provided (just build a Python codebase and point the LLM to that
0
1
4
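The tweet is cut off, but the "build a Python codebase and point the LLM to that" idea can be illustrated generically: harvest callable signatures and docstrings from a module and hand that text to the LLM so it can write code against the library instead of a fixed tool schema. The helper below is hypothetical and is not CodeNav's actual API.

```python
# Hypothetical illustration of exposing an existing Python module as the
# "tool inventory" for an LLM. Not CodeNav's actual interface.
import inspect
import json  # any importable module works as the example "codebase"

def describe_module(module) -> str:
    lines = []
    for name, obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue
        try:
            sig = str(inspect.signature(obj))
        except (TypeError, ValueError):
            sig = "(...)"
        doc = (inspect.getdoc(obj) or "").splitlines()
        summary = doc[0] if doc else ""
        lines.append(f"{module.__name__}.{name}{sig}  # {summary}")
    return "\n".join(lines)

# The resulting text can be dropped into the system prompt so the LLM writes
# code that calls the library directly.
print(describe_module(json))
```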
@jmin__cho
Jaemin Cho
9 months
🚨 I’m on the 2024-2025 academic job market! https://t.co/NKVe24PSOl I work on ✨ Multimodal AI ✨, with a special focus on enhancing reasoning in both understanding and generation tasks by: 1⃣Making it more scalable 2⃣Making it more faithful 3⃣Evaluating and refining multimodal
Tweet media one
6
42
218
@_jessethomason_
Jesse Thomason
9 months
I missed this post back in JULY when Tanmay made it, but it's prescient and even more relevant now. Core NLP folks, remember not to re-invent the wheel. Agents are a thing in robotics, reinforcement learning, and planning. We have algorithms! Come chat with us!
@tanmay2099
Tanmay Gupta
1 year
Do we need to narrowly redefine "Agent" for LLM-Agents or can we just borrow a broader definition from RL / Embodied AI literature? LLM Agents are agentic in the same sense that a trained robot or an RL policy is agentic. Making this connection more explicit allows us to borrow
3
3
16
@unnatjain2010
Unnat Jain
10 months
Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: https://t.co/vynFYIJMQA PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇
Tweet media one
38
72
392
@tanmay2099
Tanmay Gupta
10 months
I am hiring interns to join us @allen_ai in advancing the science and art of building agents of all kinds: 🕸️ Web-use 💻 Code-use 🛠️ Tool-use Join us in answering exciting questions about multimodal planning, agentic learning, dealing with underspecified queries and more!
@Ai2Prior
Prior @ AI2
10 months
📢Applications are open for summer'25 internships at the PRIOR (computer vision) team @allen_ai: Come join us in building large-scale models for: 📸 Open-source Vision-Language Models 💻 Multimodal Web Agents 🤖 Embodied AI + Robotics 🌎 Planet Monitoring Apply by December
0
7
46
@ZCCZHANG
Zichen "Charles" Zhang
10 months
We won the Outstanding Paper Award @corl_conf!!! 😀😀😀 And here's what's inside that mysterious big box
Tweet media one
Tweet media two
@KuoHaoZeng
Kuo-Hao Zeng
10 months
🚀 Quick Update 🚀 🎉 @ZCCZHANG will present PoliFormer at CoRL Oral Session 5 (🕤 9:30-10:30, Fri, Nov 8, CET)! 🎉 Meet us at Poster Session 4 (🕓 16:00-17:30) to chat with @ZCCZHANG, @rosemhendrix, and Jordi! 💻 Our code & checkpoints are NOW public:
1
4
22
@KuoHaoZeng
Kuo-Hao Zeng
10 months
Incredibly honored to share this amazing news! PoliFormer has won the Outstanding Paper Award at @corl_conf 2024! 🎉 Check out our project and code: https://t.co/XJRjZnTVWM
Tweet media one
@anikembhavi
Ani Kembhavi
10 months
PoliFormer has won the Outstanding Paper Award at @corl_conf 2024! On policy RL with a modern transformer architecture can produce masterful navigators for multiple embodiments. All Sim-to-Real. A last hurrah from work at @allen_ai ! Led by @KuoHaoZeng @ZCCZHANG and @LucaWeihs
Tweet media one
2
7
78
@tanmay2099
Tanmay Gupta
1 year
This is how we do POS tagging in 2024, right? Jokes aside, the model is actually really good at pointing. Check it out yourself!
Tweet media one
Tweet media two
@allen_ai
Ai2
1 year
Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it
0
4
20
@tanmay2099
Tanmay Gupta
1 year
5. In summary: Besides the LLM itself, the design of the environment, observations, and the action space are 3 powerful but often under-explored degrees of freedom that can help you improve the performance, robustness, and implementation of your LLM-Agent. Check out CodeNav for
0
0
4
@tanmay2099
Tanmay Gupta
1 year
4. So how does this improve your implementation? With a clean division of responsibility among a few components/classes. The pseudo-code below shows the general structure:
Tweet media one
1
0
2
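The real pseudo-code is in the attached image; the sketch below is only a guess at the kind of factoring the thread argues for: an environment that owns state and executes actions, an observation type that defines what the LLM sees, and an agent that maps history to the next action. All class and method names here are assumptions.

```python
# Hypothetical sketch of a factored LLM-agent loop. Class and method names are
# assumptions; the tweet's actual pseudo-code is in the attached image.
from dataclasses import dataclass

@dataclass
class Observation:
    text: str                      # what the LLM actually gets to see

@dataclass
class Action:
    kind: str                      # e.g. "code", "click", "done"
    payload: str

class Environment:
    """Owns state; executes valid actions and returns observations."""
    def reset(self) -> Observation:
        return Observation(text="task: say hello")

    def step(self, action: Action) -> tuple[Observation, bool]:
        done = action.kind == "done"
        return Observation(text=f"executed {action.kind}"), done

class Agent:
    """Wraps the LLM; maps the observation history to the next action."""
    def act(self, history: list[Observation]) -> Action:
        # A real agent would prompt an LLM here; this stub finishes immediately.
        return Action(kind="done", payload="hello")

def run_episode(env: Environment, agent: Agent, max_steps: int = 10) -> None:
    history = [env.reset()]
    for _ in range(max_steps):
        obs, done = env.step(agent.act(history))
        history.append(obs)
        if done:
            break

run_episode(Environment(), Agent())
```

With a split like this, changing the observation format or the action space touches one class each, without rewriting the rest of the loop.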
@tanmay2099
Tanmay Gupta
1 year
3.3 What is your action space? 🤔 Your action space constrains how your agent interacts with the environment. LLM outputs that do not satisfy these constraints are invalid and must be rejected and resampled. Examples of action spaces: - web agents: clicking, scrolling, typing
Tweet media one
1
0
0
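A small sketch of the reject-and-resample idea from this tweet, using a tiny click/scroll/type action space for a web agent. The sample_llm stub stands in for a real model call, and the regex and prompt feedback are illustrative assumptions.

```python
# Sketch of constraining an agent to a fixed action space by rejecting and
# resampling invalid LLM outputs. sample_llm is a stand-in for a model call.
import re

ACTION_PATTERN = re.compile(r"^(click\(\d+\)|scroll\((up|down)\)|type\(.+\))$")

# Canned outputs returned in order: two invalid actions, then a valid one.
_fake_llm_outputs = iter(["open_menu()", "fly(away)", "click(3)"])

def sample_llm(prompt: str) -> str:
    # Placeholder for an LLM call; returns the next canned output.
    return next(_fake_llm_outputs)

def next_action(prompt: str, max_tries: int = 5) -> str:
    for _ in range(max_tries):
        candidate = sample_llm(prompt).strip()
        if ACTION_PATTERN.match(candidate):
            return candidate               # valid action: accept it
        # Invalid action: reject it and resample with feedback in the prompt.
        prompt += f"\nInvalid action '{candidate}'; use click(i), scroll(up|down), or type(text)."
    raise RuntimeError("no valid action produced after resampling")

print(next_action("You are a web agent. Choose the next action."))
```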