Tanmay Gupta

@tanmay2099

Followers: 2K · Following: 773 · Media: 69 · Statuses: 260

Senior Research Scientist @allen_ai (Ai2) | Developing the science and art of multimodal AI agents | Prev. CS PhD, UIUC and EE UG, IIT Kanpur

Seattle, WA
Joined January 2014
@tanmay2099
Tanmay Gupta
6 months
Love the evolution of this research thread: 2015 - Neural Module Networks (NMN) by @jacobandreas et al. was my introduction to neuro-symbolic reasoning in grad school. Super exciting approach, but program synthesis and neural modules were both brittle back then. 2022 - GPT3 and
@codezakh
Zaid Khan
6 months
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of
0
8
30
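The quoted tweet is truncated, but the core idea (an LLM-guided tree search over natural-language plans whose steps are grounded to symbols in the target repo, with no code execution) can be sketched roughly. Everything below, from the keyword grounding to the stubbed mutation step, is an illustrative assumption rather than MutaGReP's actual algorithm or code.

```python
# Rough, hypothetical sketch of mutation-guided plan search over a code repo.
# Names, scoring, and the mutation step are assumptions, not MutaGReP's design.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Plan:
    score: float                               # lower = better grounded
    steps: list[str] = field(compare=False)    # natural-language plan steps
    symbols: list[str] = field(compare=False)  # repo symbols the steps ground to

def ground(step: str, repo_symbols: list[str]) -> list[str]:
    # Placeholder retriever: keep repo symbols sharing a token with the step.
    toks = set(step.lower().split())
    return [s for s in repo_symbols if set(s.lower().split("_")) & toks]

def mutate(plan: Plan) -> list[list[str]]:
    # In the real system an LLM would propose edited/extended step lists;
    # here we append a stub step so the sketch runs end to end.
    return [plan.steps + [f"refine step {len(plan.steps) + 1}"]]

def search(query: str, repo_symbols: list[str], iters: int = 5) -> Plan:
    root_symbols = ground(query, repo_symbols)
    root = Plan(score=-len(set(root_symbols)), steps=[query], symbols=root_symbols)
    frontier, best = [root], root
    for _ in range(iters):
        plan = heapq.heappop(frontier)         # expand the most promising plan
        for steps in mutate(plan):
            symbols = [s for step in steps for s in ground(step, repo_symbols)]
            child = Plan(score=-len(set(symbols)), steps=steps, symbols=symbols)
            heapq.heappush(frontier, child)
            best = min(best, child)            # more distinct grounded symbols wins
    return best

if __name__ == "__main__":
    result = search("load dataset and train model",
                    ["load_dataset", "train_model", "eval_model"])
    print(result.steps, result.symbols)
```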
@tanmay2099
Tanmay Gupta
1 month
If you are a near-graduation PhD student in computer vision, consider applying to the ICCV 2025 Doctoral Consortium (DC). It is a chance to be mentored by an experienced researcher in the vision community to help you transition to your post-PhD career in academia or industry.
@ICCVConference
#ICCV2025
1 month
Finishing your PhD or just defended? Apply to the #ICCV2025 Doctoral Consortium. Get feedback and mentorship from leading researchers in computer vision.
Tweet media one
0
6
33
@tanmay2099
Tanmay Gupta
3 months
This morning took a scenic walk from Ai2’s (@allen_ai) past to its future! Reminds me of this wonderful feeling of day 1 as an intern at a new and shiny office - no idea where anything is anymore! 🤩
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
22
@tanmay2099
Tanmay Gupta
5 months
Context for that 3rd point:
@tanmay2099
Tanmay Gupta
6 months
Love the evolution of this research thread: 2015 - Neural Module Networks (NMN) by @jacobandreas et al. was my introduction to neuro-symbolic reasoning in grad school. Super exciting approach, but program synthesis and neural modules were both brittle back then. 2022 - GPT3 and
0
0
0
@tanmay2099
Tanmay Gupta
5 months
Some reflections on neuro-modular approaches like VisProg/ViperGPT (and their various agentic incarnations) in the context of today's models: 1. End-to-end (E2E) vs neuro-modular (NM) is a false dichotomy. We don't need to replace E2E models that work well with NM systems, but we can
@sainingxie
Saining Xie
5 months
Some further thoughts on the idea of "thinking with images": 1) zero-shot tool use is limited -- you can’t just call an object detector to do visual search. That’s why approaches like VisProg/ViperGPT/Visual-sketchpad will not generalize or scale well. 2) visual search needs to
Tweet media one
2
6
26
@tanmay2099
Tanmay Gupta
5 months
Great initiative by #CVPR2025! Kudos to Alyosha and Antonio for volunteering to run these practice sessions 👏👏
Tweet media one
1
0
18
@LucaWeihs
Luca Weihs
6 months
Loved working with Zaid as he led this exciting project at Ai2! LLM-based coding agents are remarkably capable when given well-grounded plans, but generating such plans efficiently from arbitrarily large codebases is extremely challenging. The solution: MutaGReP
@codezakh
Zaid Khan
6 months
✨ Introducing MutaGReP (Mutation-guided Grounded Repository Plan Search) - an approach that uses LLM-guided tree search to find realizable plans that are grounded in a target codebase without executing any code! Ever wanted to provide an entire repo containing 100s of 1000s of
0
5
21
@_akhaliq
AK
6 months
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use
2
11
44
@YueYangAI
Yue Yang
7 months
We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: https://t.co/9IQ4CgeKMF Dataset: https://t.co/yiERrZup8X Paper:
Tweet media one
6
47
196
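A minimal sketch of the recipe described above, under the assumption that the pipeline works roughly like this: generate the underlying data programmatically, render a text-rich image from it with code (a fixed matplotlib bar chart here stands in for LLM-written rendering code), and derive exact question-answer labels from the same data. This illustrates the idea, not the project's actual pipeline.

```python
# Illustrative sketch of code-guided synthetic data for text-rich images.
# In the real pipeline an LLM writes the rendering code; here the chart code
# is fixed so the example runs without any model calls.
import json
import random
import matplotlib
matplotlib.use("Agg")                      # render off-screen, no display needed
import matplotlib.pyplot as plt

def make_chart_sample(idx: int) -> dict:
    categories = ["north", "south", "east", "west"]
    values = [random.randint(10, 100) for _ in categories]

    fig, ax = plt.subplots()
    ax.bar(categories, values)
    ax.set_title(f"Regional sales, sample {idx}")
    ax.set_ylabel("Units sold")
    image_path = f"chart_{idx}.png"
    fig.savefig(image_path)
    plt.close(fig)

    # Because we control the underlying data, QA labels are exact by construction.
    best = categories[values.index(max(values))]
    return {
        "image": image_path,
        "question": "Which region sold the most units?",
        "answer": best,
    }

if __name__ == "__main__":
    dataset = [make_chart_sample(i) for i in range(3)]
    print(json.dumps(dataset, indent=2))
```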
@tanmay2099
Tanmay Gupta
9 months
@anand_bhattad @SkyLi0n @jon_barron @anikembhavi (Thanks for the shoutout @anand_bhattad!) or CodeNav, which generalizes tool-use to code-use! Some key improvements upon VisProg / ViperGPT style tool-use systems: ✅ It's way more flexible in how tools are provided (just build a Python codebase and point the LLM to that
0
1
4
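The tweet is cut off, but the "build a Python codebase and point the LLM to that" idea can be illustrated generically: harvest callable signatures and docstrings from a module and hand that text to the LLM so it can write code against the library instead of a fixed tool schema. The helper below is hypothetical and is not CodeNav's actual API.

```python
# Hypothetical illustration of exposing an existing Python module as the
# "tool inventory" for an LLM. Not CodeNav's actual interface.
import inspect
import json  # any importable module works as the example "codebase"

def describe_module(module) -> str:
    lines = []
    for name, obj in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue
        try:
            sig = str(inspect.signature(obj))
        except (TypeError, ValueError):
            sig = "(...)"
        doc = (inspect.getdoc(obj) or "").splitlines()
        summary = doc[0] if doc else ""
        lines.append(f"{module.__name__}.{name}{sig}  # {summary}")
    return "\n".join(lines)

# The resulting text can be dropped into the system prompt so the LLM writes
# code that calls the library directly.
print(describe_module(json))
```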
@jmin__cho
Jaemin Cho
9 months
🚨 I’m on the 2024-2025 academic job market! https://t.co/NKVe24PSOl I work on ✨ Multimodal AI ✨, with a special focus on enhancing reasoning in both understanding and generation tasks by: 1⃣Making it more scalable 2⃣Making it more faithful 3⃣Evaluating and refining multimodal
Tweet media one
6
42
218
@_jessethomason_
Jesse Thomason
9 months
I missed this post back in JULY when Tanmay made it, but it's prescient and even more relevant now. Core NLP folks, remember not to re-invent the wheel. Agents are a thing in robotics, reinforcement learning, and planning. We have algorithms! Come chat with us!
@tanmay2099
Tanmay Gupta
1 year
Do we need to narrowly redefine "Agent" for LLM-Agents or can we just borrow a broader definition from RL / Embodied AI literature? LLM Agents are agentic in the same sense that a trained robot or an RL policy is agentic. Making this connection more explicit allows us to borrow
3
3
16
@unnatjain2010
Unnat Jain
10 months
Excited to share that I'll be joining University of California at Irvine as a CS faculty in '25!🌟 Faculty apps: @_krishna_murthy, @liuzhuang1234 & I share our tips: https://t.co/vynFYIJMQA PhD apps: I'm looking for students in vision, robot learning, & AI4Science. Details👇
Tweet media one
38
72
392
@tanmay2099
Tanmay Gupta
10 months
I am hiring interns to join us @allen_ai in advancing the science and art of building agents of all kinds: 🕸️ Web-use 💻 Code-use 🛠️ Tool-use Join us in answering exciting questions about multimodal planning, agentic learning, dealing with underspecified queries and more!
@Ai2Prior
Prior @ AI2
10 months
📢Applications are open for summer'25 internships at the PRIOR (computer vision) team @allen_ai: Come join us in building large-scale models for: 📸 Open-source Vision-Language Models 💻 Multimodal Web Agents 🤖 Embodied AI + Robotics 🌎 Planet Monitoring Apply by December
0
7
46
@ZCCZHANG
Zichen "Charles" Zhang
10 months
We won the Outstanding Paper Award @corl_conf!!! 😀😀😀 And here's what's inside that mysterious big box
Tweet media one
Tweet media two
@KuoHaoZeng
Kuo-Hao Zeng
10 months
🚀 Quick Update 🚀 🎉 @ZCCZHANG will present PoliFormer at CoRL Oral Session 5 (🕤 9:30-10:30, Fri, Nov 8, CET)! 🎉 Meet us at Poster Session 4 (🕓 16:00-17:30) to chat with @ZCCZHANG, @rosemhendrix, and Jordi! 💻 Our code & checkpoints are NOW public:
1
4
22
@KuoHaoZeng
Kuo-Hao Zeng
10 months
Incredibly honored to share this amazing news! PoliFormer has won the Outstanding Paper Award at @corl_conf 2024! 🎉 Check out our project and code: https://t.co/XJRjZnTVWM
Tweet media one
@anikembhavi
Ani Kembhavi
10 months
PoliFormer has won the Outstanding Paper Award at @corl_conf 2024! On policy RL with a modern transformer architecture can produce masterful navigators for multiple embodiments. All Sim-to-Real. A last hurrah from work at @allen_ai ! Led by @KuoHaoZeng @ZCCZHANG and @LucaWeihs
Tweet media one
2
7
78
@tanmay2099
Tanmay Gupta
1 year
This is how we do POS tagging in 2024, right? Jokes aside, the model is actually really good at pointing. Check it out yourself!
Tweet media one
Tweet media two
@allen_ai
Ai2
1 year
Meet Molmo: a family of open, state-of-the-art multimodal AI models. Our best model outperforms proprietary systems, using 1000x less data. Molmo doesn't just understand multimodal data—it acts on it, enabling rich interactions in both the physical and virtual worlds. Try it
0
4
20
@tanmay2099
Tanmay Gupta
1 year
5. In summary: Besides the LLM itself, the design of the environment, observations, and the action space are 3 powerful but often under-explored degrees of freedom that can help you improve the performance, robustness, and implementation of your LLM-Agent. Check out CodeNav for
0
0
4
@tanmay2099
Tanmay Gupta
1 year
4. So how does this improve your implementation? With a clean division of responsibility among a few components/classes. The pseudo-code below shows the general structure:
Tweet media one
1
0
2
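The real pseudo-code is in the attached image; the sketch below is only a guess at the kind of factoring the thread argues for: an environment that owns state and executes actions, an observation type that defines what the LLM sees, and an agent that maps history to the next action. All class and method names here are assumptions.

```python
# Hypothetical sketch of a factored LLM-agent loop. Class and method names are
# assumptions; the tweet's actual pseudo-code is in the attached image.
from dataclasses import dataclass

@dataclass
class Observation:
    text: str                      # what the LLM actually gets to see

@dataclass
class Action:
    kind: str                      # e.g. "code", "click", "done"
    payload: str

class Environment:
    """Owns state; executes valid actions and returns observations."""
    def reset(self) -> Observation:
        return Observation(text="task: say hello")

    def step(self, action: Action) -> tuple[Observation, bool]:
        done = action.kind == "done"
        return Observation(text=f"executed {action.kind}"), done

class Agent:
    """Wraps the LLM; maps the observation history to the next action."""
    def act(self, history: list[Observation]) -> Action:
        # A real agent would prompt an LLM here; this stub finishes immediately.
        return Action(kind="done", payload="hello")

def run_episode(env: Environment, agent: Agent, max_steps: int = 10) -> None:
    history = [env.reset()]
    for _ in range(max_steps):
        obs, done = env.step(agent.act(history))
        history.append(obs)
        if done:
            break

run_episode(Environment(), Agent())
```

With a split like this, changing the observation format or the action space touches one class each, without rewriting the rest of the loop.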
@tanmay2099
Tanmay Gupta
1 year
3.3 What is your action space? 🤔 Your action space constrains how your agent interacts with the environment. LLM outputs that do not satisfy these constraints are invalid and must be rejected and resampled. Examples of action spaces: - web agents: clicking, scrolling, typing
Tweet media one
1
0
0
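A small sketch of the reject-and-resample idea from this tweet, using a tiny click/scroll/type action space for a web agent. The sample_llm stub stands in for a real model call, and the regex and prompt feedback are illustrative assumptions.

```python
# Sketch of constraining an agent to a fixed action space by rejecting and
# resampling invalid LLM outputs. sample_llm is a stand-in for a model call.
import re

ACTION_PATTERN = re.compile(r"^(click\(\d+\)|scroll\((up|down)\)|type\(.+\))$")

# Canned outputs returned in order: two invalid actions, then a valid one.
_fake_llm_outputs = iter(["open_menu()", "fly(away)", "click(3)"])

def sample_llm(prompt: str) -> str:
    # Placeholder for an LLM call; returns the next canned output.
    return next(_fake_llm_outputs)

def next_action(prompt: str, max_tries: int = 5) -> str:
    for _ in range(max_tries):
        candidate = sample_llm(prompt).strip()
        if ACTION_PATTERN.match(candidate):
            return candidate               # valid action: accept it
        # Invalid action: reject it and resample with feedback in the prompt.
        prompt += f"\nInvalid action '{candidate}'; use click(i), scroll(up|down), or type(text)."
    raise RuntimeError("no valid action produced after resampling")

print(next_action("You are a web agent. Choose the next action."))
```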