Agrim Gupta Profile
Agrim Gupta

@agrimgupta92

Followers
4K
Following
619
Media
37
Statuses
343

Simulating reality @GoogleDeepMind, prev PhD @Stanford

Joined January 2017
@agrimgupta92
Agrim Gupta
2 years
We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇
51
249
1K
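The tweet above describes the approach only at a high level (a transformer denoiser trained on images and videos in a shared latent space). Below is a minimal, hypothetical PyTorch sketch of that general idea, where a single image is treated as a one-frame video of latent tokens. This is not the W.A.L.T code; every module name, size, and the simplified noising step are invented purely for illustration.

# Illustrative sketch only (not W.A.L.T): a tiny denoising transformer that
# operates on "latent" tokens, so images (1 frame) and videos (T frames)
# share one token space. All names and sizes here are hypothetical.
import torch
import torch.nn as nn

class TinyLatentDenoiser(nn.Module):
    def __init__(self, latent_dim=16, model_dim=64, n_layers=2, n_heads=4):
        super().__init__()
        self.in_proj = nn.Linear(latent_dim, model_dim)
        # Embed the diffusion timestep so the model knows the noise level.
        self.time_mlp = nn.Sequential(nn.Linear(1, model_dim), nn.SiLU(),
                                      nn.Linear(model_dim, model_dim))
        layer = nn.TransformerEncoderLayer(model_dim, n_heads,
                                           dim_feedforward=4 * model_dim,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(model_dim, latent_dim)

    def forward(self, z_noisy, t):
        # z_noisy: (batch, tokens, latent_dim); tokens = frames * patches,
        # so an image is just the one-frame special case of a video.
        h = self.in_proj(z_noisy) + self.time_mlp(t[:, None, None].float())
        return self.out_proj(self.blocks(h))

# Toy training step: predict the noise added to the latents
# (standard denoising-diffusion objective, heavily simplified).
model = TinyLatentDenoiser()
z = torch.randn(2, 8 * 4, 16)        # e.g. 8 "frames" x 4 patch tokens each
t = torch.randint(0, 1000, (2,))     # random diffusion timestep per sample
noise = torch.randn_like(z)
z_noisy = z + 0.1 * noise            # simplified noising (no real schedule)
loss = ((model(z_noisy, t) - noise) ** 2).mean()
loss.backward()
print(loss.item())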
@agrimgupta92
Agrim Gupta
14 days
RT @SeanKirmani: 🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and…
0
35
0
@agrimgupta92
Agrim Gupta
1 month
RT @keshigeyan: 1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrain…
0
43
0
@agrimgupta92
Agrim Gupta
3 months
RT @RuiqiGao: 📢 We are organizing a workshop at ICML on Building Physically Plausible World Models. Join us by submitting your cool papers…
0
13
0
@agrimgupta92
Agrim Gupta
3 months
RT @SeanKirmani: 🌎🌏🌍 We are organizing a workshop on Building Physically Plausible World Models at @icmlconf 2025! We have a great lineup…
0
22
0
@agrimgupta92
Agrim Gupta
3 months
RT @abhisk_kadian: Meet Llama 4 Scout & Maverick: the next generation of Llama 🦙 🚀. Download: Blog:
0
3
0
@agrimgupta92
Agrim Gupta
5 months
RT @SuryaGanguli: My @TEDAI2024 talk is out! I discuss our work, spanning AI, physics, math & neuroscience, to deve…
0
46
0
@agrimgupta92
Agrim Gupta
6 months
RT @avdnoord: Our image model is on LMSYS :) It's been an amazing effort by the team, I'm very proud of what we achieved over the last ye…
0
46
0
@agrimgupta92
Agrim Gupta
7 months
RT @SuryaGanguli: Nice name! We can share :) MetaMorph: Learning Universal Controllers with Transformers.
0
2
0
@agrimgupta92
Agrim Gupta
7 months
RT @RubenEVillegas: A cat roars while looking at its reflection in the mirror but instead sees itself as a lion roaring #veo2 https://t.co/…
0
209
0
@agrimgupta92
Agrim Gupta
7 months
🤯
@hua_weizhe
Weizhe Hua
7 months
A cat jumps on a couch. #veo2
1
3
55
@agrimgupta92
Agrim Gupta
7 months
"A pair of hands skillfully slicing a ripe tomato on a wooden cutting board". #veo
151
218
3K
@agrimgupta92
Agrim Gupta
7 months
RT @GoogleDeepMind: Today, we’re announcing Veo 2: our state-of-the-art video generation model which produces realistic, high-quality clips…
0
1K
0
@agrimgupta92
Agrim Gupta
7 months
Today we are introducing Veo 2: a SOTA video generation model. You can try the model here:
@sundarpichai
Sundar Pichai
7 months
Introducing Veo 2, our new, state-of-the-art video model (with better understanding of real-world physics & movement, up to 4K resolution). You can join the waitlist on VideoFX. Our new and improved Imagen 3 model also achieves SOTA results, and is coming today to 100+ countries
3
3
129
@agrimgupta92
Agrim Gupta
7 months
RT @jparkerholder: Introducing 🧞Genie 2 🧞 - our most capable large-scale foundation world model, which can generate a diverse array of cons…
0
475
0
@agrimgupta92
Agrim Gupta
8 months
RT @drfeifei: Super proud of this work by my student @keshigeyan and our collaborators on pushing the benchmarking of LLMs for spatial and…
0
50
0
@agrimgupta92
Agrim Gupta
8 months
RT @ManlingLi_: HourVideo: New benchmark for hour-long egocentric videos! Most challenging benchmark so far (SOTA 37.3% → Humans 85.0%). A…
0
7
0
@agrimgupta92
Agrim Gupta
8 months
New benchmark for long video understanding. LLMs have made significant strides in reasoning, handling long contexts, and solving complex quantitative tasks. But they still struggle with something humans find “easy”: processing visual information over extended time periods. Our…
@keshigeyan
Keshigeyan Chandrasegaran
8 months
1/ [NeurIPS D&B] Introducing HourVideo: A benchmark for hour-long video-language understanding! 🚀 500 egocentric videos, 18 total tasks & ~13k questions! Performance: GPT-4 ➡️ 25.7%, Gemini 1.5 Pro ➡️ 37.3%, Humans ➡️ 85.0%. We highlight a significant gap in multimodal capabilities 🧵👇
1
7
31
@agrimgupta92
Agrim Gupta
9 months
RT @DrJimFan: Not every foundation model needs to be gigantic. We trained a 1.5M-parameter neural network to control the body of a humanoid…
0
508
0
@agrimgupta92
Agrim Gupta
10 months
RT @abhisk_kadian: Llama 3.2 models are here 🎉! We are releasing the multimodal and lightweight Llama models.
0
5
0
@agrimgupta92
Agrim Gupta
10 months
RT @jcjohnss: Spatial Intelligence lets us perceive, reason, act, and create in the rich 3D world around us. Today we are launching World L…
0
26
0