Sanjay Subramanian (@sanjayssub)
910 Followers · 2K Following · 13 Media · 272 Statuses
Building/analyzing NLP and vision models. PhD student @berkeley_ai. Formerly: @allen_ai, @penn
Berkeley, CA · Joined September 2019
New paper at #acl2023nlp! "Modular Visual Question Answering via Code Generation" With @medhini_n @kushaltk1248 @KevinYa33964384 @NagraniArsha @CordeliaSchmid @andyzengtweets @trevordarrell Dan Klein (@berkeley_ai/@GoogleAI)! 📜 https://t.co/O4jJDc4prj 💻 https://t.co/kwGyYMJ8le
5 replies · 44 reposts · 151 likes
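The core idea of the paper, as the title suggests, is that an LLM writes a short program composing pretrained vision modules to answer each question. Below is a minimal runnable sketch of that pattern, with stubs standing in for the code LLM and the vision primitives; the names `call_code_llm`, `find`, and `simple_vqa` are illustrative, not the paper's API.

```python
# Sketch of VQA via code generation: an LLM emits a program over vision
# primitives, and we exec() it. All three functions below are stubs.

def call_code_llm(prompt: str) -> str:
    # Stub: a real system would query a code LLM with the question plus
    # in-context examples of programs built from the available primitives.
    return (
        "def answer_query(image):\n"
        "    # e.g. count the dogs the detector finds\n"
        "    return len(find(image, 'dog'))\n"
    )

def find(image, label: str) -> list:
    return []  # stub for an open-vocabulary detector module

def simple_vqa(image, question: str) -> str:
    return ""  # stub for an end-to-end VQA module

def answer(image, question: str):
    program = call_code_llm(f"# Question: {question}")
    scope = {"find": find, "simple_vqa": simple_vqa}
    exec(program, scope)                 # defines answer_query() in `scope`
    return scope["answer_query"](image)

print(answer(None, "How many dogs are there?"))  # -> 0 with these stubs
```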
A couple years (!) in the making: we’re releasing a new corpus of embodied, collaborative problem solving dialogues. We paid 36 people to play Portal 2’s co-op mode and collected their speech + game recordings. Paper: https://t.co/EHB4lbR7Ax Website: https://t.co/FK7tTFuQLt
3 replies · 23 reposts · 71 likes
Objectness should be user-defined — not human-label-defined! Unsupervised SAM 2 (UnSAMv2) makes it real✨ 1 point + a continuous granularity slider = the mask you want! UnSAMv2 beats SAM2: +16% NoC-90, +26% 1-IoU, +37% AR on 11+ datasets (w/ just 6k unlabeled images)!💪 1/n
1 reply · 10 reposts · 17 likes
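Reading the interaction model described above, a single click plus a continuous scalar selects the mask scale. A hypothetical interface sketch; the names and signature are assumptions, not the UnSAMv2 code:

```python
# Hypothetical point + granularity prompting interface (illustrative only).
from dataclasses import dataclass

@dataclass
class MaskQuery:
    point_xy: tuple[int, int]  # a single positive click
    granularity: float         # continuous scale in [0, 1]: part -> whole

def predict_mask(image, query: MaskQuery):
    # Stub: a real model would condition its mask decoder on both the click
    # and the granularity scalar, returning an HxW boolean mask.
    assert 0.0 <= query.granularity <= 1.0
    return None

# One click, a spectrum of masks as the slider moves.
for g in (0.1, 0.5, 0.9):
    predict_mask(image=None, query=MaskQuery(point_xy=(320, 240), granularity=g))
```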
I am recruiting Ph.D. students at @umdcs starting Fall 2026! I am looking for students in three broad areas: (1) Physics-integrated computer vision (2) VLMs with constraints (3) Dual-use AI policy We're ranked #3 in AI on @CSrankings! Specific details in 🧵
12 replies · 106 reposts · 406 likes
LLMs have shown a remarkable ability to “self-refine” and learn from their mistakes via in-context learning. But in robotics, most methods are single-shot. How can we bring inference-time adaptation to robot learning? A 🧵:
10 replies · 18 reposts · 130 likes
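One common recipe for the kind of inference-time adaptation the thread asks about, sketched below with stubs (illustrative only, not necessarily the thread's method): keep reflections on failed rollouts in context so the next attempt can adapt without any weight update.

```python
# In-context self-refinement loop for a robot policy. All three helpers are
# stubs standing in for the environment and the LLM/VLM policy.

def rollout(plan: str) -> tuple[bool, str]:
    # Stub environment: always fails with the same feedback.
    return False, "gripper closed before reaching the handle"

def propose_plan(task: str, reflections: list[str]) -> str:
    # Stub LLM policy: conditions on past reflections via the prompt.
    return f"plan for {task!r} given {len(reflections)} reflections"

def reflect(feedback: str) -> str:
    # Stub LLM reflection over the failed trajectory.
    return f"Lesson: {feedback}; approach the handle before closing."

def solve(task: str, max_attempts: int = 3) -> bool:
    reflections: list[str] = []
    for _ in range(max_attempts):
        success, feedback = rollout(propose_plan(task, reflections))
        if success:
            return True
        reflections.append(reflect(feedback))  # the in-context learning signal
    return False

solve("open the drawer")
```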
🌍 LLMs can use long chain-of-thought (CoT) to reason in English, but what about other languages? New paper w/ @BerkeleyNLP: We study how scaling, pretraining, post-training & inference affect long CoT across 9 languages. Spoiler: English long CoT ≠ multilingual long CoT 🧵
4 replies · 8 reposts · 22 likes
Humans handle dynamic situations easily; what about models? Turns out, they break in three distinct ways: ⛔ Force Stop → Reasoning leakage (won’t stop) ⚡️ Speedup → Panic (rushed answers) ❓ Info Updates → Self-doubt (reject updates) 👉Check out https://t.co/wKrnsMkiFY
5 replies · 21 reposts · 69 likes
✨Introducing ECHO, the newest in-the-wild image generation benchmark! You’ve seen new image models and new use cases discussed on social media, but old benchmarks don’t test them! We distilled this qualitative discussion into a structured benchmark. 🔗 https://t.co/wJmmEY8TFQ
3 replies · 32 reposts · 117 likes
Generalization is the biggest problem for robotics right now. This includes generalization to unseen objects, environments, tasks… Our recent work shows that generalization to novel objects might not be *that* hard. Specifically, we show that robots, trained on **randomly
2 replies · 8 reposts · 28 likes
I’m at #COLM2025 🇨🇦presenting “Hidden in Plain Sight: VLMs overlook their vision representations” as a Poster and Oral! Also honored to win Outstanding Paper here and Best Paper @ CVPR EVAL-FoMo 2! Come chat at poster 12 (Wed AM) about building perceptual representations! (1/3)
3 replies · 11 reposts · 62 likes
📢 SceneComp @ ICCV 2025 🏝️ 🌎 Generative Scene Completion for Immersive Worlds 🛠️ Reconstruct what you know AND 🪄 Generate what you don’t! 🙌 Meet our speakers @angelaqdai, @holynski_, @jampani_varun, @ZGojcic @taiyasaki, Peter Kontschieder https://t.co/LvONYIK3dz
#ICCV2025
2 replies · 17 reposts · 55 likes
Should robots have eyeballs? Human eyes move constantly and use variable resolution to actively gather visual details. In EyeRobot ( https://t.co/iSL7ZLZcHu) we train a robot eyeball entirely with RL: eye movements emerge from experience, driven by task rewards.
8 replies · 56 reposts · 272 likes
🎉 Excited to share RecA: Reconstruction Alignment Improves Unified Multimodal Models 🔥 Post-train w/ RecA: 8k images & 4 hours (8 GPUs) → SOTA UMMs: GenEval 0.73→0.90 | DPGBench 80.93→88.15 | ImgEdit 3.38→3.75 Code: https://t.co/yFEvJ0Algw 1/n
6 replies · 31 reposts · 80 likes
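My reading of the recipe above: the model regenerates an image from its own visual-encoder embeddings, and the reconstruction error is the post-training signal. A toy sketch with a stub encoder and generator, not the RecA code:

```python
# Schematic reconstruction-alignment objective (stubs, shapes are toy).
import numpy as np

def visual_encoder(image: np.ndarray) -> np.ndarray:
    return image.mean(axis=(0, 1))           # stub "understanding" embedding

def generate_from(embedding: np.ndarray) -> np.ndarray:
    return np.ones((8, 8, 3)) * embedding    # stub "generation" branch

def reconstruction_alignment_loss(image: np.ndarray) -> float:
    emb = visual_encoder(image)       # the model embeds the image itself...
    recon = generate_from(emb)        # ...then regenerates from that embedding
    return float(np.mean((recon - image) ** 2))  # dense self-supervision

print(reconstruction_alignment_loss(np.full((8, 8, 3), 0.5)))  # -> 0.0 here
```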
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
47 replies · 168 reposts · 1K likes
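A toy version of the reflective loop described above (illustrative stubs; GEPA itself is more involved): score the prompt on a handful of rollouts, have an LLM reflect on the failure traces in natural language, and keep an edit only if it improves the score.

```python
# Reflective prompt optimization, reduced to a greedy hill-climb. Both
# helpers are stubs for task evaluation and an LLM-written rewrite.

def score(prompt: str) -> tuple[float, list[str]]:
    # Stub: run the prompt on a few task instances; return accuracy + traces.
    return 0.5, ["trace of a failed rollout"]

def reflect_and_edit(prompt: str, failures: list[str]) -> str:
    # Stub: a real optimizer asks an LLM to read the failures and rewrite.
    return prompt + " Think step by step."

def optimize(prompt: str, budget: int = 10) -> str:
    best, (best_score, failures) = prompt, score(prompt)
    for _ in range(budget):  # tens of rollouts rather than thousands
        candidate = reflect_and_edit(best, failures)
        cand_score, cand_failures = score(candidate)
        if cand_score > best_score:
            best, best_score, failures = candidate, cand_score, cand_failures
    return best

print(optimize("Answer the question concisely."))
```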
Had so much fun working on this😊 PyTorch and JAX implementations are both out!
(quoted tweet, reproduced in full below)
0 replies · 8 reposts · 67 likes
For everyone interested in precise 📷camera control 📷 in transformers [e.g., video / world model etc] Stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: https://t.co/HPW7moJuvW
10 replies · 97 reposts · 535 likes
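One way to read "relative PE for cameras", sketched below as a toy: give attention only the relative transform P_i^{-1} P_j between token cameras, never absolute poses or raymaps. Here it enters as an additive logit bias for simplicity; the paper's formulation presumably acts on the queries and keys directly, as RoPE does, so treat this as an illustration of "relative, not absolute", not the method itself.

```python
# Toy camera-relative attention bias (not the paper's formulation).
import numpy as np

def relative_pose_bias(poses: np.ndarray) -> np.ndarray:
    """poses: (N, 4, 4) world-from-camera matrices, one per token/frame.
    Returns an (N, N) bias from relative translation magnitude (toy choice)."""
    inv = np.linalg.inv(poses)                        # camera-from-world
    rel = np.einsum('iab,jbc->ijac', inv, poses)      # P_i^{-1} P_j, (N,N,4,4)
    return -np.linalg.norm(rel[..., :3, 3], axis=-1)  # nearer cameras score higher

def attention(q, k, v, poses):
    logits = q @ k.T / np.sqrt(q.shape[-1]) + relative_pose_bias(poses)
    w = np.exp(logits - logits.max(-1, keepdims=True))
    return (w / w.sum(-1, keepdims=True)) @ v

N, d = 4, 8
rng = np.random.default_rng(0)
poses = np.repeat(np.eye(4)[None], N, 0)
poses[:, 0, 3] = np.arange(N)                         # cameras along the x-axis
out = attention(rng.normal(size=(N, d)), rng.normal(size=(N, d)),
                rng.normal(size=(N, d)), poses)
```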
Understanding a video involves both short-range and long-range understanding. Short-range understanding is more about "motion" and requires system-1 perception. Long-range understanding is more system-2, and requires memory, reasoning, etc. Both have huge room for improvement.
Video understanding isn't just recognizing; it demands reasoning across thousands of frames. Meet Long-RL🚀 Highlights: 🧠 Dataset: LongVideo-Reason — 52K QAs with reasoning. ⚡ System: MR-SP - 2.1× faster RL for long videos. 📈 Scalability: RL on hour-long videos (3,600 frames)
1 reply · 11 reposts · 77 likes
User simulators bridge RL with real-world interaction // https://t.co/bsrYxVHuVo How do we get the RL paradigm to work on tasks beyond math & code? Instead of designing datasets, RL requires designing environments. Given that most non-trivial real-world tasks involve
10 replies · 50 reposts · 341 likes
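A gym-style sketch of the idea above (stubs throughout, not the paper's setup): the environment's step function is an LLM playing the user, so multi-turn tasks yield RL rollouts without collecting new human data.

```python
# User-simulator environment: the "world" the agent acts in is a simulated
# user. The simulator here is a trivial stub instead of an LLM call.

class SimulatedUserEnv:
    def __init__(self, goal: str):
        self.goal, self.history = goal, []

    def step(self, agent_message: str):
        self.history.append(("agent", agent_message))
        user_reply = self._simulate_user()    # the LLM-as-user stub
        self.history.append(("user", user_reply))
        done = "thanks" in user_reply          # toy success check / reward
        return user_reply, float(done), done

    def _simulate_user(self) -> str:
        # Stub: a real simulator prompts an LLM with the goal + full history.
        return "thanks, that works" if len(self.history) > 3 else "not quite"

env = SimulatedUserEnv("book a table for two")
for msg in ["Which day works?", "Booked Friday 7pm.", "Confirmation sent."]:
    reply, reward, done = env.step(msg)
    if done:
        break
```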
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to
32 replies · 141 reposts · 538 likes
This repo is based heavily on big_vision https://t.co/GeXBU7YZDe ❤️, and the main additions so far are support for more sharding types, ring/flash attention, and a different architecture (LLaVA OneVision/Video)
1 reply · 0 reposts · 2 likes
Finally, some collaborators and I have been working on a repo for running inference and fine-tuning on video LMs in JAX, and I hope it can be useful to many others: https://t.co/g1n2TmoKzV Hope to improve it over time, please let me know if you have issues or want other features!
github.com · Run Inference/Finetuning on large Video LMs in JAX - sanjayss34/big_video_lm
2 replies · 0 reposts · 1 like