Sanjay Subramanian Profile
Sanjay Subramanian

@sanjayssub

Followers: 895
Following: 2K
Media: 13
Statuses: 259

Building/analyzing NLP and vision models. PhD student @berkeley_ai. Formerly: @allen_ai, @penn

Berkeley, CA
Joined September 2019
@sanjayssub
Sanjay Subramanian
2 years
New paper at #acl2023nlp! "Modular Visual Question Answering via Code Generation". With @medhini_n @kushaltk1248 @KevinYa33964384 @NagraniArsha @CordeliaSchmid @andyzengtweets @trevordarrell Dan Klein (@berkeley_ai/@GoogleAI)! 📜 💻
5
44
152
@sanjayssub
Sanjay Subramanian
1 month
RT @LakshyAAAgrawal: How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a…
0
166
0
@sanjayssub
Sanjay Subramanian
2 months
RT @brenthyi: Had so much fun working on this 😊. PyTorch and JAX implementations are both out!
0
8
0
@sanjayssub
Sanjay Subramanian
2 months
RT @ruilong_li: For everyone interested in precise 📷 camera control 📷 in transformers [e.g., video / world model etc]. Stop settling for Plü…
0
81
0
@sanjayssub
Sanjay Subramanian
2 months
RT @baifeng_shi: Understanding a video involves both short-range and long-range understanding. Short-range understanding is more about "mo…
0
12
0
@sanjayssub
Sanjay Subramanian
2 months
RT @realJessyLin: User simulators bridge RL with real-world interaction // How do we get the RL paradigm to work…
0
46
0
@sanjayssub
Sanjay Subramanian
2 months
RT @YutongBAI1002: What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1)…
0
131
0
@sanjayssub
Sanjay Subramanian
3 months
This repo is based heavily on big_vision ❤️, and the main additions so far are support for more sharding types, ring/flash attention, and a different architecture (LLaVA OneVision/Video).
1
0
2
@sanjayssub
Sanjay Subramanian
3 months
Finally, some collaborators and I have been working on a repo for running inference and fine-tuning on video LMs in JAX, and I hope it can be useful to many others. I hope to improve it over time; please let me know if you have issues or want other features!
github.com
Run Inference/Finetuning on large Video LMs in JAX - sanjayss34/big_video_lm
2
0
1
@sanjayssub
Sanjay Subramanian
3 months
Also be sure to check out this awesome work on automated slide generation led by @aomaru_21490 and @ZhiruoW on Friday at Poster Session 1 - ExHall D #262.
@aomaru_21490
Jiaxin Ge
8 months
Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, high-quality presentation slides from scratch! 📄 🤗 🔗 @berkeley_ai @LTIatCMU
1
1
3
@sanjayssub
Sanjay Subramanian
3 months
Excited to be at CVPR! Check out our work on using VLMs for pose estimation on Friday at Poster Session 2 - ExHall D #169. #CVPR2025.
@sanjayssub
Sanjay Subramanian
1 year
Excited to share some recent work! "Pose Priors from Language Models". We show how to use multimodal LMs to improve 3D human pose estimates in situations with physical contact. Joint work w/ Evonne Ng, @LeaMue27, Dan Klein (@BerkeleyNLP), @shiryginosar, @trevordarrell
2
0
12
@sanjayssub
Sanjay Subramanian
3 months
RT @Ritwik_G: Ever wondered if the way we feed image patches to vision models is the best way? The standard row-by-row scan isn't always op…
0
33
0
@sanjayssub
Sanjay Subramanian
4 months
RT @ZhongRuiqi: Last day of PhD! I pioneered using LLMs to explain dataset&model. It's used by interp at @OpenAI and societal impact @An…
0
38
0
@sanjayssub
Sanjay Subramanian
4 months
RT @NickATomlin: The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hop…
0
30
0
@sanjayssub
Sanjay Subramanian
5 months
RT @jiayi_pirate: We explore a new dimension in scaling reasoning models in Adaptive Parallel Reasoning. APR lets LMs learn to orchestrate…
0
73
0
@sanjayssub
Sanjay Subramanian
5 months
RT @KushtimusPrime: NeRFs and Gaussian Splats excel at static 3D modeling but robots work in dynamic, unpredictable environments. POGS (Per…
0
17
0
@sanjayssub
Sanjay Subramanian
5 months
RT @baifeng_shi: Next-gen vision pre-trained models shouldn’t be short-sighted. Humans can easily perceive 10K x 10K resolution. But today…
0
153
0
@sanjayssub
Sanjay Subramanian
6 months
RT @ZinengTang: We are thrilled to announce TULIP! 🌷 A state of the vision language encoders coupled with generat…
0
69
0
@sanjayssub
Sanjay Subramanian
6 months
RT @enfleisig: How does model calibration stand up against humans? We ran live competitions, comparing model and human calibration, to crea…
0
3
0
@sanjayssub
Sanjay Subramanian
8 months
RT @aomaru_21490: Introducing "AutoPresent: Designing Structured Visuals From Scratch". We employ code generation to create structured, hig…
0
69
0
@sanjayssub
Sanjay Subramanian
9 months
RT @LeaMue27: Humans and Structure from Motion. We jointly reconstruct 3D humans, scene point cloud, and cameras from images captured w…
0
65
0