Shraman Pramanick
@Shramanpramani2
Followers: 489 · Following: 848 · Media: 8 · Statuses: 96
PostDoc @AIatMeta Ph.D. @JohnsHopkins | Interned @AIatMeta FAIR, GenAI, @google GDM | Multimodal LLMs
Baltimore, MD
Joined October 2016
I’m deeply saddened and frustrated to hear that my friend @Shramanpramani2 (https://t.co/eoQLDVM1tJ) has been affected by the recent layoffs at Meta — such a pity, especially for a fresh PhD with so much potential. I’ve had the pleasure of working with Shraman since 2023, a
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on
3 · 4 · 68
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on
Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)
27 · 27 · 343
NeurIPS 2025 is soliciting self-nominations for reviewers and ACs. Please read our blog post for details on the eligibility criteria and the process to self-nominate:
4 · 29 · 127
🚀 Internship Opportunity at #AdobeResearch🚀 Looking for PhD interns for Summer 2025! Interested in exploring the intersection of multimodal LLMs, diffusion models, etc? 📩 Send me a DM with your CV, website, and GScholar profile. #GenerativeAI
1 · 1 · 5
How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit
24 · 136 · 726
I am at NeurIPS 2024 in Vancouver. I'll be presenting SPIQA on Wednesday in the AM Poster Session, Booth #3700! 📜 arXiv: https://t.co/UoqZ82ibvy 🗄️ SPIQA dataset: https://t.co/JvIQibFDGI 👨‍💻 github: https://t.co/Ns4KqVXIAG In this work, we have done a comprehensive analysis of
1 · 0 · 8
So, this is what we were up to for a while :) Building SOTA foundation models for media -- text-to-video, video editing, personalized videos, video-to-audio. One of the most exciting projects I got to tech lead during my time at Meta!
🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in
40 · 71 · 890
Applying to Stanford's CS PhD program? Current graduate students are running a SoP + CV feedback program for URM applicants (broadly defined). Apply to SASP by Oct. 25! Info:
cs.stanford.edu
3 · 96 · 451
SPIQA has been accepted at #NeurIPS2024 D&B! 😃 In this work, we have done a comprehensive analysis of various strong multimodal LLMs for understanding a wide range of scientific figures and tables, including schematic diagrams, charts, plots, visualizations, etc. Check out our paper,
✨Can multimodal LLMs effectively answer questions in the context of long scientific research papers by thoroughly analyzing the entire text, complex figures, tables, and captions? Our recent project, SPIQA, initiates an exploration into this question by developing the first
0 · 0 · 10
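SPIQA's official evaluation code is linked from the paper, so purely as a sketch of what a zero-shot figure-QA query to an open multimodal LLM looks like (the setting the tweet above describes), here is a minimal example using LLaVA 1.5 through Hugging Face transformers. The figure path, question, and decoding settings are placeholders of mine, not the paper's protocol.

```python
# Minimal sketch: zero-shot question answering over a paper figure with an open
# multimodal LLM (LLaVA 1.5 via Hugging Face transformers). The figure file and
# question below are placeholders, not taken from SPIQA itself.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

figure = Image.open("figure_3.png")                 # e.g. a chart or plot from a paper
question = "Which method achieves the highest accuracy in this plot?"
prompt = f"USER: <image>\n{question} ASSISTANT:"    # LLaVA 1.5 chat format

inputs = processor(images=figure, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```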
Great example of how to stick to your research agenda despite temporary distractions.
Paper is rejected, but a follow-up paper that completely depends on the rejected paper is accepted #NeurIPS
0 · 2 · 10
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the
526 · 2K · 18K
Salesforce presents xGen-MM (BLIP-3) A Family of Open Large Multimodal Models discuss: https://t.co/e056zqI1Oo This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated
7 · 75 · 307
This work was done during my Student Researcher tenure at @Google. I cannot thank my rock-star host Subhashini Venugopalan and my Ph.D. advisor Professor Rama Chellappa enough.
0 · 0 · 2
Limitations and Future Prospects: SPIQA consists only of papers from Computer Science. Extending SPIQA to encompass other scientific domains remains future work. 7/7 🧶
1 · 0 · 2
Well-written related-work sections are often undervalued in the review process. In our paper, we provide an extensive comparison of SPIQA with all existing scientific question-answering datasets. 6/7 🧶
1 · 0 · 1
Our proposed CoT evaluation prompt guides the models through step-by-step reasoning, which often results in better responses. For instance, GPT-4 Vision shows increases of 6.70, 1.73, and 2.98 in L3Score when using CoT prompts compared to direct QA. Similar improvements are
1 · 0 · 1
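As a rough illustration of the direct-QA vs. CoT distinction the tweet above refers to: a direct prompt asks for the answer immediately, while a CoT prompt asks the model to first describe and reason over the figure. The wording below is a hypothetical sketch, not the prompt defined in the SPIQA paper.

```python
# Hypothetical sketch of a direct-QA prompt vs. a chain-of-thought (CoT)
# evaluation prompt for figure-based QA. The exact wording used in SPIQA is
# defined in the paper; this only illustrates the general idea.

def direct_qa_prompt(question: str) -> str:
    # Ask for the answer immediately, with no intermediate reasoning.
    return (
        "You are given a figure from a scientific paper and a question about it.\n"
        f"Question: {question}\n"
        "Answer concisely."
    )

def cot_qa_prompt(question: str) -> str:
    # Ask the model to reason step by step over the figure before answering.
    return (
        "You are given a figure from a scientific paper and a question about it.\n"
        "First, describe the relevant elements of the figure (axes, legends, trends).\n"
        "Then reason step by step about how they relate to the question.\n"
        f"Question: {question}\n"
        "Finally, state the answer on a new line starting with 'Answer:'."
    )

q = "Which method achieves the highest accuracy at 10k training samples?"
print(direct_qa_prompt(q))
print(cot_qa_prompt(q))
```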
We fine-tune InstructBLIP and LLaVA 1.5 and obtain massive improvements of 28 and 26 L3Score points on average over the three SPIQA test sets compared to the corresponding zero-shot models. These fine-tuned models perform almost as well as Gemini Pro Vision, a powerful
1 · 0 · 1
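The tweet does not spell out the fine-tuning recipe; one common way to adapt a model like LLaVA 1.5 on QA data of this kind is parameter-efficient fine-tuning with LoRA. The sketch below (rank, target modules, hyperparameters) is an assumption for illustration, not the setup used for the SPIQA baselines.

```python
# Rough sketch: attaching LoRA adapters to LLaVA 1.5 with peft for
# parameter-efficient fine-tuning. The SPIQA tweet does not give the actual
# recipe; the rank, target modules, and dtype here are assumptions.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)   # used to build (image, text) batches
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable

# Training would then iterate over (figure, question, answer) triples, build
# inputs with `processor(...)`, and optimize the standard language-modeling
# loss returned by `model(**batch).loss`.
```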