Shraman Pramanick
@Shramanpramani2
Followers: 489 · Following: 848 · Media: 8 · Statuses: 96
PostDoc @AIatMeta Ph.D. @JohnsHopkins | Interned @AIatMeta FAIR, GenAI, @google GDM | Multimodal LLMs
Baltimore, MD
Joined October 2016
I’m deeply saddened and frustrated to hear that my friend @Shramanpramani2 (https://t.co/eoQLDVM1tJ) has been affected by the recent layoffs at Meta — such a pity, especially for a fresh PhD with so much potential. I’ve had the pleasure of working with Shraman since 2023, a
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on
3 · 4 · 68
My role at Meta's SAM team (MSL, previously at FAIR Perception) has been impacted within 3 months of joining after PhD. If you work with multimodal LLMs for grounding or complex reasoning, or have a long-term vision of unified understanding and generation, let's talk. I am on
Meta has gone crazy on the squid game! Many new PhD NGs are deactivated today (I am also impacted🥲 happy to chat)
27 · 27 · 343
NeurIPS 2025 is soliciting self-nominations for reviewers and ACs. Please read our blog post for details on the eligibility criteria and the process to self-nominate:
4 · 29 · 127
🚀 Internship Opportunity at #AdobeResearch🚀 Looking for PhD interns for Summer 2025! Interested in exploring the intersection of multimodal LLMs, diffusion models, etc? 📩 Send me a DM with your CV, website, and GScholar profile. #GenerativeAI
1 · 1 · 5
How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit
24 · 136 · 726
I am at NeurIPS 2024 in Vancouver. I'll be presenting SPIQA on Wednesday in the AM Poster Session, Booth #3700! 📜 arXiv: https://t.co/UoqZ82ibvy 🗄️ SPIQA dataset: https://t.co/JvIQibFDGI 👨‍💻 github: https://t.co/Ns4KqVXIAG In this work, we have done a comprehensive analysis of
1 · 0 · 8
So, this is what we were up to for a while :) Building SOTA foundation models for media -- text-to-video, video editing, personalized videos, video-to-audio. One of the most exciting projects I got to tech lead during my time at Meta!
🎥 Today we’re premiering Meta Movie Gen: the most advanced media foundation models to date. Developed by AI research teams at Meta, Movie Gen delivers state-of-the-art results across a range of capabilities. We’re excited for the potential of this line of research to usher in
40 · 71 · 890
Applying to Stanford's CS PhD program? Current graduate students are running a SoP + CV feedback program for URM applicants (broadly defined). Apply to SASP by Oct. 25! Info:
cs.stanford.edu
3 · 96 · 451
SPIQA has been accepted at #NeurIPS2024 D&B! 😃 In this work, we have done a comprehensive analysis of various strong multimodal LLMs for understanding a wide range of scientific figures and tables, including schematic diagrams, charts, plots, visualizations, etc. Check out our paper,
✨Can multimodal LLMs effectively answer questions in the context of long scientific research papers by thoroughly analyzing the entire text, complex figures, tables, and captions? Our recent project, SPIQA, initiates an exploration into this question by developing the first
0 · 0 · 10
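SPIQA's official evaluation code is linked from the paper, so purely as a sketch of what a zero-shot figure-QA query to an open multimodal LLM looks like (the setting the tweet above describes), here is a minimal example using LLaVA 1.5 through Hugging Face transformers. The figure path, question, and decoding settings are placeholders of mine, not the paper's protocol.

```python
# Minimal sketch: zero-shot question answering over a paper figure with an open
# multimodal LLM (LLaVA 1.5 via Hugging Face transformers). The figure file and
# question below are placeholders, not taken from SPIQA itself.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")

figure = Image.open("figure_3.png")                 # e.g. a chart or plot from a paper
question = "Which method achieves the highest accuracy in this plot?"
prompt = f"USER: <image>\n{question} ASSISTANT:"    # LLaVA 1.5 chat format

inputs = processor(images=figure, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```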
Great example of how to stick to your research agenda despite temporary distractions.
Paper is rejected, but a follow-up paper that completely depends on the rejected paper is accepted #NeurIPS
0 · 2 · 10
Programming is changing so fast... I'm trying VS Code Cursor + Sonnet 3.5 instead of GitHub Copilot again and I think it's now a net win. Just empirically, over the last few days most of my "programming" is now writing English (prompting and then reviewing and editing the
526 · 2K · 18K
Salesforce presents xGen-MM (BLIP-3) A Family of Open Large Multimodal Models discuss: https://t.co/e056zqI1Oo This report introduces xGen-MM (also known as BLIP-3), a framework for developing Large Multimodal Models (LMMs). The framework comprises meticulously curated
7 · 75 · 307
This work was done during my Student Researcher tenure at @Google. I cannot thank my rock-star host Subhashini Venugopalan and my Ph.D. advisor Professor Rama Chellappa enough.
0 · 0 · 2
Limitations and Future Prospects: SPIQA consists only of papers from Computer Science. Extending SPIQA to encompass other scientific domains remains future work. 7/7 🧶
1 · 0 · 2
Well-written related-work sections are often undervalued in the review process. In our paper, we provide an extensive comparison of SPIQA with all existing scientific question-answering datasets. 6/7 🧶
1 · 0 · 1
Our proposed CoT evaluation prompt guides the models through step-by-step reasoning, which often results in better responses. For instance, GPT-4 Vision shows increases of 6.70, 1.73, and 2.98 in L3Score when using CoT prompts compared to direct QA. Similar improvements are
1 · 0 · 1
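As a rough illustration of the direct-QA vs. CoT distinction the tweet above refers to: a direct prompt asks for the answer immediately, while a CoT prompt asks the model to first describe and reason over the figure. The wording below is a hypothetical sketch, not the prompt defined in the SPIQA paper.

```python
# Hypothetical sketch of a direct-QA prompt vs. a chain-of-thought (CoT)
# evaluation prompt for figure-based QA. The exact wording used in SPIQA is
# defined in the paper; this only illustrates the general idea.

def direct_qa_prompt(question: str) -> str:
    # Ask for the answer immediately, with no intermediate reasoning.
    return (
        "You are given a figure from a scientific paper and a question about it.\n"
        f"Question: {question}\n"
        "Answer concisely."
    )

def cot_qa_prompt(question: str) -> str:
    # Ask the model to reason step by step over the figure before answering.
    return (
        "You are given a figure from a scientific paper and a question about it.\n"
        "First, describe the relevant elements of the figure (axes, legends, trends).\n"
        "Then reason step by step about how they relate to the question.\n"
        f"Question: {question}\n"
        "Finally, state the answer on a new line starting with 'Answer:'."
    )

q = "Which method achieves the highest accuracy at 10k training samples?"
print(direct_qa_prompt(q))
print(cot_qa_prompt(q))
```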
We fine-tune InstructBLIP and LLaVA 1.5 and obtain massive improvements of 28 and 26 L3Score points on average over the three SPIQA test sets compared to the corresponding zero-shot models. These fine-tuned models perform almost as well as Gemini Pro Vision, a powerful
1 · 0 · 1
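The tweet does not spell out the fine-tuning recipe; one common way to adapt a model like LLaVA 1.5 on QA data of this kind is parameter-efficient fine-tuning with LoRA. The sketch below (rank, target modules, hyperparameters) is an assumption for illustration, not the setup used for the SPIQA baselines.

```python
# Rough sketch: attaching LoRA adapters to LLaVA 1.5 with peft for
# parameter-efficient fine-tuning. The SPIQA tweet does not give the actual
# recipe; the rank, target modules, and dtype here are assumptions.
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)   # used to build (image, text) batches
model = LlavaForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable

# Training would then iterate over (figure, question, answer) triples, build
# inputs with `processor(...)`, and optimize the standard language-modeling
# loss returned by `model(**batch).loss`.
```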