Sayan Nag (সায়ন নাগ) Profile

Sayan Nag (সায়ন নাগ)
@nagsayan112358
Followers: 101 · Following: 696 · Media: 6 · Statuses: 165

Research Scientist @adoberesearch | PhD @UofT | Multimodal Understanding and Generation, Vision-Language, Audio-Visual, Deep Learning

Joined January 2017
@arman_zareii
Arman Zarei
2 months
🎉Excited to share that our paper "Localizing Knowledge in Diffusion Transformers" has been accepted to NeurIPS 2025! 🌐Project Page: https://t.co/4PRKXy3rJv 📄Paper: https://t.co/UY0oPDhRTp Joint work with @BasuSamyadeep, @RezaeiKeivan, Zihao Lin, @nagsayan112358, @FeiziSoheil
arxiv.org
Understanding how knowledge is distributed across the layers of generative models is crucial for improving interpretability, controllability, and adaptation. While prior work has explored...
@arman_zareii
Arman Zarei
6 months
🚀New Paper: Localizing Knowledge in Diffusion Transformers 🌐Project page: https://t.co/NXFJm9Twkp 📄Paper: https://t.co/UY0oPDhRTp Joint work with: @BasuSamyadeep, @RezaeiKeivan, Zihao Lin, @nagsayan112358, @FeiziSoheil
0
3
13
@RezaeiKeivan
Keivan Rezaei
2 months
🎉 Excited to share that our paper Localizing Knowledge in Diffusion Transformers has been accepted to #NeurIPS2025! In this work, we extend our previous study on localizing knowledge within T2I models to DiTs such as FLUX, PixArt, and SANA. paper: https://t.co/uouXmJWqx6
1
2
20
@arman_zareii
Arman Zarei
6 months
🚀New Paper: Localizing Knowledge in Diffusion Transformers 🌐Project page: https://t.co/NXFJm9Twkp 📄Paper: https://t.co/UY0oPDhRTp Joint work with: @BasuSamyadeep, @RezaeiKeivan, Zihao Lin, @nagsayan112358, @FeiziSoheil
3
6
31
@schowdhury671
Sanjoy Chowdhury ✈️ ICCV 2025 🌺
8 months
🚨 We propose an actor-critic based framework that distills structured, step-by-step reasoning into AVLLMs at test time, improving their ability to process complex multi-modal inputs. 📢 We present AVReasonBench, a challenging benchmark comprising 4500 AV questions.
@ArxivSound
arXiv Sound
8 months
"Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs," Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha.
1
2
5
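As a very rough illustration of the test-time actor-critic idea mentioned in the tweet above, the Python sketch below shows one way such a loop could be wired: an actor proposes several step-by-step reasoning chains, a critic scores them against the audio-visual evidence, and the best chain conditions the final answer. The generate/score interfaces and the overall flow are illustrative assumptions, not the Aurelia method.

def test_time_reasoning(avllm, critic, question, audio, video, n_candidates=4):
    # Hypothetical interfaces: avllm.generate(...) and critic.score(...) are
    # stand-ins for an audio-visual LLM and a scoring model, not real APIs.
    # Actor step: propose several step-by-step reasoning chains.
    chains = [
        avllm.generate(question, audio, video, prompt="Reason step by step.")
        for _ in range(n_candidates)
    ]
    # Critic step: score each chain against the audio-visual evidence.
    scores = [critic.score(chain, audio, video) for chain in chains]
    best_chain = chains[scores.index(max(scores))]
    # Distillation step: condition the final answer on the best chain,
    # so structured reasoning is injected at inference time only.
    return avllm.generate(question, audio, video, prompt=best_chain)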
@nagsayan112358
Sayan Nag (সায়ন নাগ)
11 months
Note that:
- You must be a PhD student at the time of internship.
- Students with publication(s) at tier-1 conferences are preferred.
- The location is in India.
0
0
0
@nagsayan112358
Sayan Nag (সায়ন নাগ)
11 months
🚀 Internship Opportunity at #AdobeResearch🚀 Looking for PhD interns for Summer 2025! Interested in exploring the intersection of multimodal LLMs, diffusion models, etc? 📩 Send me a DM with your CV, website, and GScholar profile. #GenerativeAI
1
1
5
@schowdhury671
Sanjoy Chowdhury ✈️ ICCV 2025 🌺
1 year
📢📢 We are thrilled to share that our paper Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time got accepted to @eccvconf 2024! 🚀🎊 🧵 (1/n)
1
2
30
@koustavagoswami
Koustava Goswami
1 year
Our work "SAFARI" got accepted at ECCV 2024. We introduce a novel method for solving the RES task, using layer-level fusion between text and images. Wonderful work done by @nagsayan112358 #ECCV2024 #multimodal. Preprint coming soon.
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
📢 Excited to share that SAFARI got accepted to ECCV 2024! 🎉 We develop a novel method for solving the Referring Expression Segmentation (RES) task under low-annotation settings. Big congrats to my coauthors @koustavagoswami, Srikrishna Karanam. #ECCV2024 #multimodal #visionlanguage
0
1
8
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
📢 Excited to share that SAFARI got accepted to ECCV 2024! 🎉 We develop a novel method for solving the Referring Expression Segmentation (RES) task under low-annotation settings. Big congrats to my coauthors @koustavagoswami, Srikrishna Karanam. #ECCV2024 #multimodal #visionlanguage
0
1
16
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
@schowdhury671 will be presenting MeLFusion and @Shramanpramani2 will be presenting VistaLLM
0
0
3
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
Unfortunately I won't be attending #CVPR24, but please check out our papers: 1. MeLFusion: https://t.co/9NKE4Ys3ch 2. VistaLLM: https://t.co/VNn1Hb1aYo Both of these are Poster Highlights! #CVPR24 #GenerativeAI
1
1
6
@gammaumd
GAMMA UMD
1 year
Also in session #6, @schowdhury671 presents "MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models" ( https://t.co/2I9j0ysfAG) with @nagsayan112358 @josephkj_in @balajivasan @dmanocha @AdobeResearch
0
4
9
@AnimaAnandkumar
Prof. Anima Anandkumar
2 years
For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training. Training LLMs from scratch currently requires huge…
@_akhaliq
AK
2 years
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank…
47
368
2K
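For readers unfamiliar with the gradient low-rank projection idea mentioned in the GaLore tweets above, the core trick is keeping optimizer state in a low-rank subspace of the gradient instead of at full size. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation; the rank, the simple momentum-style state, and the single-matrix setting are all simplifying assumptions.

import torch

def galore_style_step(weight, grad, state, lr=1e-3, rank=4, beta=0.9):
    # Illustrative sketch of gradient low-rank projection (NOT the GaLore code).
    # 1) Find a rank-r basis for the current gradient via truncated SVD.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                     # (m x r) projection basis
    # 2) Keep optimizer state (here a simple momentum buffer) only in the
    #    r-dimensional subspace, which is where the memory saving comes from:
    #    the stored state has shape (r x n) instead of (m x n).
    g_low = P.T @ grad                  # (r x n) projected gradient
    state = g_low if state is None else beta * state + (1 - beta) * g_low
    # 3) Project the update back to full size and apply it.
    weight = weight - lr * (P @ state)
    return weight, state

Details such as how often the projection basis is refreshed and the exact subspace optimizer are omitted here; the sketch only shows where the optimizer-state memory reduction comes from.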
@Shramanpramani2
Shraman Pramanick
1 year
📢Happy to share that EgoVLPv2 (ICCV 2023) is awarded as an Egocentric Vision 2022/2023 Distinguished Paper. Thanks to my coauthors @yalesong @PengchuanZ @MikeShou1 @nagsayan112358 @KevinQHLin arXiv: https://t.co/3OlhdwV0XR Code: https://t.co/VqjFhp4Stn… @CVPR @ICCVConference
github.com
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] - facebookresearch/EgoVLPv2
@dimadamen
Dima Damen
1 year
Congratulations to the authors of the 10 papers selected as the EgoVis 2022/2023 Distinguished Paper Awards... https://t.co/KGMmxwYkIX Join the awards ceremony @CVPR EgoVis Workshop 17/06 where authors will give short speeches about the story behind these seminal papers.
1
1
16
@Shramanpramani2
Shraman Pramanick
2 years
@WenhuChen Nice work! Reasoning over multiple images is indeed a less-explored field. We have reported results on NLVR and Co-Segmentation tasks using our proposed VistaLLM in CVPR 2024.
1
1
5
@josephkj_in
Joseph KJ
2 years
#CVPR2024 has said an ‘Accept’ to our work on generating music conditioned on multimodal signals from text and images! Joint work with our 2023 @AdobeResearch interns (@schowdhury671 and @nagsayan112358) and @balajivasan.
1
3
61
@schowdhury671
Sanjoy Chowdhury ✈️ ICCV 2025 🌺
2 years
📢 Thrilled to share that our paper MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models got accepted to CVPR 2024! In this work, we present a novel diffusion-powered approach to generate audio from multi-modal inputs (image and text).
2
2
26
@nagsayan112358
Sayan Nag (সায়ন নাগ)
2 years
@schowdhury671 APoLLo substantially improves the generalization capabilities of VLP models when fine-tuned in a few-shot setting. We introduce trainable cross-attention-based adapter layers in conjunction with vision and language encoders to strengthen the alignment between the two modalities.
0
0
0
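As a rough illustration of the cross-attention-based adapter idea described in the tweet above, the Python sketch below shows a generic adapter in which tokens from one modality attend to the other and a small bottleneck produces a residual update. This is a generic sketch under assumed dimensions and names, not the APoLLo architecture.

import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    # Generic cross-attention adapter (illustrative, not the APoLLo code):
    # tokens from one modality attend to the other, and a small bottleneck
    # MLP produces a residual update. In an adapter setup, only these
    # parameters would be trained while the backbone encoders stay frozen.
    def __init__(self, dim=512, num_heads=8, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x, context):
        # x: tokens of one modality (e.g. text); context: the other (e.g. image).
        attended, _ = self.cross_attn(self.norm(x), context, context)
        return x + self.up(torch.relu(self.down(attended)))   # residual update

# Example: text tokens attending to image tokens.
adapter = CrossAttentionAdapter()
text_tokens = torch.randn(2, 16, 512)
image_tokens = torch.randn(2, 49, 512)
fused = adapter(text_tokens, image_tokens)    # shape: (2, 16, 512)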
@nagsayan112358
Sayan Nag (সায়ন নাগ)
2 years
We are happy to share that we recently presented our work APoLLo🚀: Unified Adapter and Prompt Learning for Vision Language Models at #EMNLP2023 (Main track). Huge thanks to @schowdhury671 and Dinesh Manocha. Paper:
1
2
8
@nagsayan112358
Sayan Nag (সায়ন নাগ)
2 years
The code and model checkpoints of VoLTA (TMLR, 2023) are now publicly available! 📜 Paper: https://t.co/pfCc0SVIDz 📷 Code: https://t.co/U2Il7DZqdd 📷Project: https://t.co/2h1HOBOYvP Huge thanks to @Shramanpramani2, Li Jing, @JiachenAI, Hardik Shah, @ylecun, and Rama Chellappa.
0
0
5