Sayan Nag (সায়ন নাগ) Profile

Sayan Nag (সায়ন নাগ)
@nagsayan112358
Followers: 101 · Following: 696 · Media: 6 · Statuses: 165

Research Scientist @adoberesearch | PhD @UofT | Multimodal Understanding and Generation, Vision-Language, Audio-Visual, Deep Learning

Joined January 2017
@arman_zareii
Arman Zarei
2 months
🎉Excited to share that our paper "Localizing Knowledge in Diffusion Transformers" has been accepted to NeurIPS 2025! 🌐Project Page: https://t.co/4PRKXy3rJv 📄Paper: https://t.co/UY0oPDhRTp Joint work with @BasuSamyadeep, @RezaeiKeivan, Zihao Lin, @nagsayan112358, @FeiziSoheil
arxiv.org
Understanding how knowledge is distributed across the layers of generative models is crucial for improving interpretability, controllability, and adaptation. While prior work has explored...
@arman_zareii
Arman Zarei
6 months
🚀New Paper: Localizing Knowledge in Diffusion Transformers 🌐Project page: https://t.co/NXFJm9Twkp 📄Paper: https://t.co/UY0oPDhRTp Joint work with: @BasuSamyadeep, @RezaeiKeivan, Zihao Lin, @nagsayan112358, @FeiziSoheil
0
3
13
@RezaeiKeivan
Keivan Rezaei
2 months
🎉 Excited to share that our paper Localizing Knowledge in Diffusion Transformers has been accepted to #NeurIPS2025! In this work, we extend our previous study on localizing knowledge within T2I models to DiTs such as FLUX, PixArt, and SANA. paper: https://t.co/uouXmJWqx6
1
2
20
@arman_zareii
Arman Zarei
6 months
🚀New Paper: Localizing Knowledge in Diffusion Transformers 🌐Project page: https://t.co/NXFJm9Twkp 📄Paper: https://t.co/UY0oPDhRTp Joint work with: @BasuSamyadeep, @RezaeiKeivan, Zihao Lin, @nagsayan112358, @FeiziSoheil
3
6
31
@schowdhury671
Sanjoy Chowdhury ✈️ ICCV 2025 🌺
8 months
🚨 We propose an actor-critic based framework that distills structured, step-by-step reasoning into AVLLMs at test time, improving their ability to process complex multi-modal inputs. 📢 We present AVReasonBench, a challenging benchmark comprising 4500 AV questions.
@ArxivSound
arXiv Sound
8 months
"Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs," Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha.
1
2
5
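As a very rough illustration of the test-time actor-critic idea mentioned in the tweet above, the Python sketch below shows one way such a loop could be wired: an actor proposes several step-by-step reasoning chains, a critic scores them against the audio-visual evidence, and the best chain conditions the final answer. The generate/score interfaces and the overall flow are illustrative assumptions, not the Aurelia method.

def test_time_reasoning(avllm, critic, question, audio, video, n_candidates=4):
    # Hypothetical interfaces: avllm.generate(...) and critic.score(...) are
    # stand-ins for an audio-visual LLM and a scoring model, not real APIs.
    # Actor step: propose several step-by-step reasoning chains.
    chains = [
        avllm.generate(question, audio, video, prompt="Reason step by step.")
        for _ in range(n_candidates)
    ]
    # Critic step: score each chain against the audio-visual evidence.
    scores = [critic.score(chain, audio, video) for chain in chains]
    best_chain = chains[scores.index(max(scores))]
    # Distillation step: condition the final answer on the best chain,
    # so structured reasoning is injected at inference time only.
    return avllm.generate(question, audio, video, prompt=best_chain)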
@nagsayan112358
Sayan Nag (সায়ন নাগ)
11 months
Note that:
- You must be a PhD student at the time of internship.
- Students with publication(s) at tier-1 conferences are preferred.
- The location is in India.
0
0
0
@nagsayan112358
Sayan Nag (সায়ন নাগ)
11 months
🚀 Internship Opportunity at #AdobeResearch🚀 Looking for PhD interns for Summer 2025! Interested in exploring the intersection of multimodal LLMs, diffusion models, etc? 📩 Send me a DM with your CV, website, and GScholar profile. #GenerativeAI
1
1
5
@schowdhury671
Sanjoy Chowdhury ✈️ ICCV 2025 🌺
1 year
📢📢 We are thrilled to share that our paper Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time got accepted to @eccvconf 2024! 🚀🎊 🧵 (1/n)
1
2
30
@koustavagoswami
Koustava Goswami
1 year
Our work "SAFARI" got accepted at ECCV 2024. We introduce a novel method for solving the RES task, using layer-level fusion between text and images. Wonderful work done by @nagsayan112358 #ECCV2024 #multimodal. Preprint coming soon.
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
📢 Excited to share that SAFARI got accepted to ECCV 2024! 🎉 We develop a novel method for solving the Referring Expression Segmentation (RES) task under low-annotation settings. Big congrats to my coauthors @koustavagoswami, Srikrishna Karanam. #ECCV2024 #multimodal #visionlanguage
0
1
8
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
📢 Excited to share that SAFARI got accepted to ECCV 2024! 🎉 We develop a novel method for solving the Referring Expression Segmentation (RES) task under low-annotation settings. Big congrats to my coauthors @koustavagoswami, Srikrishna Karanam. #ECCV2024 #multimodal #visionlanguage
0
1
16
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
@schowdhury671 will be presenting MeLFusion and @Shramanpramani2 will be presenting VistaLLM
0
0
3
@nagsayan112358
Sayan Nag (সায়ন নাগ)
1 year
Unfortunately I won't be attending #CVPR24, but please check out our papers: 1. MeLFusion: https://t.co/9NKE4Ys3ch 2. VistaLLM: https://t.co/VNn1Hb1aYo Both of these are Poster Highlights! #CVPR24 #GenerativeAI
1
1
6
@gammaumd
GAMMA UMD
1 year
Also in session #6, @schowdhury671 presents "MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models" ( https://t.co/2I9j0ysfAG) with @nagsayan112358 @josephkj_in @balajivasan @dmanocha @AdobeResearch
0
4
9
@AnimaAnandkumar
Prof. Anima Anandkumar
2 years
For the first time, we show that the Llama 7B LLM can be trained on a single consumer-grade GPU (RTX 4090) with only 24GB memory. This represents more than 82.5% reduction in memory for storing optimizer states during training. Training LLMs from scratch currently requires huge…
@_akhaliq
AK
2 years
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank…
47
368
2K
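For readers unfamiliar with the gradient low-rank projection idea mentioned in the GaLore tweets above, the core trick is keeping optimizer state in a low-rank subspace of the gradient instead of at full size. Below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' implementation; the rank, the simple momentum-style state, and the single-matrix setting are all simplifying assumptions.

import torch

def galore_style_step(weight, grad, state, lr=1e-3, rank=4, beta=0.9):
    # Illustrative sketch of gradient low-rank projection (NOT the GaLore code).
    # 1) Find a rank-r basis for the current gradient via truncated SVD.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                     # (m x r) projection basis
    # 2) Keep optimizer state (here a simple momentum buffer) only in the
    #    r-dimensional subspace, which is where the memory saving comes from:
    #    the stored state has shape (r x n) instead of (m x n).
    g_low = P.T @ grad                  # (r x n) projected gradient
    state = g_low if state is None else beta * state + (1 - beta) * g_low
    # 3) Project the update back to full size and apply it.
    weight = weight - lr * (P @ state)
    return weight, state

Details such as how often the projection basis is refreshed and the exact subspace optimizer are omitted here; the sketch only shows where the optimizer-state memory reduction comes from.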
@Shramanpramani2
Shraman Pramanick
1 year
📢Happy to share that EgoVLPv2 (ICCV 2023) is awarded as an Egocentric Vision 2022/2023 Distinguished Paper. Thanks to my coauthors @yalesong @PengchuanZ @MikeShou1 @nagsayan112358 @KevinQHLin arXiv: https://t.co/3OlhdwV0XR Code: https://t.co/VqjFhp4Stn… @CVPR @ICCVConference
github.com
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023] - facebookresearch/EgoVLPv2
@dimadamen
Dima Damen
1 year
Congratulations to the authors of the 10 papers selected as the EgoVis 2022/2023 Distinguished Paper Awards... https://t.co/KGMmxwYkIX Join the awards ceremony @CVPR EgoVis Workshop 17/06 where authors will give short speeches about the story behind these seminal papers.
1
1
16
@Shramanpramani2
Shraman Pramanick
2 years
@WenhuChen Nice work! Reasoning over multiple images is indeed a less-explored field. We have reported results on NLVR and Co-Segmentation tasks using our proposed VistaLLM in CVPR 2024.
1
1
5
@josephkj_in
Joseph KJ
2 years
#CVPR2024 has said an ‘Accept’ to our work on generating music conditioned on multimodal signals from text and images! Joint work with our 2023 @AdobeResearch interns (@schowdhury671 and @nagsayan112358) and @balajivasan.
1
3
61
@schowdhury671
Sanjoy Chowdhury ✈️ ICCV 2025 🌺
2 years
📢 Thrilled to share that our paper MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models got accepted to CVPR 2024! In this work, we present a novel diffusion-powered approach to generate audio from multi-modal inputs (image and text).
2
2
26
@nagsayan112358
Sayan Nag (সায়ন নাগ)
2 years
@schowdhury671 APoLLo substantially improves the generalization capabilities of VLP models when fine-tuned in a few-shot setting. We introduce trainable cross-attention-based adapter layers in conjunction with vision and language encoders to strengthen the alignment between the two modalities.
0
0
0
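As a rough illustration of the cross-attention-based adapter idea described in the tweet above, the Python sketch below shows a generic adapter in which tokens from one modality attend to the other and a small bottleneck produces a residual update. This is a generic sketch under assumed dimensions and names, not the APoLLo architecture.

import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    # Generic cross-attention adapter (illustrative, not the APoLLo code):
    # tokens from one modality attend to the other, and a small bottleneck
    # MLP produces a residual update. In an adapter setup, only these
    # parameters would be trained while the backbone encoders stay frozen.
    def __init__(self, dim=512, num_heads=8, bottleneck=64):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x, context):
        # x: tokens of one modality (e.g. text); context: the other (e.g. image).
        attended, _ = self.cross_attn(self.norm(x), context, context)
        return x + self.up(torch.relu(self.down(attended)))   # residual update

# Example: text tokens attending to image tokens.
adapter = CrossAttentionAdapter()
text_tokens = torch.randn(2, 16, 512)
image_tokens = torch.randn(2, 49, 512)
fused = adapter(text_tokens, image_tokens)    # shape: (2, 16, 512)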
@nagsayan112358
Sayan Nag (সায়ন নাগ)
2 years
We are happy to share that we recently presented our work APoLLo🚀: Unified Adapter and Prompt Learning for Vision Language Models at #EMNLP2023 (Main track). Huge thanks to @schowdhury671 and Dinesh Manocha. Paper:
1
2
8
@nagsayan112358
Sayan Nag (সায়ন নাগ)
2 years
The code and model checkpoints of VoLTA (TMLR, 2023) are now publicly available! 📜 Paper: https://t.co/pfCc0SVIDz 📷 Code: https://t.co/U2Il7DZqdd 📷Project: https://t.co/2h1HOBOYvP Huge thanks to @Shramanpramani2, Li Jing, @JiachenAI, Hardik Shah, @ylecun, and Rama Chellappa.
0
0
5