Tanveer Hannan (on job market)
@hannan_tanveer
Followers: 43 | Following: 48 | Media: 20 | Statuses: 140
On the Industry Job Market | Research Intern @Microsoft | PhD Student @LMU. Computer Vision, Video Understanding, Multimodal, AI Agent
Munich, Germany
Joined December 2013
Our latest paper, DocSLM, developed during my internship at Microsoft, is now on arXiv: https://t.co/P4m7o05SwZ. It is an efficient & compact Vision-Language Model that processes long & complex documents while running on resource-constrained edge devices like mobiles & laptops.
arxiv.org
Large Vision-Language Models (LVLMs) have demonstrated strong multimodal reasoning capabilities on long and complex documents. However, their high memory footprint makes them impractical for...
Thanks to my co-authors @dimi_Mall, Parth Pathak, Faegheh (Fay) Sardari, @thomasseidl, @gberta227, @MohsenFyz, and @sunandosengupta for their contributions throughout this project.
• A scalable stream processor that reliably handles long document sequences (up to 120 pages).
• DocSLM uses 75% fewer parameters, has 71% lower latency, and keeps memory constant at 14 GB across varying document lengths, all while delivering competitive or SOTA performance.
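A minimal sketch of the streaming idea described above, under assumptions of my own (module names, tensor sizes, and the cross-attention update are illustrative placeholders, not DocSLM's actual design): each page is encoded, folded into a fixed-size summary state, and then discarded, so peak memory stays roughly constant regardless of document length.

```python
# Hypothetical sketch: page-by-page streaming with a fixed-size carry-over state,
# so peak memory stays roughly constant no matter how many pages the document has.
import torch
import torch.nn as nn


class StreamingDocReader(nn.Module):
    """Processes pages sequentially; only `state_len` summary tokens persist between pages."""

    def __init__(self, dim: int = 768, state_len: int = 64, num_heads: int = 8):
        super().__init__()
        self.state = nn.Parameter(torch.randn(1, state_len, dim) * 0.02)  # learned initial state
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, pages: list) -> torch.Tensor:
        # pages: list of (1, tokens_per_page, dim) tensors, one per page
        state = self.state
        for page_tokens in pages:
            # The fixed-size state attends to the current page, then the page is dropped.
            update, _ = self.cross_attn(query=state, key=page_tokens, value=page_tokens)
            state = self.norm(state + update)
        return state  # (1, state_len, dim): constant-size summary of the whole document


# Toy usage: 120 "pages" of 256 tokens each; per step we hold only one page plus the state.
reader = StreamingDocReader()
doc = [torch.randn(1, 256, 768) for _ in range(120)]
print(reader(doc).shape)  # torch.Size([1, 64, 768])
```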
Key Contributions
• A hierarchical compression module that integrates OCR, visual, and layout features into a fixed-length representation, achieving an 82% reduction in visual tokens while preserving essential semantic structure.
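Not the paper's code, just a rough sketch of how a compression module like the one described might fuse OCR-text, visual, and layout features and squeeze them into a fixed-length token set via learned queries; all dimensions and module names here are assumptions for illustration.

```python
# Illustrative sketch: fuse OCR-text, visual, and layout features, then compress
# them to a fixed number of tokens with learned queries (cross-attention pooling).
import torch
import torch.nn as nn


class HierarchicalCompressor(nn.Module):
    def __init__(self, dim: int = 768, num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        self.layout_proj = nn.Linear(4, dim)  # (x1, y1, x2, y2) box -> layout feature
        self.queries = nn.Parameter(torch.randn(1, num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, ocr_feats, vis_feats, boxes):
        # ocr_feats: (B, N, dim) OCR-token features, vis_feats: (B, M, dim) patch features,
        # boxes: (B, N, 4) normalized layout boxes for the OCR tokens.
        ocr_with_layout = ocr_feats + self.layout_proj(boxes)
        tokens = torch.cat([ocr_with_layout, vis_feats], dim=1)         # (B, N + M, dim)
        queries = self.queries.expand(tokens.shape[0], -1, -1)
        compressed, _ = self.attn(queries, tokens, tokens)              # (B, num_queries, dim)
        return compressed  # fixed length regardless of how many tokens went in


comp = HierarchicalCompressor()
out = comp(torch.randn(1, 500, 768), torch.randn(1, 196, 768), torch.rand(1, 500, 4))
print(out.shape)  # torch.Size([1, 32, 768]): far fewer than the 696 input tokens
```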
Check out the Spatiotemporal Action Grounding Challenge, now featured on the MCML blog! https://t.co/UXbN00vFTQ
mcml.ai
ICCV 2025 workshop: Advancing AI to detect who does what, when, and where, across space, time, and complex real-world videos.
Check out our new challenge/workshop at @ICCVConference
Exciting news! We're happy to announce our challenge/workshop at this year's @ICCVConference, focusing on Spatiotemporal Action Grounding in Videos. Here are the details:
• Watch the video below for a demo.
• The eval server is open until 09/19!
• Links incl. code below. #ICCV
We invite the research community to participate, submit their methods, and contribute to shaping the future of spatiotemporal understanding in computer vision. Outstanding submissions will be featured at the ICCV 2025 Workshop.
The benchmark introduces new tasks, datasets, and evaluation protocols to encourage the development of more robust, scalable, and generalizable user-instruction-based models for complex, real-world scenarios.
This year's challenge is centered on advancing research in:
• Multi-Object Tracking
• Instruction-Based Spatiotemporal Detection
• Long-Term Temporal Reasoning
Challenge Launch Announcement: We are pleased to announce the launch of the MOT25 Challenge, to be held in conjunction with ICCV 2025.
Workshop website: https://t.co/YGg9wphKnT
The MOT25 Challenge is now live on Codabench:
On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta. Seeking Research Scientist/Engineer roles! https://t.co/z9ioZPFCi9 | mmiemon [at] cs [dot] unc [dot] edu
md-mohaiminul.github.io
A highly-customizable Hugo academic resume theme powered by Wowchemy website builder.
Check out our latest work, ReVisionLLM, now featured on the MCML blog! A Vision-Language Model for accurate temporal grounding in hour-long videos. https://t.co/cTNNcRLsFE
#VisionLanguage #MultimodalAI #MCML #CVPR2025
mcml.ai
Tanveer Hannan and colleagues introduce ReVisionLLM, an AI model that mimics human skimming to accurately find key moments in long videos.
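A toy sketch of the recursive "skim, then zoom in" strategy the post describes: coarsely score sub-segments of a long video, recurse only into the most promising ones, and stop once a span is short enough to localize. The scoring function below is a hypothetical stand-in for the actual vision-language model, and all parameter values are illustrative.

```python
# Rough sketch of recursive coarse-to-fine temporal grounding; `score_fn` is a
# placeholder for the real model's relevance scoring of a video span against a query.
from typing import Callable, List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def ground_recursively(
    span: Span,
    score_fn: Callable[[Span, str], float],
    query: str,
    min_len: float = 30.0,  # stop recursing below this span length (seconds)
    branches: int = 8,      # sub-segments skimmed per level
    keep: int = 2,          # promising sub-segments to zoom into
) -> List[Span]:
    start, end = span
    if end - start <= min_len:
        return [span]
    step = (end - start) / branches
    segments = [(start + i * step, start + (i + 1) * step) for i in range(branches)]
    ranked = sorted(segments, key=lambda s: score_fn(s, query), reverse=True)
    hits: List[Span] = []
    for seg in ranked[:keep]:
        hits.extend(ground_recursively(seg, score_fn, query, min_len, branches, keep))
    return hits


# Toy usage over a one-hour video with a dummy scorer that prefers spans near minute 42.
dummy = lambda s, q: -abs((s[0] + s[1]) / 2 - 42 * 60)
print(ground_recursively((0.0, 3600.0), dummy, "when does the goal happen?"))
```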
Great to see so much interest in ReVisionLLM from the video understanding community! If you missed it, check out https://t.co/KAF47QI7yp
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by:
Poster #307, Session 4
June 14, 5-7 PM | ExHall D
https://t.co/qrBvf2UUAo
Excited to have our paper ReVisionLLM presented today at #CVPR2025! Website:
lnkd.in
Had a great time presenting at the GenAI session @CiscoMeraki, thanks @nahidalam for the invite! Catch us at #CVPR2025:
BIMBA: https://t.co/4XCHPFWchy (June 15, 4-6 PM, Poster #282)
ReVisionLLM: https://t.co/KAF47QI7yp (June 14, 5-7 PM, Poster #307)
@gberta227 @hannan_tanveer
arxiv.org
Large language models (LLMs) excel at retrieving information from lengthy text, but their vision-language counterparts (VLMs) face difficulties with hour-long videos, especially for temporal...
The time for new architectures is over? Not quite! SeNaTra, a native segmentation backbone, is waiting; let's see how it works. https://t.co/2I9nuLBsSz
arxiv.org
Uniform downsampling remains the de facto standard for reducing spatial resolution in vision backbones. In this work, we propose an alternative design built around a content-aware spatial grouping...
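To make the abstract's contrast concrete, here is a toy comparison (not SeNaTra's implementation; the grouping step and its seeding are my own simplification) between uniform downsampling and a content-aware grouping step that pools features by similarity instead of by fixed spatial windows.

```python
# Toy contrast: location-based uniform downsampling vs. similarity-based grouping.
import torch
import torch.nn.functional as F


def uniform_downsample(x: torch.Tensor) -> torch.Tensor:
    # x: (B, C, H, W) -> (B, C, H/2, W/2); every 2x2 window is pooled regardless of content.
    return F.avg_pool2d(x, kernel_size=2)


def content_aware_grouping(x: torch.Tensor, num_groups: int = 64) -> torch.Tensor:
    # x: (B, C, H, W) -> (B, num_groups, C); tokens are softly assigned to groups by
    # feature similarity, so groups can follow content (e.g., object boundaries).
    B, C, H, W = x.shape
    tokens = x.flatten(2).transpose(1, 2)                                    # (B, H*W, C)
    centers = tokens[:, torch.linspace(0, H * W - 1, num_groups).long()]     # seed group centers
    assign = F.softmax(tokens @ centers.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, H*W, G)
    grouped = assign.transpose(1, 2) @ tokens                                # (B, G, C)
    return grouped / assign.sum(dim=1).unsqueeze(-1).clamp(min=1e-6)         # normalize weights


feat = torch.randn(2, 96, 56, 56)
print(uniform_downsample(feat).shape)      # torch.Size([2, 96, 28, 28])
print(content_aware_grouping(feat).shape)  # torch.Size([2, 64, 96])
```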
Effective long-context comprehension remains a significant hurdle for LLMs. Meta's forthcoming Llama 4 aims to address this with its iRoPE architecture. I am looking forward to testing it on more real-life setups like streaming videos.
Today is the start of a new era of natively multimodal AI innovation. Today, we're introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick, our most advanced models yet and the best in their class for multimodality. Llama 4 Scout: 17B-active-parameter model
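The tweet names iRoPE but does not explain it; as background only, here is a minimal sketch of standard rotary position embeddings (RoPE), the mechanism iRoPE presumably builds on. The interleaving and scaling specifics of iRoPE are not reproduced here.

```python
# Background sketch of standard RoPE: rotate each (even, odd) feature pair of the
# queries/keys by a position-dependent angle so relative position enters dot products.
import torch


def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (seq_len, dim) with even dim.
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32)[:, None]              # (seq, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)  # (dim/2,)
    angles = pos * freqs                                                   # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


q = torch.randn(16, 64)
print(rope(q).shape)  # torch.Size([16, 64])
```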
Check out the #CVPR2025 paper on long video understanding. It achieves SOTA with a much simpler and more efficient end-to-end approach.
New #CVPR2025 Paper! Introducing BIMBA, an efficient multimodal LLM for long-range video QA. It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens using the selective scan mechanism of Mamba models. Thread below: https://t.co/yP9ZLkUX2N
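A cartoon of the "selective scan" token-selection idea mentioned above, not BIMBA's or Mamba's actual implementation: an input-dependent gate scans the long spatiotemporal token sequence, and only the most strongly gated tokens, plus a running summary, are handed to the LLM. Module names, sizes, and the gating rule are assumptions for illustration.

```python
# Toy selective scan: input-dependent gates decide how much each token updates a
# running summary, and the most strongly gated tokens are kept as the reduced set.
import torch
import torch.nn as nn


class SelectiveTokenCompressor(nn.Module):
    def __init__(self, dim: int = 512, keep: int = 256):
        super().__init__()
        self.gate = nn.Linear(dim, 1)  # per-token, input-dependent selection gate
        self.keep = keep

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, T, dim) flattened spatiotemporal tokens from a long video
        g = torch.sigmoid(self.gate(tokens))                    # (B, T, 1) gates
        # Sequential gated scan: each token updates the summary in proportion to its gate.
        state = torch.zeros(tokens.shape[0], tokens.shape[-1], device=tokens.device)
        for t in range(tokens.shape[1]):
            state = (1 - g[:, t]) * state + g[:, t] * tokens[:, t]
        # Keep the top-k most strongly gated tokens plus the scan summary.
        idx = g.squeeze(-1).topk(self.keep, dim=1).indices      # (B, keep)
        kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1]))
        return torch.cat([state.unsqueeze(1), kept], dim=1)     # (B, 1 + keep, dim)


comp = SelectiveTokenCompressor()
video_tokens = torch.randn(1, 2048, 512)   # e.g. 128 sampled frames x 16 patches each
print(comp(video_tokens).shape)            # torch.Size([1, 257, 512])
```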