Tanveer Hannan (on job market)

@hannan_tanveer

Followers 43 · Following 48 · Media 20 · Statuses 140

๐Ž๐ง ๐ญ๐ก๐ž ๐ˆ๐ง๐๐ฎ๐ฌ๐ญ๐ซ๐ฒ ๐‰๐จ๐› ๐Œ๐š๐ซ๐ค๐ž๐ญ | Research Intern @Microsoft | Phd Student @LMU. Computer Vision, Video Understanding, Multimodal, AI Agent

Munich, Germany
Joined December 2013
@hannan_tanveer
Tanveer Hannan (on job market)
1 day
Our latest paper, DocSLM, developed during my internship at Microsoft, is now on arXiv: https://t.co/P4m7o05SwZ. It is an efficient, compact Vision-Language Model that processes long, complex documents while running on resource-constrained edge devices such as phones and laptops.
arxiv.org
Large Vision-Language Models (LVLMs) have demonstrated strong multimodal reasoning capabilities on long and complex documents. However, their high memory footprint makes them impractical for...
@hannan_tanveer
Tanveer Hannan (on job market)
1 day
Thanks to my co-authors @dimi_Mall, Parth Pathak, Faegheh (Fay) Sardari, @thomasseidl, @gberta227, @MohsenFyz, and @sunandosengupta for their contributions throughout this project.
@hannan_tanveer
Tanveer Hannan (on job market)
1 day
• A scalable stream processor that enables reliable handling of long document sequences (up to 120 pages).
• DocSLM requires 75% fewer parameters and 71% lower latency, maintains a constant 14 GB memory footprint across varying document lengths, and delivers competitive or state-of-the-art performance.
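The thread doesn't spell out the stream processor's internals, but a constant memory footprint across document lengths suggests a chunked loop that keeps only a fixed-size running state. A minimal sketch of that pattern, where encode_chunk and fuse are hypothetical stand-ins for DocSLM's actual components:

```python
# Constant-memory streaming over document pages (illustrative pattern only):
# pages are encoded one chunk at a time and only a fixed-size running
# summary survives, so peak memory does not grow with document length.
import torch

@torch.no_grad()
def stream_document(pages, encode_chunk, fuse, chunk_size=8):
    """pages: list of per-page tensors. encode_chunk maps a chunk to a
    fixed-size summary; fuse merges the running state with a new summary
    without growing it."""
    state = None
    for i in range(0, len(pages), chunk_size):
        summary = encode_chunk(pages[i:i + chunk_size])  # fixed-size per chunk
        state = summary if state is None else fuse(state, summary)
    return state  # one fixed-size representation, even for 120+ pages
```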
@hannan_tanveer
Tanveer Hannan (on job market)
1 day
Key Contributions
• A hierarchical compression module that integrates OCR, visual, and layout features into a fixed-length representation, achieving an 82% reduction in visual tokens while preserving essential semantic structure.
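The thread doesn't include the module itself; as a rough picture of how variable-length OCR, visual, and layout tokens can be squeezed into a fixed-length representation, here is a minimal learned-query cross-attention sketch. All class names, dimensions, and the query count are illustrative assumptions, not DocSLM's actual design.

```python
# Hypothetical fixed-length multimodal compressor (illustrative only, not
# DocSLM's code): OCR, visual, and layout tokens of arbitrary length are
# cross-attended by a fixed set of learned queries, so the output length
# is constant no matter how long the document is.
import torch
import torch.nn as nn

class FixedLengthCompressor(nn.Module):
    def __init__(self, dim=768, num_queries=64, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, ocr_tokens, visual_tokens, layout_tokens):
        # Each input is (B, N_modality, dim); output is always (B, num_queries, dim).
        tokens = torch.cat([ocr_tokens, visual_tokens, layout_tokens], dim=1)
        queries = self.queries.unsqueeze(0).expand(tokens.size(0), -1, -1)
        compressed, _ = self.attn(queries, tokens, tokens)
        return compressed

# 4,096 mixed tokens in, 64 out; the paper reports an 82% visual-token
# reduction with its own (different) module.
model = FixedLengthCompressor()
out = model(torch.randn(1, 2048, 768), torch.randn(1, 1536, 768), torch.randn(1, 512, 768))
print(out.shape)  # torch.Size([1, 64, 768])
```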
@hannan_tanveer
Tanveer Hannan (on job market)
3 months
🚀 Check out the Spatiotemporal Action Grounding Challenge, now featured on the MCML blog! https://t.co/UXbN00vFTQ
mcml.ai
ICCV 2025 workshop: Advancing AI to detect who does what, when, and where โ€” across space, time, and complex real-world videos.
@hannan_tanveer
Tanveer Hannan (on job market)
4 months
Check out our new challenge and workshop at @ICCVConference
@_M_Weber
Mark
4 months
Exciting news! We're happy to announce our challenge and workshop at this year's @ICCVConference, focusing on Spatiotemporal Action Grounding in Videos. Here are the details:
🔷 Watch the video below for a demo.
🔷 The eval server is open until 09/19!
🔷 Links incl. code below.
#ICCV
@hannan_tanveer
Tanveer Hannan (on job market)
4 months
We invite the research community to participate, submit their methods, and contribute to shaping the future of spatiotemporal understanding in computer vision. Outstanding submissions will be featured at the ICCV 2025 Workshop.
@hannan_tanveer
Tanveer Hannan (on job market)
4 months
The benchmark introduces new tasks, datasets, and evaluation protocols to encourage the development of more robust, scalable, and generalizable instruction-based models for complex, real-world scenarios.
@hannan_tanveer
Tanveer Hannan (on job market)
4 months
This year's challenge is centered on advancing research in:
🔹 Multi-Object Tracking
🔹 Instruction-based Spatiotemporal Detection
🔹 Long-Term Temporal Reasoning
@hannan_tanveer
Tanveer Hannan (on job market)
4 months
🎯 Challenge Launch Announcement
We are pleased to announce the launch of the MOT25 Challenge, to be held in conjunction with ICCV 2025.
🔗 Workshop website: https://t.co/YGg9wphKnT
🧪 The MOT25 Challenge is now live on Codabench:
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
🚀 On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta
🔍 Seeking Research Scientist/Engineer roles!
🔗 https://t.co/z9ioZPFCi9
📧 mmiemon [at] cs [dot] unc [dot] edu
md-mohaiminul.github.io
A highly-customizable Hugo academic resume theme powered by Wowchemy website builder.
@hannan_tanveer
Tanveer Hannan (on job market)
5 months
🚀 Check out our latest work, ReVisionLLM, now featured on the MCML blog!
🔍 A Vision-Language Model for accurate temporal grounding in hour-long videos.
👉 https://t.co/cTNNcRLsFE
#VisionLanguage #MultimodalAI #MCML #CVPR2025
mcml.ai
Tanveer Hannan and colleagues introduce ReVisionLLM, an AI model that mimics human skimming to accurately find key moments in long videos.
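The blog's "human skimming" framing suggests a recursive coarse-to-fine search: score coarse segments of the video against the query, then descend only into the promising region. A greedy single-branch sketch under that assumption, with score_segment as a hypothetical vision-language relevance scorer (the actual model is more involved than this simplification):

```python
# Recursive coarse-to-fine temporal grounding (illustrative sketch, not
# ReVisionLLM's algorithm): split [t0, t1] into coarse segments, score each
# against the text query, recurse into the best one, and stop once the
# interval is short enough to report.
def ground(video, query, score_segment, t0, t1, num_splits=8, min_len=30.0):
    if t1 - t0 <= min_len:
        return (t0, t1)  # fine enough: report this interval
    step = (t1 - t0) / num_splits
    segments = [(t0 + i * step, t0 + (i + 1) * step) for i in range(num_splits)]
    best = max(segments, key=lambda s: score_segment(video, query, *s))
    return ground(video, query, score_segment, *best, num_splits, min_len)
```

A greedy descent like this scores only on the order of num_splits times the recursion depth, rather than every moment of an hour-long video, which is what makes the skimming analogy apt.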
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Great to see so much interest from the video understanding community in ReVisionLLM! If you missed it, check out https://t.co/KAF47QI7yp @hannan_tanveer
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by:
📍 Poster #307, Session 4
🗓️ June 14, 5–7 PM | ExHall D
🔗 https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
@hannan_tanveer
Tanveer Hannan (on job market)
5 months
Excited to have our paper ReVisionLLM presented today at #CVPR2025! Website:
lnkd.in
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Had a great time presenting at the GenAI session @CiscoMeraki. Thanks @nahidalam for the invite 🙏
Catch us at #CVPR2025:
📌 BIMBA: https://t.co/4XCHPFWchy (June 15, 4–6 PM, Poster #282)
📌 ReVisionLLM: https://t.co/KAF47QI7yp (June 14, 5–7 PM, Poster #307)
@gberta227 @hannan_tanveer
arxiv.org
Large language models (LLMs) excel at retrieving information from lengthy text, but their vision-language counterparts (VLMs) face difficulties with hour-long videos, especially for temporal...
@lealtaixe
Laura Leal-Taixe
6 months
The time for new architectures is over? Not quite! SeNaTra, a native segmentation backbone, is waiting; let's see how it works 🧵 https://t.co/2I9nuLBsSz
arxiv.org
Uniform downsampling remains the de facto standard for reducing spatial resolution in vision backbones. In this work, we propose an alternative design built around a content-aware spatial grouping...
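The abstract contrasts uniform downsampling with content-aware spatial grouping. One minimal way to picture the latter (not SeNaTra's actual layer; initialization and normalization here are illustrative) is iterative soft assignment of pixel features to a smaller set of group centroids:

```python
# Content-aware spatial grouping as a downsampling alternative (illustrative
# sketch): instead of strided pooling on a fixed grid, pixel features are
# softly assigned to fewer group centroids, so the reduction follows content.
import torch

def group_downsample(feats, num_groups=49, iters=3):
    """feats: (B, N, D) pixel features -> (B, num_groups, D) group features."""
    B, N, D = feats.shape
    init_idx = torch.linspace(0, N - 1, num_groups).long()
    centroids = feats[:, init_idx, :]  # start from a uniform grid
    for _ in range(iters):
        logits = torch.einsum('bnd,bgd->bng', feats, centroids) / D ** 0.5
        assign = logits.softmax(dim=-1)  # each pixel distributes itself over groups
        weights = assign / assign.sum(dim=1, keepdim=True).clamp(min=1e-6)
        centroids = torch.einsum('bng,bnd->bgd', weights, feats)  # weighted means
    return centroids
```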
@hannan_tanveer
Tanveer Hannan (on job market)
8 months
Effective long-context comprehension remains a significant hurdle for LLMs. Meta's forthcoming Llama 4 aims to address this with its iRoPE architecture. I am looking forward to testing it on more real-life setups such as streaming video. A background sketch of plain RoPE follows the quoted post below.
@AIatMeta
AI at Meta
8 months
Today is the start of a new era of natively multimodal AI innovation. Today, we're introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick, our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
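For background on the positional-encoding side of this: standard rotary position embeddings (RoPE) encode position by rotating feature pairs through position-dependent angles, and long-context schemes work by modifying these rotations. The sketch below is plain RoPE only; the interleaved iRoPE design Meta describes for Llama 4 is not reproduced here.

```python
# Plain rotary position embeddings (background sketch; not iRoPE).
import torch

def rope(x, base=10000.0):
    """x: (seq_len, dim) with even dim. Rotates consecutive feature pairs
    by angles that grow with position, encoding order multiplicatively."""
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)      # (seq, 1)
    freqs = base ** (-torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = pos * freqs                                               # (seq, dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```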
@hannan_tanveer
Tanveer Hannan (on job market)
8 months
Check out the #CVPR2025 paper on long video understanding. It achieves SOTA with a much simpler and more efficient end-to-end approach.
@mmiemon
Mohaiminul (Emon) Islam (on job market)
8 months
🚀 New #CVPR2025 Paper 🚀 Introducing BIMBA, an efficient multimodal LLM for long-range video QA 💡 It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens using the selective-scan mechanism of Mamba models. 🧵 Thread below 👇 https://t.co/yP9ZLkUX2N
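The announcement credits the token selection to Mamba's selective-scan mechanism, which isn't reproduced here. As a stand-in for the general bottleneck idea, this sketch scores dense spatiotemporal tokens and keeps a fixed top-k (a deliberate simplification, not BIMBA's method):

```python
# Token bottleneck for long videos (illustrative stand-in, not BIMBA's
# selective scan): score every spatiotemporal token with a learned head
# and keep only the top-k, so the LLM sees a short, fixed-length sequence.
import torch
import torch.nn as nn

class TokenBottleneck(nn.Module):
    def __init__(self, dim=768, keep=256):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # learned per-token importance
        self.keep = keep

    def forward(self, tokens):
        # tokens: (B, N, D) dense video tokens with N >> keep
        scores = self.scorer(tokens).squeeze(-1)             # (B, N)
        idx = scores.topk(self.keep, dim=1).indices          # (B, keep)
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return tokens.gather(1, idx)                         # (B, keep, D)
```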