Mohaiminul (Emon) Islam (on job market)
@mmiemon
Followers
250
Following
546
Media
59
Statuses
389
On the Industry Job Market | PhD Student @unccs | 2x Research Intern @MetaAI. Computer Vision, Video Understanding, Multimodal, AI Agents.
Chapel Hill, NC
Joined April 2016
On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta. Seeking Research Scientist/Engineer roles! https://t.co/z9ioZPFCi9 | mmiemon [at] cs [dot] unc [dot] edu
md-mohaiminul.github.io
A highly-customizable Hugo academic resume theme powered by Wowchemy website builder.
0
4
18
Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. • SOTA on HLE (44.9%) and BrowseComp (60.2%) • Executes up to 200-300 sequential tool calls without human interference • Excels in reasoning, agentic search, and coding • 256K context window Built
170
306
2K
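The "sequential tool calls" claim above is, at its core, an agentic loop: the model repeatedly decides whether to invoke a tool, observes the result, and continues until it can answer. A minimal sketch of such a loop, assuming hypothetical `call_model` and `run_tool` helpers (not Kimi's actual API):

```python
# Minimal agentic tool-call loop: the model keeps issuing tool calls
# until it produces a final answer or exhausts a step budget.
# `call_model` and `run_tool` are hypothetical placeholders, not Kimi's real API.

def agent_loop(call_model, run_tool, task: str, max_steps: int = 300) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)           # returns either a tool request or a final answer
        if reply.get("tool"):                  # model asked for a tool
            result = run_tool(reply["tool"], reply["args"])
            messages.append({"role": "tool", "content": result})
        else:                                  # model produced a final answer
            return reply["content"]
    return "step budget exhausted"
```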
Excited to share that 5/5 of my papers (3 main, 2 findings) have been accepted at #EMNLP2025, in video/multimodal reasoning, instructional video editing, and efficient LLM adaptation & reasoning! I'm recruiting Ph.D. students to join the Multimodal AI Group at NTU College
15
31
306
Fei-Fei Li (@drfeifei) on limitations of LLMs. "There's no language out there in nature. You don't go out in nature and there's words written in the sky for you... There is a 3D world that follows laws of physics." Language is purely generated signal. https://t.co/FOomRpGTad
Columbia CS Prof explains why LLMs can't generate new scientific ideas. Because LLMs learn a structured "map", a Bayesian manifold, of known data, they work well within it but fail outside it. True discovery means creating new maps, which LLMs cannot do. https://t.co/PzI0YrTlpl
189
604
4K
Is language a "terrible abstraction" for video understanding? Many in the video community often dismiss language-driven approaches in favor of complex, video-native solutions. However, I believe this resistance stems more from internal bias, validating a research identity as a
2
4
20
Yi Lin, whom I know personally, is an excellent researcher in relevant areas such as Multimodal LLMs, PEFT, and RL.
Tough week! I also got impacted less than 3 months after joining. Ironically, I just landed some new RL infra features the day before. Life moves on. My past work spans RL, PEFT, Quantization, and Multimodal LLMs. If your team is working on these areas, I'd love to connect.
0
1
5
It should be a very useful feature!
Introducing Perplexity Search API We've built a search index of billions of webpages to provide real-time, quality information from the web. Now developers have access to the full power of our index, providing the most accurate results in milliseconds. https://t.co/TDOT8vnWxA
1
0
0
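The quoted announcement above describes a hosted web-search API that developers can query for real-time results. A generic sketch of calling such an API over HTTPS follows; the endpoint path, parameter names, and response fields are illustrative assumptions, not Perplexity's documented schema.

```python
# Generic sketch of querying a hosted search API over HTTPS.
# The endpoint path, parameter names, and response fields below are
# illustrative assumptions, not Perplexity's documented schema.
import os
import requests

def web_search(query: str) -> list[dict]:
    resp = requests.post(
        "https://api.perplexity.ai/search",               # hypothetical endpoint
        headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
        json={"query": query, "max_results": 10},         # hypothetical parameters
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])                  # hypothetical response field
```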
Our Video-RTS paper has been accepted at #EMNLP2025 Main!! We propose a novel video reasoning approach that combines data-efficient reinforcement learning (GRPO) with video-adaptive test-time scaling, improving reasoning performance while maintaining efficiency on multiple
Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
1
30
39
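The "data-efficient reinforcement learning (GRPO)" part above boils down to sampling a group of answers per question, scoring each only by whether the final answer is correct, and normalizing rewards within the group. A toy sketch of that group-relative advantage computation, not the authors' code:

```python
# Toy GRPO-style advantage computation with an output-only reward:
# sample a group of answers per question, reward 1.0 if the extracted
# final answer matches the ground truth, then normalize within the group.
import statistics

def group_relative_advantages(sampled_answers: list[str], ground_truth: str) -> list[float]:
    rewards = [1.0 if ans.strip() == ground_truth.strip() else 0.0 for ans in sampled_answers]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0    # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled rollouts, one correct.
print(group_relative_advantages(["B", "C", "B", "A"], "A"))
```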
Can we bridge MLLMs and diffusion models more natively and efficiently, by having MLLMs produce patch-level CLIP latents already aligned with their visual encoders, while fully preserving MLLM's visual reasoning capabilities? Introducing Bifrost-1: > High-Fidelity
3
62
149
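The bridging idea described above is that the MLLM predicts patch-level latents in the same space as its own CLIP visual encoder, which a diffusion decoder can then condition on. A conceptual sketch under that reading; module names and shapes are illustrative assumptions, not the Bifrost-1 release.

```python
# Conceptual sketch: an MLLM head predicts patch-level latents in the CLIP
# visual-encoder space, and a diffusion decoder would condition on them.
# Module names and shapes are illustrative assumptions, not the Bifrost-1 code.
import torch
import torch.nn as nn

class PatchLatentHead(nn.Module):
    def __init__(self, llm_dim: int = 4096, clip_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(llm_dim, clip_dim)   # map LLM hidden states into CLIP patch space

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, num_patches, llm_dim) hidden states at image-slot positions
        return self.proj(llm_hidden)               # (batch, num_patches, clip_dim) patch latents

head = PatchLatentHead()
patch_latents = head(torch.randn(1, 256, 4096))    # these latents would condition a diffusion decoder
print(patch_latents.shape)                          # torch.Size([1, 256, 1024])
```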
GPT-5 is here! For the first time, users don't have to choose between models, or even think about model names. Just one seamless, unified experience. It's also the first time frontier intelligence is available to everyone, including free users! GPT-5 sets new highs across
going to try live-tweeting the GPT-5 livestream. first, GPT-5 is an integrated model, meaning no more model switcher and it decides when it needs to think harder or not. it is very smart, intuitive, and fast. it is available to everyone, including the free tier, w/reasoning!
248
84
999
Check out our new paper: Video-RTS. A data-efficient RL method for complex video reasoning tasks. • Pure RL w/ output-based rewards. • Novel sparse-to-dense Test-Time Scaling (TTS) to expand input frames via self-consistency. • 96.4% less training data! More in the thread below.
Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
0
7
13
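The sparse-to-dense test-time scaling described above can be read as: start with a few frames, sample several answers, and only add more frames when the answers disagree. A minimal sketch of that idea, assuming a hypothetical `answer_with_frames` model call rather than the released code:

```python
# Sketch of sparse-to-dense test-time scaling with self-consistency:
# start sparse, sample several answers, and densify frames only when
# no self-consistent majority emerges.
# `answer_with_frames` is a hypothetical model call, not the released code.
from collections import Counter

def sparse_to_dense_answer(answer_with_frames, video, frame_budgets=(8, 16, 32), num_samples=5):
    for num_frames in frame_budgets:
        samples = [answer_with_frames(video, num_frames) for _ in range(num_samples)]
        answer, votes = Counter(samples).most_common(1)[0]
        if votes > num_samples // 2:       # self-consistent majority reached, stop early
            return answer
    return answer                           # fall back to the densest setting's top answer
```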
Great to see our paper ReVisionLLM featured on the MCML blog! @gberta227 #CVPR2025
Check out our latest work, ReVisionLLM, now featured on the MCML blog! A Vision-Language Model for accurate temporal grounding in hour-long videos. https://t.co/cTNNcRLsFE
#VisionLanguage #MultimodalAI #MCML #CVPR2025
0
1
2
Had a great time presenting BIMBA at #CVPR2025 today! Engaging discussions, thoughtful questions, and lots of interest in our work on long-range VideoQA. Paper: https://t.co/4XCHPFWchy | Project: https://t.co/alktUQtIzE | Demo: https://t.co/e823S80qIu
New #CVPR2025 Paper! Introducing BIMBA, an efficient multimodal LLM for long-range video QA. It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. Thread below: https://t.co/yP9ZLkUX2N
0
1
17
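The BIMBA tweets above describe compressing a long video's spatiotemporal tokens down to the most important ones before they reach the LLM. A much-simplified stand-in for that step follows; the real model uses Mamba's selective scan, whereas this sketch uses a plain learned scorer purely to illustrate the token-reduction idea.

```python
# Simplified illustration of key-token selection for long-range video QA:
# score the spatiotemporal tokens and keep only the top-k, preserving order.
# This is a stand-in for illustration, not BIMBA's selective-scan mechanism.
import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    def __init__(self, dim: int = 1024, keep: int = 256):
        super().__init__()
        self.keep = keep
        self.scorer = nn.Linear(dim, 1)     # importance score per token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim), e.g. thousands of patch tokens from a long video
        scores = self.scorer(tokens).squeeze(-1)                    # (batch, num_tokens)
        idx = scores.topk(self.keep, dim=1).indices                 # most important tokens
        idx = idx.sort(dim=1).values                                # keep temporal order
        return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

selector = TokenSelector()
compressed = selector(torch.randn(1, 4096, 1024))                   # 4096 tokens -> 256 tokens
print(compressed.shape)                                              # torch.Size([1, 256, 1024])
```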
Come to our poster today at #CVPR2025! June 15 | 4-6PM | Poster #282 | ExHall D. Paper: https://t.co/4XCHPFWchy | Project: https://t.co/alktUQtIzE | Code: https://t.co/mRWxTRCh6z | YouTube:
New #CVPR2025 Paper! Introducing BIMBA, an efficient multimodal LLM for long-range video QA. It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. Thread below: https://t.co/yP9ZLkUX2N
0
2
10
Great to see a lot of interest in ReVisionLLM among the video understanding community! If you missed it, check out https://t.co/KAF47QI7yp
@hannan_tanveer
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by: Poster #307, Session 4 | June 14, 5-7PM | ExHall D. https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
0
2
10
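The "recursive" part of ReVisionLLM's temporal grounding, as described in the tweet above, amounts to scanning the hour-long video coarsely, picking the most promising segment, and recursing into it with denser sampling until the window is short enough to localize. A sketch of that control flow, assuming a hypothetical `score_segment` VLM call that returns relevance of a segment to the query:

```python
# Sketch of recursive temporal grounding: coarse-to-fine zoom into an
# hour-long video. `score_segment` is a hypothetical VLM relevance call.

def recursive_grounding(score_segment, query, start, end, min_len=30.0, branch=4):
    """Return a (start, end) window in seconds that best matches the query."""
    if end - start <= min_len:                      # fine enough: stop recursing
        return (start, end)
    step = (end - start) / branch
    segments = [(start + i * step, start + (i + 1) * step) for i in range(branch)]
    best = max(segments, key=lambda seg: score_segment(query, seg))   # coarse relevance pass
    return recursive_grounding(score_segment, query, best[0], best[1], min_len, branch)
```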
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by: Poster #307, Session 4 | June 14, 5-7PM | ExHall D. https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
0
3
7
Another great accomplishment by Emon at #CVPR2025. Interestingly, rather than using some complex ensemble model, Emon won the EgoSchema challenge by simply applying his latest BIMBA model, which he will also present at the poster session on Sunday 4-6pm. Be sure to stop by!
Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task. Stop by at #CVPR: Poster #282 | June 15, 4-6PM | ExHall D. https://t.co/alktUQtIzE
1
4
26
Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task. Stop by at #CVPR: Poster #282 | June 15, 4-6PM | ExHall D. https://t.co/alktUQtIzE
0
2
10