Mohaiminul (Emon) Islam (on job market)
@mmiemon
Followers
250
Following
546
Media
59
Statuses
389
On the Industry Job Market | PhD Student @unccs | 2x Research Intern @MetaAI. Computer Vision, Video Understanding, Multimodal, AI Agents.
Chapel Hill, NC
Joined April 2016
On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta. Seeking Research Scientist/Engineer roles! https://t.co/z9ioZPFCi9 | mmiemon [at] cs [dot] unc [dot] edu
md-mohaiminul.github.io
A highly-customizable Hugo academic resume theme powered by Wowchemy website builder.
0
4
18
Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. • SOTA on HLE (44.9%) and BrowseComp (60.2%) • Executes up to 200-300 sequential tool calls without human interference • Excels in reasoning, agentic search, and coding • 256K context window Built
170
306
2K
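The "sequential tool calls" claim above is, at its core, an agentic loop: the model repeatedly decides whether to invoke a tool, observes the result, and continues until it can answer. A minimal sketch of such a loop, assuming hypothetical `call_model` and `run_tool` helpers (not Kimi's actual API):

```python
# Minimal agentic tool-call loop: the model keeps issuing tool calls
# until it produces a final answer or exhausts a step budget.
# `call_model` and `run_tool` are hypothetical placeholders, not Kimi's real API.

def agent_loop(call_model, run_tool, task: str, max_steps: int = 300) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)           # returns either a tool request or a final answer
        if reply.get("tool"):                  # model asked for a tool
            result = run_tool(reply["tool"], reply["args"])
            messages.append({"role": "tool", "content": result})
        else:                                  # model produced a final answer
            return reply["content"]
    return "step budget exhausted"
```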
Excited to share that 5/5 of my papers (3 main, 2 findings) have been accepted at #EMNLP2025, in video/multimodal reasoning, instructional video editing, and efficient LLM adaptation & reasoning! I'm recruiting Ph.D. students to join the Multimodal AI Group at NTU College
15
31
306
Fei-Fei Li (@drfeifei) on limitations of LLMs. "There's no language out there in nature. You don't go out in nature and there's words written in the sky for you... There is a 3D world that follows laws of physics." Language is purely generated signal. https://t.co/FOomRpGTad
Columbia CS Prof explains why LLMs can't generate new scientific ideas. Because LLMs learn a structured "map", a Bayesian manifold, of known data, they work well within it but fail outside it. True discovery means creating new maps, which LLMs cannot do. https://t.co/PzI0YrTlpl
189
604
4K
Is language a "terrible abstraction" for video understanding? Many in the video community often dismiss language-driven approaches in favor of complex, video-native solutions. However, I believe this resistance stems more from internal bias, validating a research identity as a
2
4
20
Yi Lin, whom I know personally, is an excellent researcher in relevant areas such as Multimodal LLMs, PEFT, and RL.
Tough week! I also got impacted less than 3 months after joining. Ironically, I just landed some new RL infra features the day before. Life moves on. My past work spans RL, PEFT, Quantization, and Multimodal LLMs. If your team is working on these areas, I'd love to connect.
0
1
5
It should be a very useful feature!
Introducing Perplexity Search API We've built a search index of billions of webpages to provide real-time, quality information from the web. Now developers have access to the full power of our index, providing the most accurate results in milliseconds. https://t.co/TDOT8vnWxA
1
0
0
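The quoted announcement above describes a hosted web-search API that developers can query for real-time results. A generic sketch of calling such an API over HTTPS follows; the endpoint path, parameter names, and response fields are illustrative assumptions, not Perplexity's documented schema.

```python
# Generic sketch of querying a hosted search API over HTTPS.
# The endpoint path, parameter names, and response fields below are
# illustrative assumptions, not Perplexity's documented schema.
import os
import requests

def web_search(query: str) -> list[dict]:
    resp = requests.post(
        "https://api.perplexity.ai/search",               # hypothetical endpoint
        headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
        json={"query": query, "max_results": 10},         # hypothetical parameters
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])                  # hypothetical response field
```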
Our Video-RTS paper has been accepted at #EMNLP2025 Main!! We propose a novel video reasoning approach that combines data-efficient reinforcement learning (GRPO) with video-adaptive test-time scaling, improving reasoning performance while maintaining efficiency on multiple
Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
1
30
39
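The "data-efficient reinforcement learning (GRPO)" part above boils down to sampling a group of answers per question, scoring each only by whether the final answer is correct, and normalizing rewards within the group. A toy sketch of that group-relative advantage computation, not the authors' code:

```python
# Toy GRPO-style advantage computation with an output-only reward:
# sample a group of answers per question, reward 1.0 if the extracted
# final answer matches the ground truth, then normalize within the group.
import statistics

def group_relative_advantages(sampled_answers: list[str], ground_truth: str) -> list[float]:
    rewards = [1.0 if ans.strip() == ground_truth.strip() else 0.0 for ans in sampled_answers]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0    # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]

# Example: 4 sampled rollouts, one correct.
print(group_relative_advantages(["B", "C", "B", "A"], "A"))
```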
Can we bridge MLLMs and diffusion models more natively and efficiently, by having MLLMs produce patch-level CLIP latents already aligned with their visual encoders, while fully preserving MLLM's visual reasoning capabilities? Introducing Bifrost-1: > High-Fidelity
3
62
149
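The bridging idea described above is that the MLLM predicts patch-level latents in the same space as its own CLIP visual encoder, which a diffusion decoder can then condition on. A conceptual sketch under that reading; module names and shapes are illustrative assumptions, not the Bifrost-1 release.

```python
# Conceptual sketch: an MLLM head predicts patch-level latents in the CLIP
# visual-encoder space, and a diffusion decoder would condition on them.
# Module names and shapes are illustrative assumptions, not the Bifrost-1 code.
import torch
import torch.nn as nn

class PatchLatentHead(nn.Module):
    def __init__(self, llm_dim: int = 4096, clip_dim: int = 1024):
        super().__init__()
        self.proj = nn.Linear(llm_dim, clip_dim)   # map LLM hidden states into CLIP patch space

    def forward(self, llm_hidden: torch.Tensor) -> torch.Tensor:
        # llm_hidden: (batch, num_patches, llm_dim) hidden states at image-slot positions
        return self.proj(llm_hidden)               # (batch, num_patches, clip_dim) patch latents

head = PatchLatentHead()
patch_latents = head(torch.randn(1, 256, 4096))    # these latents would condition a diffusion decoder
print(patch_latents.shape)                          # torch.Size([1, 256, 1024])
```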
GPT-5 is here! For the first time, users don't have to choose between models, or even think about model names. Just one seamless, unified experience. It's also the first time frontier intelligence is available to everyone, including free users! GPT-5 sets new highs across
going to try live-tweeting the GPT-5 livestream. first, GPT-5 is an integrated model, meaning no more model switcher and it decides when it needs to think harder or not. it is very smart, intuitive, and fast. it is available to everyone, including the free tier, w/reasoning!
248
84
999
Check out our new paper: Video-RTS. A data-efficient RL method for complex video reasoning tasks. • Pure RL w/ output-based rewards. • Novel sparse-to-dense Test-Time Scaling (TTS) to expand input frames via self-consistency. • 96.4% less training data! More in the thread below.
Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
0
7
13
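The sparse-to-dense test-time scaling described above can be read as: start with a few frames, sample several answers, and only add more frames when the answers disagree. A minimal sketch of that idea, assuming a hypothetical `answer_with_frames` model call rather than the released code:

```python
# Sketch of sparse-to-dense test-time scaling with self-consistency:
# start sparse, sample several answers, and densify frames only when
# no self-consistent majority emerges.
# `answer_with_frames` is a hypothetical model call, not the released code.
from collections import Counter

def sparse_to_dense_answer(answer_with_frames, video, frame_budgets=(8, 16, 32), num_samples=5):
    for num_frames in frame_budgets:
        samples = [answer_with_frames(video, num_frames) for _ in range(num_samples)]
        answer, votes = Counter(samples).most_common(1)[0]
        if votes > num_samples // 2:       # self-consistent majority reached, stop early
            return answer
    return answer                           # fall back to the densest setting's top answer
```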
Great to see our paper ReVisionLLM featured on the MCML blog! @gberta227 #CVPR2025
Check out our latest work, ReVisionLLM, now featured on the MCML blog! A Vision-Language Model for accurate temporal grounding in hour-long videos. https://t.co/cTNNcRLsFE
#VisionLanguage #MultimodalAI #MCML #CVPR2025
0
1
2
Had a great time presenting BIMBA at #CVPR2025 today! Engaging discussions, thoughtful questions, and lots of interest in our work on long-range VideoQA. Paper: https://t.co/4XCHPFWchy | Project: https://t.co/alktUQtIzE | Demo: https://t.co/e823S80qIu
New #CVPR2025 Paper! Introducing BIMBA, an efficient multimodal LLM for long-range video QA. It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. Thread below: https://t.co/yP9ZLkUX2N
0
1
17
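The BIMBA tweets above describe compressing a long video's spatiotemporal tokens down to the most important ones before they reach the LLM. A much-simplified stand-in for that step follows; the real model uses Mamba's selective scan, whereas this sketch uses a plain learned scorer purely to illustrate the token-reduction idea.

```python
# Simplified illustration of key-token selection for long-range video QA:
# score the spatiotemporal tokens and keep only the top-k, preserving order.
# This is a stand-in for illustration, not BIMBA's selective-scan mechanism.
import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    def __init__(self, dim: int = 1024, keep: int = 256):
        super().__init__()
        self.keep = keep
        self.scorer = nn.Linear(dim, 1)     # importance score per token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim), e.g. thousands of patch tokens from a long video
        scores = self.scorer(tokens).squeeze(-1)                    # (batch, num_tokens)
        idx = scores.topk(self.keep, dim=1).indices                 # most important tokens
        idx = idx.sort(dim=1).values                                # keep temporal order
        return torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

selector = TokenSelector()
compressed = selector(torch.randn(1, 4096, 1024))                   # 4096 tokens -> 256 tokens
print(compressed.shape)                                              # torch.Size([1, 256, 1024])
```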
Come to our poster today at #CVPR2025! June 15 | 4-6PM | Poster #282 | ExHall D. Paper: https://t.co/4XCHPFWchy | Project: https://t.co/alktUQtIzE | Code: https://t.co/mRWxTRCh6z | YouTube:
New #CVPR2025 Paper! Introducing BIMBA, an efficient multimodal LLM for long-range video QA. It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. Thread below: https://t.co/yP9ZLkUX2N
0
2
10
Great to see a lot of interest in ReVisionLLM among the video understanding community! If you missed it, check out https://t.co/KAF47QI7yp
@hannan_tanveer
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by: Poster #307, Session 4 | June 14, 5-7PM | ExHall D. https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
0
2
10
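The "recursive" part of ReVisionLLM's temporal grounding, as described in the tweet above, amounts to scanning the hour-long video coarsely, picking the most promising segment, and recursing into it with denser sampling until the window is short enough to localize. A sketch of that control flow, assuming a hypothetical `score_segment` VLM call that returns relevance of a segment to the query:

```python
# Sketch of recursive temporal grounding: coarse-to-fine zoom into an
# hour-long video. `score_segment` is a hypothetical VLM relevance call.

def recursive_grounding(score_segment, query, start, end, min_len=30.0, branch=4):
    """Return a (start, end) window in seconds that best matches the query."""
    if end - start <= min_len:                      # fine enough: stop recursing
        return (start, end)
    step = (end - start) / branch
    segments = [(start + i * step, start + (i + 1) * step) for i in range(branch)]
    best = max(segments, key=lambda seg: score_segment(query, seg))   # coarse relevance pass
    return recursive_grounding(score_segment, query, best[0], best[1], min_len, branch)
```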
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by: Poster #307, Session 4 | June 14, 5-7PM | ExHall D. https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
0
3
7
Another great accomplishment by Emon at #CVPR2025. Interestingly, rather than using some complex ensemble model, Emon won the EgoSchema challenge by simply applying his latest BIMBA model, which he will also present at the poster session on Sunday 4-6pm. Be sure to stop by!
Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task. Stop by at #CVPR: Poster #282 | June 15, 4-6PM | ExHall D. https://t.co/alktUQtIzE
1
4
26
Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task. Stop by at #CVPR: Poster #282 | June 15, 4-6PM | ExHall D. https://t.co/alktUQtIzE
0
2
10