Mohaiminul (Emon) Islam (on job market) Profile
Mohaiminul (Emon) Islam (on job market)

@mmiemon

Followers
250
Following
546
Media
59
Statuses
389

On the Industry Job Market | PhD Student @unccs | 2x Research Intern @MetaAI. Computer Vision, Video Understanding, Multimodal, AI Agents.

Chapel Hill, NC
Joined April 2016
@mmiemon
Mohaiminul (Emon) Islam (on job market)
4 months
🚀 On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta 🔍 Seeking Research Scientist/Engineer roles! 🔗 https://t.co/z9ioZPFCi9 📧 mmiemon [at] cs [dot] unc [dot] edu
md-mohaiminul.github.io
0
4
18
@Kimi_Moonshot
Kimi.ai
2 hours
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200–300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
170
306
2K
@jaeh0ng_yoon
Jaehong Yoon
3 days
🎉 Excited to share that 5/5 of my papers (3 main, 2 findings) have been accepted at #EMNLP2025, in video/multimodal reasoning, instructional video editing, and efficient LLM adaptation & reasoning! 🚨 I’m recruiting Ph.D. students to join the Multimodal AI Group at NTU College
15
31
306
@rohanpaul_ai
Rohan Paul
5 days
Fei-Fei Li (@drfeifei) on limitations of LLMs. "There's no language out there in nature. You don't go out in nature and there's words written in the sky for you.. There is a 3D world that follows laws of physics." Language is purely generated signal. https://t.co/FOomRpGTad
@rohanpaul_ai
Rohan Paul
5 days
Columbia CS Prof explains why LLMs can’t generate new scientific ideas. Bcz LLMs learn a structured “map”, Bayesian manifold, of known data and work well within it, but fail outside it. But true discovery means creating new maps, which LLMs cannot do. https://t.co/PzI0YrTlpl
189
604
4K
@gberta227
Gedas Bertasius
6 days
Is language a "terrible abstraction" for video understanding? Many in the video community often dismiss language-driven approaches in favor of complex, video-native solutions. However, I believe this resistance stems more from internal bias, validating a research identity as a
2
4
20
@mmiemon
Mohaiminul (Emon) Islam (on job market)
14 days
Yi Lin is an excellent researcher in relevant areas such as Multimodal LLMs, PEFT, and RL, whom I know personally.
@yilin_sung
Yi Lin Sung
14 days
Tough week! I also got impacted less than 3 months after joining. Ironically, I just landed some new RL infra features the day before. Life moves on. My past work spans RL, PEFT, Quantization, and Multimodal LLMs. If your team is working on these areas, I’d love to connect.
0
1
5
@mmiemon
Mohaiminul (Emon) Islam (on job market)
1 month
It should be a very useful feature!
@perplexity_ai
Perplexity
1 month
Introducing Perplexity Search API We've built a search index of billions of webpages to provide real-time, quality information from the web. Now developers have access to the full power of our index, providing the most accurate results in milliseconds. https://t.co/TDOT8vnWxA
1
0
0
@ZiyangW00
Ziyang Wang
3 months
🎉 Our Video-RTS paper has been accepted at #EMNLP2025 Main!! We propose a novel video reasoning approach that combines data-efficient reinforcement learning (GRPO) with video-adaptive test-time scaling, improving reasoning performance while maintaining efficiency on multiple
@ZiyangW00
Ziyang Wang
4 months
🚨 Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
1
30
39
@hanlin_hl
Han Lin
3 months
🤔 Can we bridge MLLMs and diffusion models more natively and efficiently, by having MLLMs produce patch-level CLIP latents already aligned with their visual encoders, while fully preserving MLLM's visual reasoning capabilities? Introducing Bifrost-1: 🌈 > High-Fidelity
3
62
149
@_akhaliq
AK
3 months
GLM-4.5 Agentic, Reasoning, and Coding (ARC) Foundation Models
7
27
156
@ElaineYaLe6
Elaine Ya Le
3 months
GPT-5 is here! 🚀 For the first time, users don’t have to choose between models, or even think about model names. Just one seamless, unified experience. It’s also the first time frontier intelligence is available to everyone, including free users! GPT-5 sets new highs across
@sama
Sam Altman
3 months
going to try live-tweeting the GPT-5 livestream. first, GPT-5 is an integrated model, meaning no more model switcher and it decides when it needs to think harder or not. it is very smart, intuitive, and fast. it is available to everyone, including the free tier, w/reasoning!
248
84
999
@mmiemon
Mohaiminul (Emon) Islam (on job market)
4 months
Check out our new paper: Video-RTS 🎥 A data-efficient RL method for complex video reasoning tasks. 🔹 Pure RL w/ output-based rewards. 🔹 Novel sparse-to-dense Test-Time Scaling (TTS) to expand input frames via self-consistency. 💥 96.4% less training data! More in the thread 👇
@ZiyangW00
Ziyang Wang
4 months
🚨 Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
0
7
13
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Great to see our paper ReVisionLLM featured on the MCML blog! @gberta227 #CVPR2025
@hannan_tanveer
Tanveer Hannan (on job market)
5 months
🚀 Check out our latest work, ReVisionLLM, now featured on the MCML blog! 🔍 A Vision-Language Model for accurate temporal grounding in hour-long videos. 👉 https://t.co/cTNNcRLsFE #VisionLanguage #MultimodalAI #MCML #CVPR2025
0
1
2
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Had a great time presenting BIMBA at #CVPR2025 today! Engaging discussions, thoughtful questions, and lots of interest in our work on long-range VideoQA 🔍🎥 📝 Paper: https://t.co/4XCHPFWchy 🌐 Project: https://t.co/alktUQtIzE 🎥 Demo: https://t.co/e823S80qIu
@mmiemon
Mohaiminul (Emon) Islam (on job market)
8 months
🚀 New #CVPR2025 Paper 🚀 Introducing BIMBA, an efficient multimodal LLM for long-range video QA 💡 It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. 🧵 Thread below 👇 https://t.co/yP9ZLkUX2N
0
1
17
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Come to our poster today at #CVPR2025! 🗓️ June 15 | 🕓 4–6PM 📍 Poster #282 | ExHall D 📝 Paper: https://t.co/4XCHPFWchy 🌐 Project: https://t.co/alktUQtIzE 💻 Code: https://t.co/mRWxTRCh6z 🎥 Youtube:
@mmiemon
Mohaiminul (Emon) Islam (on job market)
8 months
🚀 New #CVPR2025 Paper 🚀 Introducing BIMBA, an efficient multimodal LLM for long-range video QA 💡 It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. 🧵 Thread below 👇 https://t.co/yP9ZLkUX2N
0
2
10
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Great to see so much interest in ReVisionLLM from the video understanding community! If you missed it, check out https://t.co/KAF47QI7yp @hannan_tanveer
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by 📍 Poster #307, Session 4 🗓️ June 14, 5–7PM | ExHall D 🔗 https://t.co/qrBvf2UUAo @hannan_tanveer @gberta227
0
2
10
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
#CVPR social was a blast this year!
0
0
1
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos. If you are at CVPR, please stop by 📍 Poster #307, Session 4 🗓️ June 14, 5–7PM | ExHall D 🔗 https://t.co/qrBvf2UUAo @hannan_tanveer @gberta227
0
3
7
@gberta227
Gedas Bertasius
5 months
Another great accomplishment by Emon this #CVPR2025. Interestingly, rather than using some complex ensemble model, Emon won the EgoSchema challenge by simply applying his latest BIMBA model, which he will also present at the poster session on Sunday 4-6pm. Be sure to stop by!
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
🚀 Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task 🎯 Stop by #CVPR: 📍 Poster #282 | June 15, 4–6PM | ExHall D 🔗 https://t.co/alktUQtIzE
1
4
26
@mmiemon
Mohaiminul (Emon) Islam (on job market)
5 months
🚀 Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task 🎯 Stop by #CVPR: 📍 Poster #282 | June 15, 4–6PM | ExHall D 🔗 https://t.co/alktUQtIzE
0
2
10