Gedas Bertasius
@gberta227
Followers
1K
Following
3K
Media
45
Statuses
502
Assistant Professor at UNC, previously a postdoc at Meta AI, PhD from UPenn, video understanding, multimodal AI, a basketball enthusiast.
Chapel Hill, NC
Joined June 2020
Is language a "terrible abstraction" for video understanding? Many in the video community dismiss language-driven approaches in favor of complex, video-native solutions. However, I believe this resistance stems more from internal bias—validating a research identity as a
2
4
20
🚨 Check out our awesome students/postdocs' papers at #EMNLP2025 and say hi to them 👋! Also, I will give a keynote (virtually) on "Attributable, Conflict-Robust, and Multimodal Summarization with Multi-Source Retrieval" at the NewSumm workshop. -- Jaehong (in-person) finished
2
30
63
@cyjustinchen @ArchikiPrasad @swarnaNLP @EliasEskin -- Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning @ZiyangW00 @jaeh0ng_yoon @shoubin621 @mmiemon @gberta227
https://t.co/THxKAhgCPX
https://t.co/c6s8hnrKFH
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
1
3
8
🚨 BREAKING: AI Can't Actually See Videos. New benchmark shows mainstream LVLMs barely hit 60% accuracy—while humans reach 94.82%. This isn’t a glitch—it’s a fundamental failure in video understanding. LVLMs are doing visual theater, not real comprehension.
2
9
19
🎉 Excited to share that 5/5 of my papers (3 main, 2 findings) have been accepted at #EMNLP2025, in video/multimodal reasoning, instructional video editing, and efficient LLM adaptation & reasoning! 🚨 I’m recruiting Ph.D. students to join the Multimodal AI Group at NTU College
15
32
311
🥳 Honored and grateful to be awarded an NDSEG Fellowship in Computer Science! 💫🇺🇸 Big thanks to my advisor @mohitban47 for his guidance, and shoutout to my lab mates at @unc_ai_group, collaborators, internship advisors, and mentors for their support 🤗 Excited to continue
🎉 Congratulations to our student Zaid Khan (advised by @mohitban47) for being awarded a prestigious NDSEG Fellowship for his work on environment generation! Established in 1989, the fellowship has an acceptance rate of <7% and covers diverse science and engineering disciplines.
15
20
48
I'll be joining the faculty @JohnsHopkins late next year as a tenure-track assistant professor in @JHUCompSci. Looking for PhD students to join me in tackling fun problems in robot manipulation, learning from human data, understanding+predicting physical interactions, and beyond!
87
112
861
Can AI models teach you to shoot like Steph Curry? 🏀 Come to my talk on Challenges in Expert-Level Skill Analysis at 4:30 pm in Room 318-A tomorrow (Sunday) to find out! https://t.co/gYPFtEB1ZU
#ICCV2025
sauafg-workshop.github.io
ICCV 2025 SAUAFG Workshop on AI-driven skill assessment, understanding, and feedback generation.
🗓Oct 19, 2025 | 📍Hawaii Convention Center, Room 318-A 👉 Learn more: https://t.co/J9BCFRmuo7 🔍 We'll explore AI-driven Skilled Activity Understanding, Assessment & Guidance generation in various domains from Surgery to Sports, from Robotics and Manufacturing to Education
0
3
15
How can an agent reverse engineer the underlying laws of an unknown, hostile & stochastic environment in “one life”, without millions of steps + human-provided goals / rewards? In our work, we: 1️⃣ infer an executable symbolic world model (a probabilistic program capturing
2
42
89
📣 Announcing 1st International Workshop on Skilled Activity Understanding, Assessment & Feedback Generation @ICCVConference! 🎙️ All-star keynotes: @anfurnari, @walteriomayolc, @rgnespolo, @gberta227, Kristen Grauman, @eadeli 🧠+ Poster Presentations 🗓 Oct 19 · 1:45–6 PM HST
1
1
3
Excited to share our new work: “Learning to See Before Seeing”! 🧠➡️👀 We investigate an interesting phenomenon: how do LLMs, trained only on text, learn about the visual world? Project page: https://t.co/9mQt3qnckL
7
24
149
🎉Our Video-RTS paper has been accepted at #EMNLP2025 Main!! We propose a novel video reasoning approach that combines data-efficient reinforcement learning (GRPO) with video-adaptive test-time scaling, improving reasoning performance while maintaining efficiency on multiple
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
1
30
40
Check out our new paper: Video-RTS 🎥 A data-efficient RL method for complex video reasoning tasks. 🔹 Pure RL w/ output-based rewards. 🔹 Novel sparse-to-dense Test-Time Scaling (TTS) to expand input frames via self-consistency. 💥 96.4% less training data! More in the thread👇
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
0
7
13
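The sparse-to-dense test-time scaling mentioned above lends itself to a short sketch: start from a sparse set of frames, sample several answers, and only densify the frame input when those answers disagree. The following is a hypothetical illustration of that idea under stated assumptions, not the released Video-RTS code; `video.sample_uniform` and `video_llm.generate` are placeholder interfaces.

```python
from collections import Counter

def answer_with_sparse_to_dense_tts(video_llm, video, question,
                                     init_frames=8, max_frames=64,
                                     num_samples=5, agree_ratio=0.8):
    """Sketch of sparse-to-dense test-time scaling via self-consistency."""
    num_frames = init_frames
    while True:
        frames = video.sample_uniform(num_frames)            # sparse, uniformly spaced frames
        answers = [video_llm.generate(frames, question)      # sample multiple reasoning paths
                   for _ in range(num_samples)]
        top_answer, count = Counter(answers).most_common(1)[0]
        if count / num_samples >= agree_ratio or num_frames >= max_frames:
            return top_answer                                 # answers are self-consistent: stop early
        num_frames *= 2                                       # disagreement: densify the frames and retry
```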
🚨Introducing Video-RTS: Resource-Efficient RL for Video Reasoning with Adaptive Video TTS! While RL-based video reasoning with LLMs has advanced, the reliance on large-scale SFT with extensive video data and long CoT annotations remains a major bottleneck. Video-RTS tackles
1
37
42
🚀 On the job market! Final-year PhD @ UNC Chapel Hill working on computer vision, video understanding, multimodal LLMs & AI agents. 2x Research Scientist Intern @Meta 🔍 Seeking Research Scientist/Engineer roles! 🔗 https://t.co/z9ioZPFCi9 📧 mmiemon [at] cs [dot] unc [dot] edu
md-mohaiminul.github.io
0
4
18
Great to see our paper ReVisionLLM featured on the MCML blog! @gberta227 #CVPR2025
🚀 Check out our latest work, ReVisionLLM, now featured on the MCML blog! 🔍 A Vision-Language Model for accurate temporal grounding in hour-long videos. 👉 https://t.co/cTNNcRLsFE
#VisionLanguage #MultimodalAI #MCML #CVPR2025
0
1
2
Come to our poster today at #CVPR2025! 🗓️ June 15 | 🕓 4–6PM 📍 Poster #282 | ExHall D 📝 Paper: https://t.co/4XCHPFWchy 🌐 Project: https://t.co/alktUQtIzE 💻 Code: https://t.co/mRWxTRCh6z 🎥 Youtube:
🚀New #CVPR2025 Paper🚀 Introducing BIMBA, an efficient multimodal LLM for long-range video QA💡 It sets SOTA on 7 VQA benchmarks by intelligently selecting key spatiotemporal tokens utilizing the selective scan mechanism of Mamba models. 🧵Thread below👇 https://t.co/yP9ZLkUX2N
0
2
10
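BIMBA's key idea, per the quoted tweet, is to compress a long video's many spatiotemporal tokens into a small set the LLM can attend to. Here is a toy sketch of that general token sub-selection step; it uses a simple learned gate with top-k selection purely for illustration, whereas BIMBA itself relies on Mamba's selective-scan mechanism, and all names below are placeholders.

```python
import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    """Toy sketch: score spatiotemporal tokens and keep only the top-k for the LLM.

    Illustrates token sub-selection for long videos in general; the actual BIMBA
    model uses Mamba's selective-scan mechanism, not this learned gate.
    """
    def __init__(self, dim, keep=256):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # learned importance score per token
        self.keep = keep

    def forward(self, tokens):
        # tokens: (batch, num_tokens, dim) spatiotemporal features from a video encoder
        scores = self.score(tokens).squeeze(-1)                 # (batch, num_tokens)
        idx = scores.topk(self.keep, dim=1).indices             # most important tokens
        idx, _ = idx.sort(dim=1)                                # preserve temporal order
        return torch.gather(tokens, 1,
                            idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
```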
Great to see a lot of interest in ReVisionLLM among the video understanding community! If you missed it, check out https://t.co/KAF47QI7yp
@hannan_tanveer
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos If you are at CVPR, please stop by 📍 Poster #307, Session 4 🗓️ June 14, 5–7PM | ExHall D 🔗 https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
0
2
10
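ReVisionLLM grounds queries in hour-long videos recursively. As a schematic, hypothetical sketch of what a recursive coarse-to-fine temporal search could look like (not the paper's actual architecture; `vlm.score_relevance` and `vlm.localize` are placeholder calls):

```python
def ground_recursively(vlm, video, query, start, end, min_window=60.0, branches=4):
    """Recursively zoom into the segment a vision-language model scores as most
    relevant to the query, until the window is short enough to localize directly.

    Schematic only: `vlm.score_relevance` and `vlm.localize` are hypothetical
    placeholders, not the ReVisionLLM API.
    """
    if end - start <= min_window:
        return vlm.localize(video.clip(start, end), query)      # fine-grained grounding
    step = (end - start) / branches
    segments = [(start + i * step, start + (i + 1) * step) for i in range(branches)]
    scores = [vlm.score_relevance(video.clip(s, e), query) for s, e in segments]
    best = max(range(branches), key=lambda i: scores[i])        # coarse pass: pick best segment
    return ground_recursively(vlm, video, query, *segments[best],
                              min_window=min_window, branches=branches)
```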
Presenting ReVisionLLM at #CVPR2025 today! Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos If you are at CVPR, please stop by 📍 Poster #307, Session 4 🗓️ June 14, 5–7PM | ExHall D 🔗 https://t.co/qrBvf2UUAo
@hannan_tanveer @gberta227
0
3
7
Another great accomplishment by Emon at #CVPR2025. Interestingly, rather than using some complex ensemble model, Emon won the EgoSchema challenge by simply applying his latest BIMBA model, which he will also present at the poster session on Sunday, 4-6pm. Be sure to stop by!
🚀 Excited to share that we won 1st place at the EgoSchema Challenge at EgoVis, #CVPR2025! Our method (81%) outperformed human accuracy (76.2%) for the first time on this challenging task 🎯 Stop by #CVPR: 📍 Poster #282 | June 15, 4–6PM | ExHall D 🔗 https://t.co/alktUQtIzE
1
4
26