rid Profile
rid

@ridouaneg_

Followers: 34 · Following: 181 · Media: 7 · Statuses: 13

Joined March 2024
@RagavSachdeva
Ragav Sachdeva
4 months
🚨 #ICCV2025 Workshop Alert! 🚨 🔥 [COMIQ] Comic Intelligence Quotient: Advances and Challenges in AI-driven Comic Analysis. We're exploring how machines interpret abstract visual storytelling media; dare I say, a true test for AGI 🫣 🔗 Pls consider submitting abstracts. Link 👇
@ridouaneg_
rid
4 months
Nice benchmark using movies to check if LLMs understand characters' mental states (like beliefs and intents)! This ability is crucial for developing AIs that can live among humans and learn from them.
@evllcv
Emilio Villa Cueva
5 months
Excited to finally share MOMENTS!! A new human-annotated benchmark to evaluate Theory of Mind in multimodal LLMs using long-form videos with real human actors. 📽️ 2.3K+ MCQA items from 168 short films 🧠 Tests 7 different ToM abilities 🔗
@ridouaneg_
rid
5 months
Participate in our VideoQA competition! 🏆 Winners get to present their work at the SLoMO workshop #ICCV2025 https://t.co/MJs7ZsxocN
@JunyuXieArthur
Junyu Xie
5 months
Movies are more than just video clips, they are stories! 🎬 We’re hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions! Website: https://t.co/k1hDRCFjjd Competition: https://t.co/JseLilr6oc
@nico_dufour
Nicolas DUFOUR
1 year
🌍 Guessing where an image was taken is a hard and often ambiguous problem. Introducing diffusion-based geolocation: we predict global locations by refining random guesses into trajectories across the Earth's surface! 🗺️ Paper, code, and demo: https://t.co/pNRFZk9NYP
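The core idea is easy to sketch. Below is a loose, hypothetical illustration in Python, not the paper's actual sampler: start from a random guess on the globe and let a trained denoiser pull it toward the true location, recording the trajectory. The `denoise` callable stands in for the learned network.

```python
import random

def refine_location(denoise, steps=50):
    # Start from a uniformly random point on the globe (degrees).
    lat, lon = random.uniform(-90, 90), random.uniform(-180, 180)
    trajectory = [(lat, lon)]
    for t in range(steps, 0, -1):
        # The (hypothetical) network's estimate of the clean location.
        lat_hat, lon_hat = denoise(lat, lon, t)
        step = 1.0 / t  # anneal: small corrections early, full jump at t=1
        lat += step * (lat_hat - lat)
        lon += step * (lon_hat - lon)
        trajectory.append((lat, lon))
    return trajectory

# Stub denoiser that always points at Paris, for illustration only.
print(refine_location(lambda la, lo, t: (48.85, 2.35))[-1])
```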
@ridouaneg_
rid
1 year
(1/8) 🎬 Introducing the Short Film Dataset (SFD), a long video QA benchmark with 1k short films and 5k questions. Why another videoQA dataset? 📖 Story-level QAs 🎥 Publicly available videos 🔒 Minimal data leakage ⏳ Long temporal context questions https://t.co/FJQzIRgDxV
@ridouaneg_
rid
1 year
(8/8) 🎬 Work done in collaboration with @xiwang92, @VickyKalogeiton, and Ivan Laptev. We thank all the people who helped us create this benchmark! 📽️: https://t.co/FJQzIRg5In 📜: https://t.co/4IVacjuxLo 🤗: https://t.co/UXUg9X4V2h
@ridouaneg_
rid
1 year
(7/8) Finally, we show that widening the input context window (going from shot-level to movie-level information) improves task performance. This intuitive result validates our approach and provides what we think is a valuable benchmark for the community.
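A minimal sketch of what such a context ablation might look like, assuming per-shot text descriptions are available; the data, the string-matching scorer, and the function names are illustrative stand-ins, not the paper's code:

```python
# Answer the same questions with one shot, a local window, or the whole movie.
def build_context(shots, level):
    mid = len(shots) // 2
    if level == "shot":
        return shots[mid]                                # one shot only
    if level == "window":
        return " ".join(shots[max(0, mid - 2):mid + 3])  # nearby shots
    return " ".join(shots)                               # movie-level

def evaluate_qa(context, questions):
    # Toy scorer: a real setup would query a VLM per question.
    return sum(1 for q, a in questions if a in context) / len(questions)

shots = ["A man enters a diner.", "He meets his estranged sister.",
         "They argue about their father's will.", "She leaves in tears."]
questions = [("Who does the man meet?", "sister")]
for level in ["shot", "window", "movie"]:
    print(level, evaluate_qa(build_context(shots, level), questions))
```

With this toy data, only the wider contexts contain the answer, mirroring the shot-to-movie trend the thread reports.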
@ridouaneg_
rid
1 year
(6/8) We also argue that modern VLMs are mature enough for open-ended videoQA, so we introduce this task in our benchmark, with LLM scoring as the metric. All models struggle with this task.
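For concreteness, here is a minimal, hypothetical sketch of LLM scoring for open-ended answers; the prompt, the 0-5 scale, and the `ask_llm` callable are assumptions, not the benchmark's actual judge:

```python
import re

JUDGE_PROMPT = """You are grading a video question-answering system.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
On a scale of 0 to 5, how well does the candidate match the reference?
Reply with a single integer."""

def llm_score(ask_llm, question, reference, candidate):
    """ask_llm: any callable mapping a prompt string to a reply string."""
    reply = ask_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    match = re.search(r"\d", reply)  # pull out the integer grade
    return int(match.group()) / 5 if match else 0.0  # normalize to [0, 1]

# Stubbed judge for illustration; a real setup would call an LLM API here.
print(llm_score(lambda p: "4", "Who betrays the hero?",
                "His brother", "The hero's own brother"))  # -> 0.8
```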
@ridouaneg_
rid
1 year
(5/8) Most models struggle (< 40% accuracy) compared to human performance (~90%). Only LLoVi, based on GPT-3.5, does well (55.6%), but it relies mostly on subtitles and underperforms with vision alone. This highlights the need for truly multimodal methods that better integrate vision.
@ridouaneg_
rid
1 year
(4/8) We create 5k questions by leveraging LLMs, followed by careful manual curation. We focus on three types of questions: setting-, character-, and story-related.
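A hypothetical sketch of the generation step, with the prompt and the `generate` callable as illustrative assumptions; as the thread notes, the output is only a draft that then goes through manual curation:

```python
QUESTION_TYPES = ["setting", "character", "story"]

PROMPT = """Here is a short film synopsis:
{synopsis}

Write one multiple-choice question about the film's {qtype}, with four
options (A-D), and mark the correct one."""

def draft_questions(generate, synopsis):
    """generate: any callable mapping a prompt string to LLM output."""
    return {qtype: generate(PROMPT.format(synopsis=synopsis, qtype=qtype))
            for qtype in QUESTION_TYPES}

# Stubbed LLM for illustration only; drafts would be curated by hand.
drafts = draft_questions(lambda p: "Q: ... A) ... B) ... C) ... D) ...",
                         "A retired clown returns to his hometown.")
print(drafts["story"])
```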
@ridouaneg_
rid
1 year
(3/8) We solve this issue by proposing a dataset of short films, i.e., amateur movies (5 to 20 minutes) that filmmakers create to explore new ideas or promote their work. Many high-quality short films are publicly available on YouTube, yet they remain unknown to LLMs.
@ridouaneg_
rid
1 year
(2/8) Movies are great for benchmarking VLMs, but they suffer from data leakage: modern LLMs have memorized popular movies and can answer questions given only the movie title. For example, GPT-4V achieves 71.3% accuracy on MovieQA without even watching the movies.
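A leakage probe along these lines is straightforward to write; this is a hypothetical helper, not the evaluation code behind the 71.3% figure: give the model only the title, no frames or subtitles, and count how many questions it still gets right.

```python
def leakage_accuracy(ask_llm, title, questions):
    """questions: list of (question_text, options, correct_letter) tuples."""
    correct = 0
    for question, options, answer in questions:
        # Crucially, the prompt contains the title but no video content.
        prompt = (f"The movie is '{title}'. {question}\n"
                  + "\n".join(options) + "\nAnswer with a single letter.")
        if ask_llm(prompt).strip().upper().startswith(answer.upper()):
            correct += 1
    return correct / len(questions)
```

High accuracy here signals memorization rather than video understanding, which is exactly the failure mode SFD's lesser-known short films are meant to avoid.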