rid (@ridouaneg_)
Followers: 34 · Following: 181 · Media: 7 · Statuses: 13
Joined March 2024
#ICCV2025 Workshop Alert! [COMIQ] Comic Intelligence Quotient: Advances and Challenges in AI-driven Comic Analysis. We're exploring how machines interpret abstract visual storytelling media; dare I say, a true test for AGI. Please consider submitting abstracts. Link below.
Nice benchmark using movies to check if LLMs understand characters' mental states (like beliefs and intents)! This ability is crucial for developing AIs that can live among humans and learn from them.
Excited to finally share MOMENTS!! A new human-annotated benchmark to evaluate Theory of Mind in multimodal LLMs using long-form videos with real human actors. 2.3K+ MCQA items from 168 short films, testing 7 different ToM abilities.
Participate in our VideoQA competition! Winners get to present their work at the SLoMO workshop #ICCV2025
https://t.co/MJs7ZsxocN
Movies are more than just video clips; they are stories! We're hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions! Website: https://t.co/k1hDRCFjjd Competition: https://t.co/JseLilr6oc
Guessing where an image was taken is a hard and often ambiguous problem. Introducing diffusion-based geolocation: we predict global locations by refining random guesses into trajectories across the Earth's surface! Paper, code, and demo: https://t.co/pNRFZk9NYP
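Not the authors' implementation, but a minimal conceptual sketch of the idea described above: start from a random point on the globe and let a learned denoiser iteratively refine the guess, tracing a trajectory across the Earth's surface. The `denoiser` callable and its signature are assumptions for illustration.

```python
import numpy as np

def refine_guess(denoiser, image_features, steps=50, rng=None):
    """Refine a random coordinate guess into a location prediction.

    `denoiser` is a hypothetical callable that, given image features, the
    current (lat, lon) guess, and the noise level, returns an updated
    (lat, lon) as a NumPy array. It stands in for whatever network the
    paper actually trains.
    """
    rng = rng or np.random.default_rng()
    # Start from a random point on the globe (the "pure noise" guess).
    coord = np.array([rng.uniform(-90.0, 90.0), rng.uniform(-180.0, 180.0)])
    trajectory = [coord.copy()]
    for t in reversed(range(steps)):
        noise_level = (t + 1) / steps
        # The model nudges the current guess toward its location estimate.
        coord = denoiser(image_features, coord, noise_level)
        trajectory.append(coord.copy())
    return coord, trajectory  # final prediction plus the path across the globe
```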
(1/8) Introducing the Short Film Dataset (SFD), a long video QA benchmark with 1k short films and 5k questions. Why another videoQA dataset? Story-level QAs, publicly available videos, minimal data leakage, and long temporal context questions. https://t.co/FJQzIRgDxV
(8/8) Work done in collaboration with @xiwang92, @VickyKalogeiton, and Ivan Laptev. We thank everyone who helped us create this benchmark! Links: https://t.co/FJQzIRg5In · https://t.co/4IVacjuxLo · https://t.co/UXUg9X4V2h
(7/8) Finally, we show that increasing the input context window (going from shot-level to movie-level input) improves task performance. This intuitive result supports our setup and, we believe, makes the benchmark valuable to the community.
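A rough sketch of how such a shot-level vs. movie-level comparison might be run; the `model` callable and the per-example fields (`shots`, `question`, `answer`) are placeholders, not the benchmark's actual interface.

```python
def build_context(shots, level):
    """Assemble the text input at different granularities.

    `shots` is assumed to be a list of per-shot descriptions or subtitles
    in temporal order (a simplification of whatever the benchmark provides).
    """
    if level == "shot":
        return shots[0]          # a single shot's worth of information
    if level == "movie":
        return "\n".join(shots)  # the whole film, every shot in order
    raise ValueError(f"unknown level: {level!r}")


def accuracy(model, dataset, level):
    """Fraction of questions answered correctly at a given context level."""
    correct = 0
    for example in dataset:  # each example: {"shots": [...], "question": ..., "answer": ...}
        context = build_context(example["shots"], level)
        prediction = model(context, example["question"])  # placeholder QA model call
        correct += int(prediction == example["answer"])
    return correct / len(dataset)
```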
(6/8) We also argue that modern VLMs are mature enough for open-ended videoQA, so we include this task in our benchmark with LLM-based scoring as the metric. All models struggle with it.
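For context, LLM-scoring of open-ended answers is typically done by prompting a judge model to compare a prediction against the reference answer. Below is a minimal sketch using the OpenAI chat API; the prompt wording, 0-5 scale, and model name are assumptions, not the benchmark's exact protocol.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_score(question, reference, prediction, model="gpt-4o-mini"):
    """Ask an LLM judge how well a free-form answer matches the reference."""
    prompt = (
        "You are grading an answer to a question about a short film.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Rate how well the prediction matches the reference on a scale from 0 to 5. "
        "Reply with a single integer."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())
```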
(5/8) Most models struggle (< 40% accuracy) compared to human performance (~90%). Only LLoVi, based on GPT-3.5, performs well (55.6%), but it relies mostly on subtitles and underperforms with vision alone. This highlights the need for truly multimodal methods that better integrate vision.
(4/8) We create 5k questions by leveraging LLMs, followed by careful manual curation. We focus on three types of questions: setting-, character-, and story-related.
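A minimal sketch of what LLM-assisted question drafting could look like, assuming the OpenAI chat API; the prompt, model name, and use of a synopsis as input are illustrative, and in the real pipeline every generated item still goes through manual curation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION_TYPES = ("setting", "character", "story")

def draft_questions(synopsis, question_type, n=3, model="gpt-4o-mini"):
    """Draft candidate questions of one type from a film synopsis.

    Returns raw LLM text; every item would still be manually curated.
    """
    if question_type not in QUESTION_TYPES:
        raise ValueError(f"unknown question type: {question_type!r}")
    prompt = (
        f"Here is the synopsis of a short film:\n{synopsis}\n\n"
        f"Write {n} {question_type}-related multiple-choice questions about it, "
        "each with answer options and the correct option marked."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```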
(3/8) We address this by proposing a dataset of short films, i.e. amateur movies (5 to 20 minutes) created by filmmakers to explore new ideas or to promote their work. Many high-quality short films are publicly available on YouTube, and they are unknown to LLMs.
(2/8) Movies are great for benchmarking VLMs, but they suffer from data leakage: modern LLMs have memorized popular movies and can answer questions given only the movie title. For example, GPT-4V achieves 71.3% accuracy on MovieQA without even watching the movies.
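To illustrate how this kind of leakage can be measured, here is a sketch of a text-only probe that asks a model to answer a multiple-choice question from the title alone; the OpenAI model name and prompt format are assumptions. Accuracy well above chance on such probes indicates memorization.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def leakage_probe(title, question, options, model="gpt-4o-mini"):
    """Ask a model to answer a multiple-choice question from the title alone.

    No frames or subtitles are provided, so accuracy well above chance
    suggests the movie was memorized during training.
    """
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    prompt = (
        f"Movie: {title}\n"
        f"Question: {question}\n"
        f"Options:\n{lettered}\n"
        "Answer with the letter of the best option only."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```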