Manos Zaranis Profile

Manos Zaranis (@ManosZaranis)
Followers: 103 · Following: 194 · Media: 12 · Statuses: 49

PhD student @istecnico | 2020 Alumni ECE NTUA
Joined March 2018
@ManosZaranis
Manos Zaranis
28 days
🚨 Meet MF²: Movie Facts & Fibs, a new benchmark for long-movie understanding! 🤔 Do you think your model understands movies? Unlike existing benchmarks, MF² targets memorable events, emotional arcs 💔, and causal chains 🔗 — things humans recall easily, but even top models like…
@ManosZaranis
Manos Zaranis
28 days
📂 Code & Data Release. Read the full paper here 👉. 🚀 Our code is openly available. 📁 Full dataset on @huggingface 🤗. MF² is open and ready — now it’s your turn! Can your model truly understand a full…
@ManosZaranis
Manos Zaranis
28 days
This project is the result of an amazing collaboration between researchers at @istecnico, @illc_amsterdam, @IRI_robotics, @UNC, @UCPH_Research, Pioneer Center for AI, @HeriotWattUni, Free University of Bozen-Bolzano, @Unbabel, @Ellis_Amsterdam, @Lisbon_ELLIS…
@ManosZaranis
Manos Zaranis
28 days
🌍 Open, Reusable, Expandable. MF² isn’t just a benchmark — it’s a resource for the community:
🔓 Fully open-license dataset
🎞️ Claim pairs + movie metadata (including subtitles)
@ManosZaranis
Manos Zaranis
28 days
📊 Fine-Grained Model Performance. 👤 Humans still lead overall! 🤖 Gemini 2.5 Pro comes 2nd, while LLaVA-Video & InternVL3 trail behind. MF² spans reasoning from 🧩 single scenes → 🧩🧩 multi-scenes → 🌍 global plotlines. As reasoning complexity increases, model performance…
@ManosZaranis
Manos Zaranis
28 days
📚 Beyond Vision: The Role of Textual and World Knowledge. Despite being multimodal, models like Gemini 2.5 Pro lean heavily on text. 📝 Subtitles + 📚 pretrained world knowledge > 👁️ actual visuals. Text remains the backbone of long-video understanding, for now.
@ManosZaranis
Manos Zaranis
28 days
🚩 Benchmarking the Gap. Can models truly understand movies like we do? 🎬🤔 Both open-weight and closed SOTA vision-language models fall far short of human performance, revealing a major blind spot in current video AI systems.
@ManosZaranis
Manos Zaranis
28 days
🔍 Multi-Level Reasoning & Narrative Understanding. MF² claims require reasoning at multiple levels:
🔹 Single scene
🔹 Multiple scenes
🔹 Global (entire movie plots)
They also test key comprehension skills like temporal perception ⏳, causal reasoning 🔗, and more…
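As a quick illustration, the three levels above can be treated as tags on judged claim pairs, and a per-level accuracy breakdown falls out in a few lines of Python. The record fields below are hypothetical, not the MF² release schema:

```python
from collections import defaultdict

# Hypothetical judged records: each claim pair carries a reasoning-level tag
# and whether the model picked the true claim.
results = [
    {"level": "single_scene", "correct": True},
    {"level": "multi_scene", "correct": False},
    {"level": "global", "correct": False},
    {"level": "single_scene", "correct": True},
]

def accuracy_by_level(results):
    """Group judged pairs by reasoning level and compute per-level accuracy."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r["level"]].append(r["correct"])
    return {lvl: sum(v) / len(v) for lvl, v in buckets.items()}

print(accuracy_by_level(results))
# → {'single_scene': 1.0, 'multi_scene': 0.0, 'global': 0.0}
```

A breakdown like this is what makes the "performance drops as reasoning gets more global" trend visible at a glance.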
@ManosZaranis
Manos Zaranis
28 days
🧠 Why does MF² stand out? 🧩 Each example pairs a fact ✅ and a fib ❌. No trivial details here! We focus on memorable turning points: events, emotional arcs 💔, cause-effect chains 🔗, and more…
@ManosZaranis
Manos Zaranis
28 days
🎥 Tired of short video datasets? ⏱️ MF² goes long — average duration: 88.33 minutes (way longer than most!). With manual annotations, a unique claim-pair format, and open-license movie releases, we tackle key limitations of existing benchmarks.
@ManosZaranis
Manos Zaranis
28 days
Our dataset targets deep narrative comprehension, not just shallow recall. MF² puts models to the test with 850+ 🧠 human-written claim pairs (one true and one plausible but false) across 50+ full-length, open-license films 🎬.
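The claim-pair protocol lends itself to a very small evaluation harness: shuffle the fact and the fib, ask a judge to pick one, and count how often it picks the fact. A minimal sketch, assuming a judge is any callable returning an index; names like `ClaimPair` and `oracle` are illustrative, not the MF² codebase:

```python
from dataclasses import dataclass
import random

@dataclass
class ClaimPair:
    movie: str
    fact: str  # the true claim about the plot
    fib: str   # the plausible-but-false counterpart

def evaluate(pairs, judge):
    """Accuracy: fraction of pairs where the judge picks the true claim."""
    correct = 0
    for pair in pairs:
        claims = [pair.fact, pair.fib]
        random.shuffle(claims)            # avoid positional bias
        pick = judge(pair.movie, claims)  # judge returns an index into claims
        correct += claims[pick] == pair.fact
    return correct / len(pairs)

# An oracle judge that always spots the true claim scores 1.0;
# random guessing would hover around 0.5.
pairs = [ClaimPair("Some Film", "The heroine leaves town.", "The heroine stays.")]
oracle = lambda movie, claims: claims.index("The heroine leaves town.")
print(evaluate(pairs, oracle))  # → 1.0
```

Because each fib is written to be plausible, chance is a hard 50% floor, which is what makes the human-model gap on MF² easy to read.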
@ManosZaranis
Manos Zaranis
28 days
🚀 Big news! Tower+ is here — our strongest open-weight multilingual model yet!
@RicardoRei7
Ricardo Rei
28 days
🚀 Tower+: our latest model in the Tower family — sets a new standard for open-weight multilingual models! We show how to go beyond sentence-level translation, striking a balance between translation quality and general multilingual capabilities. 1/5
@ManosZaranis
Manos Zaranis
1 month
Check out TREQA! TL;DR: We evaluate translation quality of complex content through QA using LLMs.
@psanfernandes
Patrick Fernandes
2 months
MT metrics excel at evaluating sentence translations, but struggle with complex texts. We introduce *TREQA*, a framework to assess how translations preserve key info by using LLMs to generate & answer questions about them (co-lead @swetaagrawal20). 1/15
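The TREQA idea can be sketched in a few lines: generate questions from the source, answer them from both the source and the candidate translation, and score answer agreement. This is a simplified sketch only; `ask_llm` and `toy_llm` are hypothetical stand-ins for an LLM call, and the real framework's prompting and answer matching are more sophisticated:

```python
def qa_eval(source_text, translation, ask_llm):
    """Score a translation by how well it preserves question-answerable content."""
    # 1) Generate comprehension questions from the source text.
    questions = ask_llm(f"Write comprehension questions about:\n{source_text}")
    # 2) Answer each question from the source and from the translation.
    score = 0
    for q in questions:
        ref = ask_llm(f"Answer from this text:\n{source_text}\nQ: {q}")
        hyp = ask_llm(f"Answer from this text:\n{translation}\nQ: {q}")
        # 3) Exact-match agreement (real systems use softer matching).
        score += ref.strip().lower() == hyp.strip().lower()
    return score / len(questions)

# Toy deterministic stand-in for an LLM, so the sketch runs end to end.
def toy_llm(prompt):
    if prompt.startswith("Write"):
        return ["Who is mentioned?"]
    return "alice" if "Alice" in prompt else "unknown"

print(qa_eval("Alice waved.", "Alice winkte.", toy_llm))  # → 1.0
```

The appeal of this setup is that the score degrades only when the translation loses or distorts information a reader would need to answer questions, rather than penalizing surface-level paraphrase.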
@ManosZaranis
Manos Zaranis
3 months
RT @zmprcp: New paper out 🚀 Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models: https://….
@ManosZaranis
Manos Zaranis
6 months
RT @NafiseSadat: The position is advertised for 12 months, but it has the possibility of a further 2-year extension.
@ManosZaranis
Manos Zaranis
6 months
RT @Saul_Santos1997: 🚀 New paper alert! 🚀. Ever tried asking an AI about a 2-hour movie? Yeah… not great. Check: ∞-Video: A Training-Free….
@ManosZaranis
Manos Zaranis
8 months
RT @nunonmg: The second, even better and bigger model is now out: EuroLLM-9B 🇪🇺. Ranks as the best open EU-made LLM of its size, proving co….
@ManosZaranis
Manos Zaranis
8 months
RT @zmprcp: We built the best EU-made LLM of its size! It supports all EU languages (and more), and beats Meta's Llama-3.1 on multilingual….
@ManosZaranis
Manos Zaranis
8 months
We call for:
🥇 Incorporating bias evaluation as a standard practice in the meta-evaluation of QE metrics.
🥈 Developing mitigation strategies to address these biases.
Our goal? Translation systems that are fair, accurate, and inclusive for all users. 🌍✨
@ManosZaranis
Manos Zaranis
8 months
‼️ When QE metrics are used to improve translation quality in MT systems (e.g., QAD), they can amplify gender bias.