Benno Krojer

@benno_krojer

Followers
2K
Following
51K
Media
331
Statuses
6K

AI phding @Mila_Quebec @mcgillu (past: @AIatMeta). Interests: interpretability, language grounding (V+L), evals, reasoning. Vanier Scholar. 🥏⚽🥨

Montréal, Québec
Joined June 2014
@benno_krojer
Benno Krojer
22 days
Excited to share the results of my internship research with @AIatMeta, as part of a larger world modeling release! What subtle shortcuts are VideoLLMs taking on spatio-temporal questions? And how can we instead curate shortcut-robust examples at large scale? Details 👇🔬
@AIatMeta
AI at Meta
24 days
Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model,
3
22
59
@benno_krojer
Benno Krojer
19 hours
RT @cohere: Cohere is excited to announce our new office in Montreal, QC! We look forward to contributing to the local AI landscape, colla…
0
23
0
@benno_krojer
Benno Krojer
2 days
RT @lucasmaes_: I genuinely think @benno_krojer's work offers a much fairer and insightful way to assess the physics understanding of Vide…
0
1
0
@benno_krojer
Benno Krojer
4 days
Welcome to the lab, doctor!
@vernadankers
Verna Dankers
4 days
I miss Edinburgh and its wonderful people already!! Thanks to @tallinzen and @PontiEdoardo for inspiring discussions during the viva! I'm now exchanging Arthur's Seat for Mont Royal to join @sivareddyg's wonderful lab @Mila_Quebec 🤩
0
0
2
@benno_krojer
Benno Krojer
9 days
RT @cesare_spinoso: A blizzard is raging in Montreal when your friend says “Wow, the weather is amazing!” Humans easily interpret irony, wh…
0
11
0
@benno_krojer
Benno Krojer
10 days
Also check out our previous two episodes! They didn't have a single guest; instead: 1) we introduce the podcast and how Tom and I got into research in Ep 00. 2) we interview several people at Mila just before the NeurIPS deadline about their submissions in Ep 01.
0
0
1
@benno_krojer
Benno Krojer
10 days
Started a new podcast with @tvergarabrowne! Behind the Research of AI: We look behind the scenes, beyond the polished papers 🧐🧪 If this sounds fun, check out our first "official" episode with the awesome @gauthier_gidel from @Mila_Quebec:
1
13
41
@benno_krojer
Benno Krojer
15 days
pretty plots sometimes
0
0
3
@benno_krojer
Benno Krojer
15 days
The video is online now! 3min speed science talk on "From a soup of raw pixels to abstract meaning".
@benno_krojer
Benno Krojer
30 days
Turns out condensing your research into 3min is very hard but also teaches you a lot.
0
6
39
@benno_krojer
Benno Krojer
22 days
Cool use of our AURORA work from last year to improve physical world models framed as image editing!
@yifuqiu98
Yifu Qiu
25 days
🔁 What if you could bootstrap a world model (state1 × action → state2) using a much easier-to-train dynamics model (state1 × state2 → action) in a generalist VLM? 💡 We show how a dynamics model can generate synthetic trajectories & serve for inference-time verification. 🧵👇
1
3
6
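A toy sketch of the two directions named in the quoted tweet, to make the notation concrete. It is not the authors' method: states are plain integers, an action is the integer step between them, and every function name here is hypothetical. The "models" are hand-written stand-ins for learned ones.

```python
# Toy sketch: states are integers, an action is the step between two states.
# Both "models" are hand-written stand-ins, not learned models.

def world_model(state1: int, action: int) -> int:
    """Forward direction: state1 x action -> state2."""
    return state1 + action


def dynamics_model(state1: int, state2: int) -> int:
    """Inverse direction: state1 x state2 -> action."""
    return state2 - state1


def synthesize_trajectory(states: list[int]) -> list[tuple[int, int, int]]:
    """Label consecutive states with inferred actions,
    giving (state1, action, state2) triples."""
    return [(s1, dynamics_model(s1, s2), s2) for s1, s2 in zip(states, states[1:])]


def verify_candidates(state1: int, action: int, candidates: list[int]) -> list[int]:
    """Keep candidate next states whose inferred action matches the intended one."""
    return [c for c in candidates if dynamics_model(state1, c) == action]
```

Here `synthesize_trajectory` stands for the "generate synthetic trajectories" use, and `verify_candidates` for the "inference-time verification" use, under the assumptions stated above.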
@benno_krojer
Benno Krojer
22 days
RT @xhluca: "Build the web for agents, not agents for the web". This position paper argues that rather than forcing web agents to adapt to…
0
54
0
@benno_krojer
Benno Krojer
22 days
A: I think there is no formal way to define it; it's up to us humans to say what a task is really about! In most cases, like this paper on video reasoning, the distinction is easy. But not always. Here is a cool older paper on this:
0
0
0
@benno_krojer
Benno Krojer
22 days
As a side note, during the project I thought a lot about what makes a solution that a model finds a *shortcut* vs. just a "clever solution" that humans haven't thought of?
2
0
2
@benno_krojer
Benno Krojer
22 days
This is part of a larger effort at Meta to significantly improve physical world modeling, so check out the other works in this blog post!
0
0
2
@benno_krojer
Benno Krojer
22 days
Some reflections at the end: There's a lot of talk about math reasoning these days, but this project made me appreciate what simple reasoning we humans take for granted, arising in our first months and years of living. As usual I also included "Behind The Scenes" in the Appendix:
2
0
7
@benno_krojer
Benno Krojer
22 days
I am super grateful to my smart+kind collaborators at Meta who made this a very enjoyable project :) @mido_assran Nicolas Ballas @koustuvsinha @candacerossio @garridoq_ Mojtaba Komeili. The Montreal office in general is a very fun place 👇
1
0
3
@benno_krojer
Benno Krojer
22 days
The hardest tasks for current models are still intuitive physics tasks, where performance is often below random (in line with the previous literature). We encourage the community to use MVPBench to check if the latest VideoLLMs possess a *real* understanding of the physical world!
1
0
3
@benno_krojer
Benno Krojer
22 days
On the other hand, even the strongest SOTA models perform around random chance, with only 2-3 models significantly above random
1
0
4
@benno_krojer
Benno Krojer
22 days
The questions in MVPBench are conceptually simple: relatively short videos with little linguistic or cultural knowledge needed. As a result, humans have no problem with these questions; e.g., it is known that even babies do well on various intuitive physics tasks
1
0
3
@benno_krojer
Benno Krojer
22 days
By automating the pairing of highly similar video pairs and unifying different datasets, as well as filtering out examples that models can solve with a single frame, we end up with (probably) the largest and most diverse dataset of its kind:
1
0
3
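A minimal sketch of the single-frame filtering step mentioned in the tweet above, under loud assumptions: the `Example` record, its field names, and the "any single-frame baseline answers correctly" criterion are all hypothetical and not taken from the MVPBench release. It only illustrates the idea of dropping examples that do not need the full video.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    example_id: str
    answer: str
    # Hypothetical field: answers from baseline models that saw only one frame.
    single_frame_predictions: list[str] = field(default_factory=list)


def is_single_frame_solvable(ex: Example) -> bool:
    # Assumed criterion: at least one single-frame baseline gets it right.
    return any(pred == ex.answer for pred in ex.single_frame_predictions)


def filter_shortcut_examples(examples: list[Example]) -> list[Example]:
    """Keep only examples that no single-frame baseline answers correctly."""
    return [ex for ex in examples if not is_single_frame_solvable(ex)]
```

A stricter or looser rule (for example, a majority of baselines rather than any one) would be a one-line change in `is_single_frame_solvable`.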
@benno_krojer
Benno Krojer
22 days
So as a solution, we propose a 3-step curation framework that results in the Minimal Video Pairs benchmark (MVPBench)
1
0
3