Ido Cohen @IdoC0hen X Profile

Followers

23

Following

5

Media

1

Statuses

9

Joined March 2020

Ido Cohen

@IdoC0hen

2 months

Huge thanks to my co-authors @dhgottesman @megamor2 and @RGiryes! If you'll be at #acl2025, I'd love to connect and chat! Read the full paper here: https://t.co/w1rRELq0aa Explore PopVQA here: https://t.co/4gTHqIcaVI #acl2025 #NLP #MachineLearning

huggingface.co

0

1

6

Ido Cohen

@IdoC0hen

2 months

This late processing creates a bottleneck. By the time the model figures out what it's seeing, there are very few layers left for reasoning about it.

1

6

Ido Cohen

@IdoC0hen

2 months

Our experiments reveal that VLMs use most of their processing power just for Hop 1. We found that critical image information is processed very late—in the model's middle layers.

1

5

Ido Cohen

@IdoC0hen

2 months

So why the gap? We found that reasoning about a visual entity behaves like a multi-hop problem:
Hop 1: Identify the entity in the image.
Hop 2: Connect the recognized entity to its stored factual knowledge and extract it.

1

6

Ido Cohen

@IdoC0hen

2 months

What makes PopVQA special? It’s designed to separate the task of identifying an entity from the task of reasoning about it. It provides the identity of the entity in each image, not just the answers to the questions, so unrecognized entities can be filtered out.

1

0

7

Ido Cohen

@IdoC0hen

2 months

To investigate this, we built and released a new dataset: PopVQA. It contains over 15,000 popular entities, from celebrities and landmarks to paintings and brands, each with a set of factual questions.

1

0

6

Ido Cohen

@IdoC0hen

2 months

We discovered that when you show VLMs an entity in a picture instead of just writing its name, their accuracy on factual questions drops by up to 18% for some models!

1

0

6

Ido Cohen

@IdoC0hen

2 months

A Vision-Language Model can answer questions about Robin Williams. It can also recognize him in a photo. So why does it FAIL when asked the same questions using his photo instead of his name? A thread on our new #acl2025 paper that explores this puzzle 🧵

1

7

25

Ido Cohen

@IdoC0hen

3 years

Was a pleasure walking down memory lane with the team on this research! Very excited to see what theoretical and practical developments will stem from this.

Adi Haviv

@adihaviv

3 years

The cat is out of the bag🥁 LMs' memorized predictions are a two-step process, and we used idioms to find that out. New dataset for probing memorization, analysis methodology, and much more. @IdoCohe49871127 @GidronJacob @RoeiSchuster @yoavgo @megamor2 https://t.co/pUzgr8Ut2q 🧵

0

1

2