Michal Golovanevsky Profile
Michal Golovanevsky

@MichalGolov

Followers: 47 · Following: 20 · Media: 7 · Statuses: 28

CS PhD student @BrownCSDept | Multimodal Learning | Mechanistic Interpretability | Clinical Deep Learning.

Providence, RI
Joined September 2022
@MichalGolov
Michal Golovanevsky
1 month
RT @Michael_Lepori: How do VLMs balance visual information presented in-context with linguistic priors encoded in-weights? In this project,…
0
2
0
@MichalGolov
Michal Golovanevsky
1 month
RT @WilliamRudmanjr: Models rely on memorized priors early in their processing but shift toward visual evidence in mid-to-late layers. This…
0
1
0
@MichalGolov
Michal Golovanevsky
1 month
RT @WilliamRudmanjr: We create Visual CounterFact: a dataset of realistic images that contrast pixel evidence against memorized knowledge…
0
1
0
@MichalGolov
Michal Golovanevsky
1 month
RT @WilliamRudmanjr: With PvP, we can shift 92.5% of color predictions and 74.6% of size predictions from memorized priors to counterfactua…
0
1
0
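An intervention like the one this thread calls PvP can be sketched generically as activation patching: cache a mid-layer activation from the run on the counterfactual image, then splice it into the run on the original input. The toy model, the choice of layer, and the hook mechanics below are illustrative assumptions, not the paper's actual procedure.

```python
# A minimal, generic activation-patching sketch in PyTorch (illustrative only;
# the thread's actual PvP method may differ in where and what it patches).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a stack of transformer blocks in a VLM.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(8)])
x_clean = torch.randn(1, 16)           # input consistent with memorized priors
x_counterfactual = torch.randn(1, 16)  # input contradicting those priors

# 1) Cache the counterfactual run's activation at a mid layer
#    (the thread reports visual evidence dominating in mid-to-late layers).
cache = {}
layer = model[4]
handle = layer.register_forward_hook(lambda m, i, o: cache.update(mid=o.detach()))
model(x_counterfactual)
handle.remove()

# 2) Re-run on the clean input, overwriting that layer's output with the cache.
handle = layer.register_forward_hook(lambda m, i, o: cache["mid"])
patched_out = model(x_clean)
handle.remove()
print(patched_out)  # downstream computation now reflects the patched evidence
```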
@MichalGolov
Michal Golovanevsky
1 month
RT @WilliamRudmanjr: When vision-language models answer questions, are they truly analyzing the image or relying on memorized facts? We int…
0
4
0
@MichalGolov
Michal Golovanevsky
1 month
RT @CSVisionPapers: Pixels Versus Priors: Controlling Knowledge Priors in Vision-Language Models through Visual Counterfacts. https://t.co/…
0
1
0
@MichalGolov
Michal Golovanevsky
4 months
Code: [link] · Paper: [link] · Hugging Face dataset: [link]. [6/6]
1
0
2
@MichalGolov
Michal Golovanevsky
4 months
Our findings serve as a call to action for the MLLM research community: despite training on specialized datasets, MLLMs have not developed the concept of “sides.” [5/6]
[image]
1
0
1
@MichalGolov
Michal Golovanevsky
4 months
We take a step forward with Visually-Cued Chain-of-Thought prompting. While annotations alone do not enhance visual reasoning, combining them with CoT prompting boosts GPT-4o's side counting accuracy on novel shapes from 7% to 93% and improves MathVerse performance by ~7%. [4/6]
[image]
1
0
1
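A minimal sketch of what a visually-cued CoT query could look like with the OpenAI Python client: an annotated image plus an explicit step-by-step counting instruction. The prompt wording, the annotation scheme, and the file name heptagon_annotated.png are assumptions for illustration, not the paper's exact setup.

```python
# Hypothetical visually-cued CoT query (assumes OPENAI_API_KEY is set and that
# heptagon_annotated.png has its sides marked with numbered labels).
import base64
from openai import OpenAI

client = OpenAI()

with open("heptagon_annotated.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": ("Each side of the shape is marked with a numbered label. "
                      "Count the labels one by one, reasoning step by step, "
                      "then state the total number of sides.")},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```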
@MichalGolov
Michal Golovanevsky
4 months
In contrast, we find that vision encoders are shape-blind, mapping distinct shapes to the same region in embedding space. As a result, MLLMs struggle to identify and count the sides of pentagons, heptagons, and octagons. [3/6]
[image]
1
0
1
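The shape-blindness claim is straightforward to probe in the spirit described here: render regular polygons, embed them with a CLIP-style vision encoder, and compare pairwise cosine similarities. The rendering details and checkpoint below are illustrative choices, not the paper's evaluation protocol.

```python
# Probe for collapsed shape embeddings (illustrative; not the paper's setup).
import math
import torch
from PIL import Image, ImageDraw
from transformers import CLIPModel, CLIPProcessor

def polygon_image(n_sides, size=224):
    """Render a regular n-gon as a simple black outline on white."""
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    c, r = size / 2, size * 0.4
    pts = [(c + r * math.cos(2 * math.pi * k / n_sides),
            c + r * math.sin(2 * math.pi * k / n_sides))
           for k in range(n_sides)]
    draw.polygon(pts, outline="black")
    return img

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [polygon_image(n) for n in (5, 7, 8)]  # pentagon, heptagon, octagon
inputs = processor(images=images, return_tensors="pt")
with torch.no_grad():
    feats = model.get_image_features(**inputs)
feats = feats / feats.norm(dim=-1, keepdim=True)
print(feats @ feats.T)  # near-1.0 off-diagonals suggest "shape-blind" collapse
```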
@MichalGolov
Michal Golovanevsky
4 months
Where does the failure of MLLMs occur? We show that the underlying LLMs answer geometric-property questions with 100% accuracy, e.g., Q: “How many sides does a heptagon have?” A: “7.” [2/6]
1
0
1
@MichalGolov
Michal Golovanevsky
4 months
If SOTA models fail to recognize simple shapes, should we be evaluating them on complex geometric tasks? Most MLLMs struggle to count the sides of regular polygons, and all MLLMs score 0% on novel shapes. @WilliamRudmanjr @_amirbar @vedantpalit1008 [1/6]
[image]
1
6
12
@MichalGolov
Michal Golovanevsky
9 months
RT @WilliamRudmanjr: NOTICE uses Symmetric Token Replacement for text corruption and Semantic Image Pairs (SIP) for image corruption. SIP r…
0
1
0
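A toy illustration of the two corruption styles named in this tweet: Symmetric Token Replacement swaps a token for another of the same kind, and a Semantic Image Pair swaps the image for one that differs in a single concept. The replacement table and file names below are hypothetical stand-ins for NOTICE's actual rules.

```python
# Toy sketch of NOTICE-style corrupted inputs (hypothetical rules and data).
SAME_TYPE = {"cat": "dog", "red": "blue", "two": "three"}  # illustrative table

def symmetric_token_replacement(question: str) -> str:
    """Swap each swappable token for another token of the same kind."""
    return " ".join(SAME_TYPE.get(tok, tok) for tok in question.split())

clean = {"image": "photo_of_a_cat.png", "question": "what color is the cat"}
corrupt = {
    "image": "photo_of_a_dog.png",  # semantic image pair: one concept changed
    "question": symmetric_token_replacement(clean["question"]),
}
print(corrupt["question"])  # -> "what color is the dog"
```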
@MichalGolov
Michal Golovanevsky
9 months
RT @WilliamRudmanjr: We extend the generalizability of NOTICE by using Stable Diffusion to generate semantic image pairs and find results a…
0
1
0
@MichalGolov
Michal Golovanevsky
9 months
RT @WilliamRudmanjr: The finding that important attention heads implement one of a small set of interpretable functions boosts transparency…
0
2
0
@MichalGolov
Michal Golovanevsky
9 months
RT @WilliamRudmanjr: How do VLMs like BLIP and LLaVA differ in how they process visual information? Using our mech-interp pipeline for VLMs…
0
2
0
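One generic way a pipeline like this can score attention heads, sketched below: ablate one head at a time and measure how far the output moves. The zero-ablation proxy and toy dimensions are assumptions; the actual pipeline may rely on patching instead.

```python
# Head-importance-by-ablation sketch in PyTorch (a generic proxy, not the
# thread's exact method).
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
x = torch.randn(1, 6, 32)  # (batch, tokens, dim): e.g. image + text tokens

with torch.no_grad():
    baseline, _ = attn(x, x, x, need_weights=False)

    head_dim = 32 // 4
    scores = []
    for h in range(4):
        # Ablate head h by zeroing its slice of the output projection.
        sl = slice(h * head_dim, (h + 1) * head_dim)
        saved = attn.out_proj.weight[:, sl].clone()
        attn.out_proj.weight[:, sl] = 0
        ablated, _ = attn(x, x, x, need_weights=False)
        attn.out_proj.weight[:, sl] = saved  # restore the head
        scores.append((baseline - ablated).norm().item())

print(scores)  # larger change = more important head, under this proxy
```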
@MichalGolov
Michal Golovanevsky
9 months
RT @WilliamRudmanjr: Instead, LLaVA relies on self-attention heads to manage “outlier” attention patterns in the image, focusing on regulat…
0
1
0
@MichalGolov
Michal Golovanevsky
1 year
RT @WilliamRudmanjr: The finding that important cross-attention heads implement one of a small set of interpretable functions helps boost V…
0
1
0
@MichalGolov
Michal Golovanevsky
1 year
RT @WilliamRudmanjr: By visualizing cross-attention patterns, we've discovered that these universal heads fall into three functional catego…
0
1
0
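For a sense of what the visualization in this tweet involves, the sketch below plots one head's text-to-image cross-attention as a heatmap. The random matrix is a stand-in for weights pulled from a real VLM forward pass.

```python
# Minimal cross-attention heatmap (random stand-in data, illustrative only).
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(49), size=8)  # 8 text tokens over a 7x7 patch grid

fig, ax = plt.subplots()
im = ax.imshow(attn, aspect="auto", cmap="viridis")
ax.set_xlabel("image patch")
ax.set_ylabel("text token")
fig.colorbar(im, ax=ax, label="attention weight")
fig.savefig("cross_attention_head.png")
```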