Constantin Venhoff Profile
Constantin Venhoff (@cvenhoff00)
Followers: 229 · Following: 56 · Media: 9 · Statuses: 26

PhD Student at Oxford University @OxfordTVG | Intern @Meta

Joined April 2024
@cvenhoff00
Constantin Venhoff
12 days
RT @nickhjiang: What makes LLMs like Grok-4 unique? We use sparse autoencoders (SAEs) to tackle queries like these and apply them to four…
0
16
0
@cvenhoff00
Constantin Venhoff
1 month
RT @HCasademunt: Problem: Train LLM on insecure code → it becomes broadly misaligned. Solution: Add safety data? What if you can't? Use int…
0
27
0
@cvenhoff00
Constantin Venhoff
1 month
RT @_jake_ward: Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors f…
0
27
0
@cvenhoff00
Constantin Venhoff
1 month
RT @balesni: A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) tra…
0
109
0
@cvenhoff00
Constantin Venhoff
2 months
RT @emmons_scott: Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When w…
0
40
0
@cvenhoff00
Constantin Venhoff
2 months
But wait, there's more! Our follow-up work (coming soon) uses SAEs + clustering to build a principled and complete taxonomy of reasoning behaviors and provides new insights into the differences between reasoning and base models. Stay tuned!
1
1
14
@cvenhoff00
Constantin Venhoff
2 months
Where do these behaviors live? We use attribution patching to find the layers of the model where steering vectors are most effective, and find clear peaks in the middle layers (rough sketch after this tweet).
1
0
8
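To illustrate the layer sweep in the tweet above, here is a minimal PyTorch sketch of attribution patching applied to a steering vector. It is not the paper's code: `blocks`, `metric_fn`, and the hook wiring are assumptions, gradients must be enabled, and the score is only a first-order estimate of how much adding the vector at each layer would move the metric.

```python
import torch

def attribution_scores(model, blocks, inputs, steering_vec, metric_fn):
    """First-order estimate, per layer, of the metric change from adding `steering_vec`."""
    grads = {}

    def make_hook(idx):
        def hook(module, inp, out):
            h = out[0] if isinstance(out, tuple) else out
            # Capture d(metric)/d(hidden_states) for this layer during backward.
            h.register_hook(lambda g: grads.__setitem__(idx, g))
        return hook

    handles = [blk.register_forward_hook(make_hook(i)) for i, blk in enumerate(blocks)]
    try:
        metric = metric_fn(model(**inputs))  # scalar, e.g. log-prob of a continuation
        metric.backward()
    finally:
        for handle in handles:
            handle.remove()

    # Attribution patching: delta_metric(layer) ≈ sum over positions of v · grad.
    return {i: (g * steering_vec).sum().item() for i, g in grads.items()}
```

Peaks in these per-layer scores would correspond to the middle-layer peaks the tweet describes.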
@cvenhoff00
Constantin Venhoff
2 months
How do we find these vectors? First, annotate reasoning chains with GPT-4o, then average the internal activations for each behavior across its annotated text positions. Lastly, subtract the overall mean, and that's your steering vector. Simple and surprisingly effective (sketch after this tweet).
2
0
8
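A minimal sketch of that mean-difference recipe, assuming `token_acts` holds residual-stream activations at one layer ([num_tokens, d_model]) and `labels` carries a GPT-4o behavior tag per token position ("none" where unannotated). The names and shapes are my assumptions, not the paper's code.

```python
import torch

def steering_vectors(token_acts: torch.Tensor, labels: list[str]) -> dict[str, torch.Tensor]:
    overall_mean = token_acts.mean(dim=0)                     # mean over all positions
    vectors = {}
    for behavior in sorted(set(labels) - {"none"}):
        mask = torch.tensor([lab == behavior for lab in labels])
        behavior_mean = token_acts[mask].mean(dim=0)          # mean over annotated positions
        vectors[behavior] = behavior_mean - overall_mean      # subtract the overall mean
    return vectors
```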
@cvenhoff00
Constantin Venhoff
2 months
We tested this across 500 tasks in 10 categories using three DeepSeek-R1-Distill models (Qwen-14B, Qwen-1.5B & Llama-8B). The steering effects are consistent and robust; you can literally dial specific reasoning behaviors up or down at inference time 🚀
1
0
8
@cvenhoff00
Constantin Venhoff
2 months
The key insight: reasoning behaviors like backtracking and knowledge addition live in linear directions in the model's activation space. Want more backtracking? Add the vector. Less? Subtract it. Simple, interpretable, and it works! (Sketch after this tweet.)
1
1
13
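A minimal sketch of that add/subtract step at inference time, assuming a HuggingFace-style decoder whose blocks return a tuple with the hidden states first; `layer`, `vec`, and the scale values are illustrative, not the paper's code.

```python
import torch

def add_steering_hook(layer, vec: torch.Tensor, scale: float):
    """Add `scale * vec` to the layer's hidden states on every forward pass."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * vec.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Hypothetical usage: a positive scale dials the behavior up, a negative one dials it down.
# handle = add_steering_hook(model.model.layers[17], vectors["backtracking"], scale=4.0)
# output = model.generate(**inputs, max_new_tokens=256)
# handle.remove()
```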
@cvenhoff00
Constantin Venhoff
2 months
Can we actually control reasoning behaviors in thinking LLMs? Our @iclr_conf workshop paper is out! 🎉 We show how to steer DeepSeek-R1-Distill’s reasoning: make it backtrack, add knowledge, test examples, just by adding steering vectors to its activations! Details in 🧵👇
4
27
168
@cvenhoff00
Constantin Venhoff
2 months
Huge thanks to collaborators @NeelNanda5 @soniajoseph_ @philiptorr @ashk__on for making this happen! Paper: Code & weights: github.com/cvenhoff/vlm-mapping
0
0
19
@cvenhoff00
Constantin Venhoff
2 months
💡 Summary: Visual representations align with language features in middle-to-late layers, not early ones. This raises key questions: How important is LLM feature "initialization" for VLM performance? Are current adapter architectures optimal for cross-modal learning? (6/6)
3
0
15
@cvenhoff00
Constantin Venhoff
2 months
📊 Key finding: Reconstruction error drops, sparsity decreases, AND semantic alignment between SAE features and visual content sharply increases around layer 18 of 26! Visual-linguistic alignment happens in middle-to-late layers, not early ones. (5/6)
1
0
16
@cvenhoff00
Constantin Venhoff
2 months
🔍 Using pre-trained sparse autoencoders as analytical probes, we tracked how visual representations flow through the language model layers. We measured SAE reconstruction error and sparsity patterns to detect when visual features align with language representations (rough sketch after this tweet). (4/6)
1
0
11
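A minimal sketch of that probing step, assuming a pre-trained SAE object with `encode`/`decode` methods and `visual_acts[layer]` holding residual-stream activations at the image-token positions for a given layer; the interface is an assumption, not the paper's code.

```python
import torch

def probe_layer(sae, acts: torch.Tensor):
    """Return (reconstruction error, average number of active SAE features)."""
    feats = sae.encode(acts)                                 # sparse feature activations
    recon = sae.decode(feats)                                # reconstruct from SAE features
    recon_err = (recon - acts).pow(2).mean().item()          # mean squared reconstruction error
    l0 = (feats != 0).float().sum(dim=-1).mean().item()      # sparsity: avg active features per token
    return recon_err, l0

# Hypothetical usage, sweeping all layers:
# per_layer = {layer: probe_layer(saes[layer], visual_acts[layer]) for layer in saes}
```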
@cvenhoff00
Constantin Venhoff
2 months
✅ Despite this constraint, our VLM (CLIP-Gemma-2-2b-it) performs on par with LLaVA-v1, showing that the LLM's feature space already contains good representations for visual instruction tuning; the adapter just needs to align them effectively. (3/6)
2
2
18
@cvenhoff00
Constantin Venhoff
2 months
🔬 Our approach: We deliberately froze both the vision transformer (CLIP ViT) AND the language model (Gemma-2-2B-it), training only a linear adapter between them. This forced the adapter to map visual features directly into the LLM's existing feature space (sketch after this tweet). (2/6)
3
0
17
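A minimal sketch of that setup: both backbones frozen, only a linear map from CLIP patch features into the LLM's embedding space is trained. The dimensions (1024 for a CLIP ViT-L/14, 2304 for Gemma-2-2B) and variable names are assumptions; the paper's actual adapter and training loop may differ.

```python
import torch
import torch.nn as nn

class LinearAdapter(nn.Module):
    """Project frozen CLIP patch features into the frozen LLM's embedding space."""
    def __init__(self, clip_dim: int = 1024, lm_dim: int = 2304):
        super().__init__()
        self.proj = nn.Linear(clip_dim, lm_dim)

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # [batch, num_patches, clip_dim] -> [batch, num_patches, lm_dim]
        return self.proj(patch_feats)

# for p in vision_tower.parameters():   p.requires_grad_(False)  # freeze CLIP ViT
# for p in language_model.parameters(): p.requires_grad_(False)  # freeze Gemma-2-2B-it
# optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)   # train only the adapter
```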
@cvenhoff00
Constantin Venhoff
2 months
🔍 New paper: How do vision-language models actually align visual and language representations? We used sparse autoencoders to peek inside VLMs and found something surprising about when and where cross-modal alignment happens! Presented at XAI4CV Workshop @ CVPR. 🧵 (1/6)
9
44
302
@cvenhoff00
Constantin Venhoff
5 months
RT @MarroSamuele: LLMs are continuous models, but language is discrete. What happens when a continuous model approximates a discrete sequen…
0
9
0