
Constantin Venhoff
@cvenhoff00
229 Followers · 56 Following · 9 Media · 26 Statuses
PhD Student at Oxford University @OxfordTVG | Intern @Meta
Joined April 2024
RT @nickhjiang: What makes LLMs like Grok-4 unique? We use sparse autoencoders (SAEs) to tackle queries like these and apply them to four…
RT @HCasademunt: Problem: Train LLM on insecure code → it becomes broadly misaligned. Solution: Add safety data? What if you can't? Use int…
RT @_jake_ward: Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors f…
RT @balesni: A simple AGI safety technique: AI's thoughts are in plain English, just read them. We know it works, with OK (not perfect) tra…
RT @emmons_scott: Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When w…
Code & Datasets: Demo Notebook: Paper: Work done with my awesome collaborators @IvanArcus @ArthurConmy @NeelNanda5 @philiptorr as part of the @MATSprogram.
arxiv.org
Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these...
Can we actually control reasoning behaviors in thinking LLMs? Our @iclr_conf workshop paper is out! 🎉 We show how to steer DeepSeek-R1-Distill's reasoning: make it backtrack, add knowledge, test examples. Just by adding steering vectors to its activations! Details in 🧵👇
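The steering described above can be sketched in a few lines of PyTorch. This is a hypothetical, minimal illustration (not the paper's actual code or model): a "steering vector" is added to a layer's hidden states during the forward pass via a forward hook, shifting behavior without changing any weights. The layer, vector, and strength `alpha` here are all stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 16

# Stand-in for one transformer block; in a real model this would be
# something like model.model.layers[l] in DeepSeek-R1-Distill.
layer = nn.Linear(hidden_dim, hidden_dim)

# Hypothetical steering vector, e.g. a "backtracking" direction
# extracted from the model's own activations.
steering_vector = torch.randn(hidden_dim)
alpha = 4.0  # steering strength

def steer(module, inputs, output):
    # Add the scaled steering vector to every position's hidden state;
    # returning a value from a forward hook replaces the layer's output.
    return output + alpha * steering_vector

x = torch.randn(2, 5, hidden_dim)  # (batch, seq, hidden)
baseline = layer(x)

handle = layer.register_forward_hook(steer)
steered = layer(x)
handle.remove()

# The hook shifts activations by exactly alpha * steering_vector
print(torch.allclose(steered - baseline,
                     alpha * steering_vector.expand_as(baseline)))  # → True
```

During generation, the hook stays registered so every decoding step is steered; removing the handle restores the unmodified model.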
Huge thanks to collaborators @NeelNanda5 @soniajoseph_ @philiptorr @ashk__on for making this happen! Paper: Code & weights:
github.com · cvenhoff/vlm-mapping
RT @MarroSamuele: LLMs are continuous models, but language is discrete. What happens when a continuous model approximates a discrete sequen…