
Patrick Da Silva
@patrickqdasilva
Incoming PhD Student at The Ohio State University
Joined September 2023
Super grateful to have received a senior area chair highlight at #ACL2025NLP. ⏳ The generalization of interpretability-based steering methods is at an inflection point. 🚂 As a community, we need to place strong emphasis on methods-reliability evals if we care about long-term impact.
🌟 Excited to announce that “Steering off Course” was accepted to #ACL2025NLP for an Oral and Panel Discussion! 📍 Wed, 9AM, Level 2 Hall A. 🇨🇦 I will also share this work at Actionable Interpretability @ActInterp at #ICML2025. 📍 Sat, 1PM, East Ballroom A.
arxiv.org
Steering methods for language models (LMs) have gained traction as lightweight alternatives to fine-tuning, enabling targeted modifications to model activations. However, prior studies primarily...
Steering language models by directly intervening on internal activations is appealing, but does it generalize? We study 3 popular steering methods with 36 models from 14 families (1.5-70B), exposing brittle performance and fundamental flaws in underlying assumptions. 🧵👇 (1/10)
RT @HananeNMoussa: 📢 Excited to announce the first project of my PhD! Through our work we address the training data scarcity to develop AI…
We hope to inspire further research into internal transformer mechanism variance, so future steering methods can be robust and adaptable to new releases. 🥕🐇💨 Special shoutout to my advisor @shocheen and collaborators @HariSethuraman @dheerajgopal @HannaHajishirzi. (10/10)