
Tal Haklay
@tal_haklay
Followers
596
Following
836
Media
24
Statuses
151
NLP | Interpretability | PhD student at the @TechnionLive
Joined March 2022
1/13 LLM circuits tell us where the computation happens inside the model, but the computation varies by token position, a key detail often ignored! We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇
7
43
292
RT @tomerashuach: 🚨 New preprint out! CRISP: Persistent Concept Unlearning via SAEs. LLMs often encode knowledge we want to remove. CRISP…
0
19
0
RT @dana_arad4: Excited to spend the rest of the summer visiting @davidbau's lab at Northeastern! If you’re in the area and want to chat ab…
0
4
0
RT @Itay_itzhak_: At #ACL2025 and not sure what to do next? GEM 💎² is the place to be for awesome talks on the future of LLM evaluation. Co…
0
4
0
Had my oral presentation at ACL @aclmeeting today! Big thanks to my collaborators, advisor, parents, and partner - and a special thanks to the “Goodbye Stress” gummies I picked up at the supermarket. Couldn’t have done it without any of you 🙈
1
2
54
RT @tomerashuach: 🎉 Presenting my poster today at #ACL2025! REVS: Unlearning Sensitive Info in LMs via Rank Editing. Come by to chat about…
technion-cs-nlp.github.io
REVS surgically removes a language model's tendency to generate a given piece of sensitive information from its training data, while preserving its broader knowledge.
0
8
0
RT @iatitov: Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardee…
0
3
0
RT @ActInterp: Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards! 👏👏 and thanks…
0
3
0
ICML🛫🛬ACL. Next week I’ll be at @aclmeeting, giving an oral presentation about position-aware automatic circuit discovery. DM me if you’d like to chat about interpretability, mech-interp at scale, or just life :)
0
4
39
RT @ActInterp: 🚨The Actionable Interpretability Workshop is happening tomorrow at ICML! Join us for an exciting lineup of speakers, nearl…
0
7
0
RT @OrgadHadas: Hope everyone’s getting the most out of #icml25. We’re excited and ready for the Actionable Interpretability (@ActInterp) w…
0
5
0
RT @Itay_itzhak_: 🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases, but are these new behaviors, or pretrainin…
0
24
0
RT @saprmarks: In a new post, I present: 1. A framework for thinking about which downstream applications interpretability researchers shoul…
0
6
0
RT @saprmarks: I'm excited to discuss downstream applications of interpretability at @ActInterp! For a preview of my thoughts on the topic,…
0
7
0
RT @OrgadHadas: Started packing for #ICML2025? We're already excited for the @ActInterp workshop! Only 8 days away. Confirmed keynotes: @_…
0
7
0
RT @FazlBarez: I’ll be at #ICML2025 – come say hi and talk to me about responsible AI👋 🎤 Speaking (14th): Post-AGI Civilizational Equilibr…
0
4
0