Actionable Interpretability Workshop ICML2025

@ActInterp

Followers: 257
Following: 43
Media: 11
Statuses: 44

🛠️ Actionable Interpretability🔎 @icmlconf 2025 | Bridging the gap between insights and actions ✨ https://t.co/4zRMTbzwDc

Joined March 2025
@AdiSimhi
Adi Simhi
20 days
🤔 What happens when LLM agents must choose between achieving their goals and avoiding harm to humans in realistic management scenarios? Are LLMs pragmatic, or do they prefer to avoid human harm? 🚀 New paper out: ManagerBench: Evaluating the Safety-Pragmatism Trade-off in Autonomous LLMs 🚀🧵
1
15
35
@boknilev
Yonatan Belinkov ✈️ COLM2025
27 days
Opportunities to join my group in fall 2026: * PhD applications direct or via @ELLISforEurope ( https://t.co/NdG57c3doS) * Post-doc applications direct or via Azrieli @azrielifdn ( https://t.co/gzyYfN0z34) or Zuckerman @stem_program ( https://t.co/ZqCEbb9o4C)
7
52
330
@iatitov
Ivan Titov
3 months
Many thanks to the @ActInterp organisers for highlighting our work - and congratulations to Pedro, Alex and the other awardees! Sad not to have been there in person, it looked like a fantastic workshop. @AmsterdamNLP @EdinburghNLP
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇
0
3
28
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
1⃣Detecting High-Stakes Interactions with Activation Probes - https://t.co/oN0n7XTdke 2⃣ Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations -
1
0
0
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards!👏👏 and thanks for the fantastic oral presentations! Check out the papers here 👇
1
3
14
@ndif_team
NDIF
3 months
Great to present what’s coming next for NDIF at the @actinterp workshop at #ICML2025! If you missed us, let’s chat after the conference. Reach out here: https://t.co/NCIYb0pq5E
0
4
40
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
👇🏻
@aryaman2020
Aryaman Arora
3 months
maybe I will live tweet the actionable interp workshop panel
0
0
1
@aryaman2020
Aryaman Arora
3 months
maybe I will live tweet the actionable interp workshop panel
11
8
100
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
Starting now: our panel on actionable interpretability! @nsaphra @saprmarks @kylelostat @FazlBarez
1
3
17
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
Huge thanks to Sarah Schwettmann for a fascinating keynote on "AI Investigators for Understanding AI Systems" 🤖 @cogconfluence @TransluceAI
1
5
31
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
Grab a ☕️ and join us for a keynote by @RICEric22: Explanations for Experts via Guarantees and Domain Knowledge: From Attributions to Reasoning
0
5
14
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
➡️ Join us for the keynote by @byron_c_wallace: “What (if anything) can interpretability do for healthcare?”
2
3
13
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
The second poster session is starting now!🙌🏻
0
1
7
@_AKassem
Aly M. Kassem
3 months
Come see our poster about how to predict side effects of unlearning and fine-tuning at @ActInterp
1
4
25
@evzen_wy
Evžen Wybitul
3 months
Crazy amount of cool work concentrated in one room
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
The first poster session is happening now!
0
4
15
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
The first poster session is happening now!
0
2
10
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
The one and only @_beenkim on Agentic Interpretability and Neologism: What LLMs Can Offer Us!
0
5
34
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
We’ve started!👏 Looking forward to an exciting day!💫🔍⚙️
0
4
23
@ActInterp
Actionable Interpretability Workshop ICML2025
3 months
🚨The Actionable Interpretability Workshop is happening tomorrow at ICML! Join us for an exciting lineup of speakers, nearly 70 posters, and a great panel discussion 🙌 Don’t miss it! 🔍⚙️ @icmlconf @ActInterp
0
7
18