Daniel Filan Profile
Daniel Filan

@dfrsrchtwts

Followers: 2K · Following: 842 · Media: 101 · Statuses: 1K

Research manager at MATS. Want to usher in an era of human-friendly superintelligence, don't know how. Podcast: https://t.co/gM752yhlTd

Joined June 2020
@dfrsrchtwts
Daniel Filan
1 day
active inference baby.
@deepfates
deepfates
1 day
ChatGPT kept telling people you could import tabs from ASCII at their website, so they just built a feature to do that
[two images attached]
Replies: 0 · Retweets: 0 · Likes: 6
@dfrsrchtwts
Daniel Filan
2 days
RT @jessi_cata: There has been much criticism of the AI 2027 model. As a check, I ran a Monte Carlo model based on METR data (2032 median)…
Replies: 0 · Retweets: 5 · Likes: 0
@dfrsrchtwts
Daniel Filan
3 days
RT @AXRPodcast: My apologies: if you downloaded my most recent episode, my audio cut out around 0:57:40. The issue should be fixed if you r…
Replies: 0 · Retweets: 1 · Likes: 0
@dfrsrchtwts
Daniel Filan
3 days
Replies: 0 · Retweets: 0 · Likes: 2
@dfrsrchtwts
Daniel Filan
3 days
New episode with @SamuelAlbanie, where we discuss the recent Google DeepMind paper "An Approach to Technical AGI Safety and Security"! Link to watch below.
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
3 days
Episode 45 - Samuel Albanie on DeepMind's AGI Safety Approach.
Replies: 1 · Retweets: 4 · Likes: 26
@dfrsrchtwts
Daniel Filan
8 days
Ways you could do this:
- do a bunch of safety testing on un-RL'd model, if it's basically OK, deploy RL'd model
- mostly deploy RL'd model, but 0.1% of the time use the un-RL'd model instead, halt and catch fire if you see anything too bad in that 0.1%
Replies: 1 · Retweets: 0 · Likes: 4
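A minimal sketch of the second option above, assuming hypothetical `rld_model` and `base_model` generation callables and a hypothetical `flags_issue` safety check; a sketch under those assumptions, not a description of any deployed system:

```python
import random

SHADOW_RATE = 0.001  # 0.1% of traffic routed to the un-RL'd base model

def serve(prompt, rld_model, base_model, flags_issue):
    """Mostly serve the RL'd model, but shadow-sample the un-RL'd model.

    `rld_model` and `base_model` are assumed callables mapping a prompt
    to a completion; `flags_issue` is an assumed safety check that
    returns True if a completion looks too bad to keep serving.
    """
    if random.random() < SHADOW_RATE:
        completion = base_model(prompt)
        if flags_issue(completion):
            # "halt and catch fire": stop serving and escalate to humans
            raise RuntimeError("safety issue in un-RL'd sample; halting deployment")
        return completion
    return rld_model(prompt)
```

Sampling the base model on live traffic, rather than only in pre-deployment testing, is what would let this check catch problems that only show up on the real prompt distribution.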
@dfrsrchtwts
Daniel Filan
8 days
This is probably a dumb question, but what's the problem with doing RL on CoT for safety? People are like "oh it makes the CoT less interpretable so you can't notice problems", but can't you keep the un-RL'd model around and see if it has issues also?
Replies: 5 · Retweets: 1 · Likes: 12
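A minimal sketch of the comparison proposed here, assuming hypothetical `rld_model` and `base_model` callables and a hypothetical `cot_monitor` that flags a chain of thought as suspicious:

```python
def compare_flag_rates(prompts, rld_model, base_model, cot_monitor):
    """Run identical prompts through both models and compare how often
    an (assumed) CoT monitor flags each model's chain of thought.

    If the RL'd model's CoT is flagged much less often while behaviour
    looks the same, the RL may have obscured problems rather than
    removed them. Assumes `prompts` is non-empty.
    """
    n = len(prompts)
    rld_flags = sum(cot_monitor(rld_model(p)) for p in prompts)
    base_flags = sum(cot_monitor(base_model(p)) for p in prompts)
    return {"rld": rld_flags / n, "base": base_flags / n}
```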
@dfrsrchtwts
Daniel Filan
12 days
Replies: 0 · Retweets: 0 · Likes: 2
@dfrsrchtwts
Daniel Filan
12 days
New episode with @petersalib! We chat about how giving AIs rights might make them less inclined to take over, and also why legal academia is so weird. Link to watch in reply.
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
12 days
Episode 44 - Peter Salib on AI Rights for Human Safety.
Replies: 1 · Retweets: 2 · Likes: 18
@dfrsrchtwts
Daniel Filan
13 days
RT @austinc3301: 🚀 We're launching mentor applications for SPAR's Fall 2025 round! @SPARexec is a part-time, remote research program where…
Replies: 0 · Retweets: 2 · Likes: 0
@dfrsrchtwts
Daniel Filan
13 days
RT @Jsevillamol: A couple of weeks ago I posted a summary of Epoch's mission, clearing up some common misunderstandings of what we are tryin…
Replies: 0 · Retweets: 5 · Likes: 0
@dfrsrchtwts
Daniel Filan
16 days
See however
Replies: 0 · Retweets: 0 · Likes: 2
@dfrsrchtwts
Daniel Filan
17 days
RT @OwainEvans_UK: Some recent talks/interviews:
Podcast on introspection, self-awareness and emergent misalignment
Replies: 0 · Retweets: 10 · Likes: 0
@dfrsrchtwts
Daniel Filan
17 days
RT @binarybits: Something that comes through clearly in the DeepSeek R1 research paper, and I wish was more broadly understood, is that the…
Replies: 0 · Retweets: 18 · Likes: 0
@dfrsrchtwts
Daniel Filan
25 days
Replies: 0 · Retweets: 0 · Likes: 1
@dfrsrchtwts
Daniel Filan
25 days
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply.
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
25 days
Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval.
Replies: 1 · Retweets: 4 · Likes: 30
@dfrsrchtwts
Daniel Filan
28 days
RT @MariusHobbhahn: LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct da…
Replies: 0 · Retweets: 81 · Likes: 0
@dfrsrchtwts
Daniel Filan
1 month
@OwainEvans_UK Link to watch:
Replies: 0 · Retweets: 0 · Likes: 3
@dfrsrchtwts
Daniel Filan
1 month
New episode with @OwainEvans_UK! Covers work on emergent misalignment and more!
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
1 month
Episode 42 - Owain Evans on LLM Psychology.
Replies: 2 · Retweets: 9 · Likes: 73
@dfrsrchtwts
Daniel Filan
1 month
RT @safe_paper: Large Language Models Often Know When They Are Being Evaluated
Joe Needham, Giles Edkins (@gdedkins), Govind Pimpale (@Govi…
Replies: 0 · Retweets: 23 · Likes: 0