Daniel Filan Profile
Daniel Filan

@dfrsrchtwts

Followers
2K
Following
846
Media
101
Statuses
1K

Research manager at MATS. Want to usher in an era of human-friendly superintelligence, don't know how. Podcast: https://t.co/gM752yhlTd

Joined June 2020
@dfrsrchtwts
Daniel Filan
1 day
RT @asteriskmgzn: Asterisk is launching an AI blogging fellowship! We're looking for people with unique perspectives on AI who want to tak…
0
34
0
@dfrsrchtwts
Daniel Filan
6 days
RT @adamimos: How do transformers carry out recurrent computations while being fundamentally feedforward? Excited to present our work on Co…
0
13
0
@dfrsrchtwts
Daniel Filan
9 days
active inference baby.
@deepfates
superfates
9 days
ChatGPT kept telling people you could import tabs from ASCII at their website, so they just built a feature to do that
Tweet media one
Tweet media two
0
0
6
@dfrsrchtwts
Daniel Filan
10 days
RT @jessi_cata: There has been much criticism of the AI 2027 model. As a check, I ran a Monte Carlo model based on METR data (2032 median)…
0
5
0
@dfrsrchtwts
Daniel Filan
11 days
RT @AXRPodcast: My apologies: if you downloaded my most recent episode, my audio cut out around 0:57:40. The issue should be fixed if you r…
0
1
0
@dfrsrchtwts
Daniel Filan
11 days
0
0
2
@dfrsrchtwts
Daniel Filan
11 days
New episode with @SamuelAlbanie, where we discuss the recent Google DeepMind paper "An Approach to Technical AGI Safety and Security"! Link to watch below.
Tweet media one
@AXRPodcast
AXRP - the AI X-risk Research Podcast
11 days
Episode 45 - Samuel Albanie on DeepMind's AGI Safety Approach.
1
4
26
@dfrsrchtwts
Daniel Filan
16 days
Ways you could do this:
- do a bunch of safety testing on the un-RL'd model; if it's basically OK, deploy the RL'd model
- mostly deploy the RL'd model, but 0.1% of the time use the un-RL'd model instead; halt and catch fire if you see anything too bad in that 0.1%
1
0
4
@dfrsrchtwts
Daniel Filan
16 days
This is probably a dumb question, but what's the problem with doing RL on CoT for safety? People are like "oh it makes the CoT less interpretable so you can't notice problems" but can't you keep the un-RL'd model around and see if it has issues also?
5
1
12
@dfrsrchtwts
Daniel Filan
20 days
0
0
2
@dfrsrchtwts
Daniel Filan
20 days
New episode with @petersalib! We chat about how giving AIs rights might make them less inclined to take over, and also why legal academia is so weird. Link to watch in reply.
Tweet media one
@AXRPodcast
AXRP - the AI X-risk Research Podcast
20 days
Episode 44 - Peter Salib on AI Rights for Human Safety.
1
2
18
@dfrsrchtwts
Daniel Filan
21 days
RT @austinc3301: 🚀 We're launching mentor applications for SPAR's Fall 2025 round! @SPARexec is a part-time, remote research program where…
0
2
0
@dfrsrchtwts
Daniel Filan
21 days
RT @Jsevillamol: A couple of weeks ago I posted a summary of Epoch's mission, clearing up some common misunderstandings of what we are tryin…
0
5
0
@dfrsrchtwts
Daniel Filan
24 days
See however
0
0
2
@dfrsrchtwts
Daniel Filan
25 days
RT @OwainEvans_UK: Some recent talks/interviews: Podcast on introspection, self-awareness and emergent misalignment.
0
14
0
@dfrsrchtwts
Daniel Filan
25 days
RT @binarybits: Something that comes through clearly in the DeepSeek R1 research paper, and I wish was more broadly understood, is that the…
0
18
0
@dfrsrchtwts
Daniel Filan
1 month
0
0
1
@dfrsrchtwts
Daniel Filan
1 month
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply.
Tweet media one
@AXRPodcast
AXRP - the AI X-risk Research Podcast
1 month
Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval.
1
4
30
@dfrsrchtwts
Daniel Filan
1 month
RT @MariusHobbhahn: LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct da…
0
81
0
@dfrsrchtwts
Daniel Filan
1 month
@OwainEvans_UK Link to watch:
0
0
3