Daniel Filan Profile
Daniel Filan

@dfrsrchtwts

Followers: 2K · Following: 842 · Media: 101 · Statuses: 1K

Research manager at MATS. Want to usher in an era of human-friendly superintelligence, don't know how. Podcast: https://t.co/gM752yhlTd

Joined June 2020
@dfrsrchtwts
Daniel Filan
1 day
active inference baby.
@deepfates
deepfates
1 day
ChatGPT kept telling people you could import tabs from ASCII at their website, so they just built a feature to do that
[two images attached]
Replies: 0 · Retweets: 0 · Likes: 6
@dfrsrchtwts
Daniel Filan
2 days
RT @jessi_cata: There has been much criticism of the AI 2027 model. As a check, I ran a Monte Carlo model based on METR data (2032 median)…
Replies: 0 · Retweets: 5 · Likes: 0
@dfrsrchtwts
Daniel Filan
3 days
RT @AXRPodcast: My apologies: if you downloaded my most recent episode, my audio cut out around 0:57:40. The issue should be fixed if you r…
Replies: 0 · Retweets: 1 · Likes: 0
@dfrsrchtwts
Daniel Filan
3 days
Replies: 0 · Retweets: 0 · Likes: 2
@dfrsrchtwts
Daniel Filan
3 days
New episode with @SamuelAlbanie, where we discuss the recent Google DeepMind paper "An Approach to Technical AGI Safety and Security"! Link to watch below.
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
3 days
Episode 45 - Samuel Albanie on DeepMind's AGI Safety Approach.
Replies: 1 · Retweets: 4 · Likes: 26
@dfrsrchtwts
Daniel Filan
8 days
Ways you could do this:
- do a bunch of safety testing on un-RL'd model, if it's basically OK, deploy RL'd model
- mostly deploy RL'd model, but 0.1% of the time use the un-RL'd model instead, halt and catch fire if you see anything too bad in that 0.1%
Replies: 1 · Retweets: 0 · Likes: 4
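A minimal sketch of the second option above, assuming hypothetical `rld_model` and `base_model` generation callables and a hypothetical `flags_issue` safety check; a sketch under those assumptions, not a description of any deployed system:

```python
import random

SHADOW_RATE = 0.001  # 0.1% of traffic routed to the un-RL'd base model

def serve(prompt, rld_model, base_model, flags_issue):
    """Mostly serve the RL'd model, but shadow-sample the un-RL'd model.

    `rld_model` and `base_model` are assumed callables mapping a prompt
    to a completion; `flags_issue` is an assumed safety check that
    returns True if a completion looks too bad to keep serving.
    """
    if random.random() < SHADOW_RATE:
        completion = base_model(prompt)
        if flags_issue(completion):
            # "halt and catch fire": stop serving and escalate to humans
            raise RuntimeError("safety issue in un-RL'd sample; halting deployment")
        return completion
    return rld_model(prompt)
```

Sampling the base model on live traffic, rather than only in pre-deployment testing, is what would let this check catch problems that only show up on the real prompt distribution.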
@dfrsrchtwts
Daniel Filan
8 days
This is probably a dumb question, but what's the problem with doing RL on CoT for safety? People are like "oh it makes the CoT less interpretable so you can't notice problems", but can't you keep the un-RL'd model around and see if it has issues also?
Replies: 5 · Retweets: 1 · Likes: 12
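A minimal sketch of the comparison proposed here, assuming hypothetical `rld_model` and `base_model` callables and a hypothetical `cot_monitor` that flags a chain of thought as suspicious:

```python
def compare_flag_rates(prompts, rld_model, base_model, cot_monitor):
    """Run identical prompts through both models and compare how often
    an (assumed) CoT monitor flags each model's chain of thought.

    If the RL'd model's CoT is flagged much less often while behaviour
    looks the same, the RL may have obscured problems rather than
    removed them. Assumes `prompts` is non-empty.
    """
    n = len(prompts)
    rld_flags = sum(cot_monitor(rld_model(p)) for p in prompts)
    base_flags = sum(cot_monitor(base_model(p)) for p in prompts)
    return {"rld": rld_flags / n, "base": base_flags / n}
```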
@dfrsrchtwts
Daniel Filan
12 days
Replies: 0 · Retweets: 0 · Likes: 2
@dfrsrchtwts
Daniel Filan
12 days
New episode with @petersalib! We chat about how giving AIs rights might make them less inclined to take over, and also why legal academia is so weird. Link to watch in reply.
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
12 days
Episode 44 - Peter Salib on AI Rights for Human Safety.
Replies: 1 · Retweets: 2 · Likes: 18
@dfrsrchtwts
Daniel Filan
13 days
RT @austinc3301: 🚀 We're launching mentor applications for SPAR's Fall 2025 round! @SPARexec is a part-time, remote research program where…
Replies: 0 · Retweets: 2 · Likes: 0
@dfrsrchtwts
Daniel Filan
13 days
RT @Jsevillamol: A couple of weeks ago I posted a summary of Epoch's mission, clearing up some common misunderstandings of what we are tryin…
Replies: 0 · Retweets: 5 · Likes: 0
@dfrsrchtwts
Daniel Filan
16 days
See however
Replies: 0 · Retweets: 0 · Likes: 2
@dfrsrchtwts
Daniel Filan
17 days
RT @OwainEvans_UK: Some recent talks/interviews:
Podcast on introspection, self-awareness and emergent misalignment
Replies: 0 · Retweets: 10 · Likes: 0
@dfrsrchtwts
Daniel Filan
17 days
RT @binarybits: Something that comes through clearly in the DeepSeek R1 research paper, and I wish was more broadly understood, is that the…
Replies: 0 · Retweets: 18 · Likes: 0
@dfrsrchtwts
Daniel Filan
25 days
Replies: 0 · Retweets: 0 · Likes: 1
@dfrsrchtwts
Daniel Filan
25 days
New episode with @davlindner, covering his work on MONA! Check it out - video link in reply.
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
25 days
Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval.
Replies: 1 · Retweets: 4 · Likes: 30
@dfrsrchtwts
Daniel Filan
28 days
RT @MariusHobbhahn: LLMs Often Know When They Are Being Evaluated! We investigate frontier LLMs across 1000 datapoints from 61 distinct da…
Replies: 0 · Retweets: 81 · Likes: 0
@dfrsrchtwts
Daniel Filan
1 month
@OwainEvans_UK Link to watch:
Replies: 0 · Retweets: 0 · Likes: 3
@dfrsrchtwts
Daniel Filan
1 month
New episode with @OwainEvans_UK! Covers work on emergent misalignment and more!
[image attached]
@AXRPodcast
AXRP - the AI X-risk Research Podcast
1 month
Episode 42 - Owain Evans on LLM Psychology.
Replies: 2 · Retweets: 9 · Likes: 73
@dfrsrchtwts
Daniel Filan
1 month
RT @safe_paper: Large Language Models Often Know When They Are Being Evaluated
Joe Needham, Giles Edkins (@gdedkins), Govind Pimpale (@Govi…
Replies: 0 · Retweets: 23 · Likes: 0