
Ouail Kitouni
@WKitouni
Followers: 65
Following: 56
Media: 16
Statuses: 110
Member of technical staff @Anthropic prev @MIT @Meta @MSFTResearch
San Francisco, CA
Joined August 2019
RT @alexalbert__: Friday feature drop: Highlight text or code within an Artifact and quickly have Claude improve or explain the selection.…
0
67
0
RT @teortaxesTex: Thesis from @ilyasut : "to predict the next word, you have to predict the world". Antithesis from @ylecun : "AR-LLMs suck!…
0
21
0
RT @summeryue0: 🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can’t be gamed. Vetted experts handle the rat…
0
34
0
I think we'll see more such results as we confront a fundamental alignment issue: There's an irreducible tradeoff btwn helpfulness & harmlessness. A good model provides some harmful content for the greater good, while a terrible model is constrained, upholding unnecessary rules.
New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.
1
0
2