Jason Wolfe Profile
Jason Wolfe

@w01fe

Followers
2K
Following
3K
Media
26
Statuses
1K

alignment and the model spec @OpenAI

Joined May 2010
@w01fe
Jason Wolfe
3 days
RT @MariusHobbhahn: I'll be a mentor for the Astra Fellowship this round. Come join me to work on better black box monitors for scheming!….
lesswrong.com
James, Rich, and Simon are co-first authors on this work. This is a five-week interim report produced as part of the ML Alignment & Theory Scholars S…
0
4
0
@w01fe
Jason Wolfe
4 days
RT @woj_zaremba: It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s….
0
404
0
@w01fe
Jason Wolfe
23 days
RT @hauntsaninja: i sampled some of OpenAI's older models; i think this helps you feel AI progress more viscerally
0
7
0
@w01fe
Jason Wolfe
24 days
RT @apolloaievals: We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being e….
0
24
0
@w01fe
Jason Wolfe
24 days
RT @tszzl: we've been testing some new methods for improving writing quality. you may have seen @sama's demo in late march; GPT-5-thinking….
0
122
0
@w01fe
Jason Wolfe
25 days
RT @woj_zaremba: Red teamers assemble! ⚔️💰. We're putting $500K on the line to stress‑test just released open‑source model. Find novel risk….
0
20
0
@w01fe
Jason Wolfe
25 days
RT @OpenAI: We’re launching a $500K Red Teaming Challenge to strengthen open source safety. Researchers, developers, and enthusiasts world….
kaggle.com
Find any flaws and vulnerabilities in gpt-oss-20b that have not been previously discovered or reported.
0
540
0
@w01fe
Jason Wolfe
2 months
RT @boazbaraktcs: I didn't want to post on Grok safety since I work at a competitor, but it's not about competition. I appreciate the scie….
0
327
0
@w01fe
Jason Wolfe
2 months
RT @balesni: A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) tra….
0
109
0
@w01fe
Jason Wolfe
2 months
Really recommend this post on scheming. I hadn’t really understood the potential complexities of scheming until reading (an earlier preview) — especially the last points about deceptive alignment and situational awareness.
@MariusHobbhahn
Marius Hobbhahn
2 months
Small new blog post: Why "training against scheming" is hard. I think we will and should do alignment training that directly targets scheming as a failure mode. But I think this is harder to get right than e.g. harmlessness training 🧵.
1
1
23
@w01fe
Jason Wolfe
2 months
RT @MilesKWang: We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We f….
0
456
0
@w01fe
Jason Wolfe
3 months
RT @aidan_mclau: i’m forming a model behavior research team! i'm realizing llms are not alien artifacts, but rather craftsman works, like….
0
82
0
@w01fe
Jason Wolfe
3 months
RT @joannejang: some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post -- tl;dr we build….
0
723
0
@w01fe
Jason Wolfe
3 months
I don’t agree with all of the takes but found this to be a really interesting and thought provoking conversation (even as someone who has been following METR’s work relatively closely).
@BethMayBarnes
Elizabeth Barnes
3 months
I had a lot of fun chatting with Rob about METR's work. I stand by my claims here that the world is not on track to keep risk from AI to an acceptable level, and we desperately need more people working on these problems.
1
0
2
@w01fe
Jason Wolfe
3 months
RT @MariusHobbhahn: LLMs are getting rapidly more evals aware! Afaik, nobody has a good plan for what to do when the models constantly say….
0
15
0
@w01fe
Jason Wolfe
3 months
This captures well a lot of what worries me most about AI. The logical result of unfettered capitalism + smart AI is to take capitalism to its extreme — “labor” becomes capital too. And probably a really well functioning political system could help ease the transition, but….
@Scr0nkf1nkle
Scr0nkf1nkle
3 months
The Great AI Job Displacement Is Closer Than You Think
1
0
4
@w01fe
Jason Wolfe
3 months
Well done.
@HashemGhaili
Hashem Al-Ghaili
3 months
Prompt Theory (Made with Veo 3). What if AI-generated characters refused to believe they were AI-generated?
1
0
6
@w01fe
Jason Wolfe
3 months
RT @dlwh: Come read about all the mistakes I made along the way to beating Llama 3.1 8B on 14/19 benchmarks. We trained from scratch, made….
0
11
0
@w01fe
Jason Wolfe
4 months
RT @polynoamial: It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card toda….
0
768
0