Patrick Chao @patrickrchao X Profile

Followers: 942

Following: 417

Media: 14

Statuses: 53

research @openai, making models smarter and safer

https://t.co/c338DWhzcP

Joined January 2014

Patrick Chao

@patrickrchao

9 months

if you think gpt-4.5 is big wait til you see gpt-4.11

OpenAI

@OpenAI

9 months

GPT-4.5 has entered the Chat. https://t.co/tBzJxSyCeY

101

124

5K

Patrick Chao

@patrickrchao

2 months

This is one of the craziest graphs I've ever seen! AI Models went from dragging humans down (gpt-4o) → to breaking past the human baseline. gpt-5 delivers ~1.6× efficiency in both speed and cost 📈

OpenAI

@OpenAI

2 months

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks. Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most. https://t.co/uKPPDldVNS

0

1

7

Sebastien Bubeck

@SebastienBubeck

3 months

My team forced me to post this. It's definitely the last time I organize a nice offsite for them 😤😂

22

3

203

Patrick Chao

@patrickrchao

4 months

huge congrats to @_chris_lu_ @minyoung_huh and @SuvanshSanjeev for putting a truly herculean effort into training gpt-5 mini and nano!

Suvansh Sanjeev

@SuvanshSanjeev

4 months

GPT-5 is what you’ve been waiting for – it defines and extends the cost-intelligence frontier across model sizes today. it’s been a long journey, and we’ve landed pivotal improvements across many axes in the whole GPT-5 family. and hey no more model picker (by default)!

1

14

Alexander Wei

@alexwei_

5 months

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

405

1K

7K

Alex Robey

@AlexRobey23

1 year

After rejections at ICLR, ICML, and NeurIPS, I'm happy to report that "Jailbreaking Black Box LLMs in Twenty Queries" (i.e., the PAIR paper) has been accepted at @satml_conf! 🚀 A quick 🧵 summarizing some thoughts a year on from PAIR's release.

6

19

175

Daniel Geng

@dangengdg

1 year

What happens when you train a video generation model to be conditioned on motion? Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here’s a few examples – check out this thread 🧵 for more results!

20

145

672

Mira Murati

@miramurati

1 year

All Plus and Team users in ChatGPT

OpenAI

@OpenAI

1 year

Advanced Voice is rolling out to all Plus and Team users in the ChatGPT app over the course of the week. While you’ve been patiently waiting, we’ve added Custom Instructions, Memory, five new voices, and improved accents. It can also say “Sorry I’m late” in over 50 languages.

207

219

4K

Suvansh Sanjeev

@SuvanshSanjeev

1 year

hi o1

1

2

9

Patrick Chao

@patrickrchao

1 year

models are kinda good. check out our system card for some cool evals and plots!

openai.com

This report outlines the safety work carried out prior to releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk evaluations according to our Preparedness Framework.

OpenAI

@OpenAI

1 year

We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

1

2

38

Kevin Liu

@kliu128

1 year

it’s a good model sir

OpenAI

@OpenAI

1 year

We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.

6

4

311

OpenAI

@OpenAI

1 year

We’re sharing the GPT-4o System Card, an end-to-end safety assessment that outlines what we’ve done to track and address safety challenges, including frontier model risks in accordance with our Preparedness Framework.

openai.com

This report outlines the safety work carried out prior to releasing GPT-4o including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the...

215

333

2K

Kevin Liu

@kliu128

1 year

Some folks from OpenAI’s Preparedness and agent safety efforts will be at the AI Security Forum and Defcon in Las Vegas next week. If you’d like to chat with us there about AI + cybersecurity, fill out this form (by Aug 5)! https://t.co/aab5F24t8g

docs.google.com

Some members of OpenAI's Preparedness and Agent Safety teams will be at the AI Security Forum and Defcon this year. Fill out this form if you'd like to chat with us! It is best to fill out this form...

1

8

54

Maksym Andriushchenko @ NeurIPS

@maksym_andr

1 year

🚨 We are very excited to release JailbreakBench v1.0!
📄 We have substantially extended the version 0.1 that was on arXiv since March:
- More attack artifacts (Prompt template with random search in addition to GCG, PAIR, and JailbreakChat): https://t.co/WCLixoM3fO.
- More

3

26

115

Maksym Andriushchenko @ NeurIPS

@maksym_andr

2 years

Great to see that both of our recent papers—JailbreakBench ( https://t.co/WgzTj56PKB) and our adaptive attack paper ( https://t.co/nlebNR6GB5)—have been used by Google to evaluate the robustness of Gemini 1.5 Flash/Pro against jailbreaking attacks! An interesting comment from

2

15

115

Daniel Geng

@dangengdg

2 years

What do you see in these images? These are called hybrid images, originally proposed by Aude Oliva et al. They change appearance depending on size or viewing distance, and are just one kind of perceptual illusion that our method, Factorized Diffusion, can make.

10

102

449

Patrick Chao

@patrickrchao

2 years

For more details, check out:
Paper: https://t.co/bUNqVRCUo8
Code: https://t.co/o17WWX8c1B
Leaderboard: https://t.co/sWgg11C4W7
This could not be possible without our great collaborators: @edoardo_debe, @maksym_andr, @AlexRobey23, @fra__31, @VSehwag_, @EdgarDobriban, @tml_lab,

github.com

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] - JailbreakBench/jailbreakbench

0

3

12

Patrick Chao

@patrickrchao

2 years

At the moment, we have added the following attacks and defenses (more are coming soon). Even on Vicuna-13B, the attack success rate is far from 100% - there is a lot of room for improvement! Consider submitting your own attacks/defenses. 🧵7/n