
Ryan Chi
@ryanandrewchi
Followers 377 · Following 32 · Media 1 · Statuses 27
@openai research | prev @stanfordnlp
SF
Joined November 2020
Amazing job, team!
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
0 replies · 0 retweets · 0 likes
RT @markchen90: Really excited to start the Strategic Deployment team with @aleks_madry! If you want to work on pushing the frontiers in o…
0 replies · 10 retweets · 0 likes
RT @aleks_madry: If AGI is about AI transforming our economy—how close are we, really? What's still missing, and how do we get there? Open…
0 replies · 93 retweets · 0 likes
I'm back! Thanks @BorisMPower for your help.
Please report @ryan_andrewchi, who’s impersonating @ryanandrewchi, who is on our team.
1 reply · 0 retweets · 14 likes
RT @jameschua_sg: OpenAI found that misaligned models can develop a bad boy persona, allowing for detection. But what if models are conditi…
0 replies · 10 retweets · 0 likes
RT @MilesKWang: We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We f…
0 replies · 451 retweets · 0 likes
Really enjoyed contributing to this project! Take a look at our blog post & paper:
Understanding and preventing misalignment generalization. Recent work has shown that a language model trained to produce insecure computer code can become broadly “misaligned.” This surprising effect is called “emergent misalignment.” We studied why this happens. Through this…
2 replies · 0 retweets · 10 likes
Thank you @chrmanning for everything you've given me and Stanford NLP! It's been the opportunity of a lifetime to be in your lab.
2 replies · 0 retweets · 40 likes
RT @john__allard: Super excited to ship Reinforcement Fine‑Tuning (RFT) on o4‑mini today 🎉 Our aim is to make RL as flexible & accessible a…
0 replies · 6 retweets · 0 likes
Really excited to share with the world what I've been working on since joining OpenAI! Give it a try!
Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance—especially useful for complex domains. Take…
2 replies · 1 retweet · 11 likes
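For the curious, here is a minimal sketch of what launching an RFT job might look like with the OpenAI Python SDK. The model snapshot name, file ID, and grader fields below are illustrative assumptions based on a reading of the public fine-tuning docs, not the exact setup behind this launch; check the current API reference before copying.

```python
# Hypothetical RFT job: fine-tune o4-mini with a task-specific grader.
# Everything marked "placeholder" is an assumption, not a confirmed value.
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    model="o4-mini-2025-04-16",   # placeholder snapshot name
    training_file="file-abc123",  # placeholder ID of an uploaded JSONL file
    method={
        "type": "reinforcement",
        "reinforcement": {
            # Task-specific grading: score each sampled answer by exact match
            # against a reference field stored with the training item.
            "grader": {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.correct_answer}}",
                "operation": "eq",
            },
        },
    },
)
print(job.id, job.status)
```

The design point the tweet gestures at: unlike supervised fine-tuning, the training signal here comes from a grader scoring the model's own sampled outputs, so the grader encodes the task rather than labeled completions.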
RT @HarryMayne5: 🚨🌍Introducing our new reasoning benchmark, LINGOLY (which the current top models only score ~35% on!😳). LINGOLY uses UK Li…
0 replies · 1 retweet · 0 likes
RT @hannahrosekirk: 🌎Introducing LINGOLY, our new reasoning benchmark that stumps even top LLMs (best models only reach ~35% accuracy)🥴. In…
0 replies · 35 retweets · 0 likes
RT @askalphaxiv: Featuring our first paper of the week, "Premise Order Matters in Reasoning With LLMs": Premise re…
0 replies · 4 retweets · 0 likes
RT @_akhaliq: Here is my selection of papers for today (15 Feb) on Hugging Face. Computing Power and the Governanc…
0 replies · 5 retweets · 0 likes
Premise order matters 📈 in LLM reasoning, exposing frailties far more pronounced than a human's. See our preprint here: w/ @xinyun_chen_, Xuezhi Wang, and @denny_zhou -- grateful to have worked on this at @GoogleDeepMind.
New preprint🔥: Premise Order Matters in Reasoning with Large Language Models. In typical logical reasoning, premise order doesn't matter. However, for SOTA LLMs, changing the premise order may cause an accuracy drop of >30%! 🧵 1/8
1 reply · 0 retweets · 6 likes
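Since the paper's headline claim is easy to demo, here is a toy probe in that spirit: pose the same deduction problem with premises in natural forward order and in a shuffled order, then compare the answers. The prompt wording and model name are illustrative choices, not the paper's evaluation setup.

```python
# Toy premise-order probe: the ground truth is order-invariant, but the
# paper reports that SOTA LLM accuracy can drop >30% under reordering.
import random
from openai import OpenAI

client = OpenAI()

premises = [
    "All members of the chess club are students.",
    "All students carry an ID card.",
    "Riya is a member of the chess club.",
]
question = "Does Riya carry an ID card? Answer yes or no, then explain."

def ask(premise_list):
    prompt = "\n".join(premise_list) + "\n\n" + question
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model, not one evaluated in the paper
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

forward = ask(premises)                               # premises in deduction order
shuffled = ask(random.sample(premises, k=len(premises)))  # same facts, permuted

print("forward:", forward)
print("shuffled:", shuffled)
```

A single pair of calls will not show a 30% gap, of course; the paper's numbers come from averaging over many problems and permutations.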