Lorenz Kuhn
@_lorenzkuhn
Reasoning Research @OpenAI | o1-preview through o3
Joined January 2014
1K Followers · 961 Following · 44 Media · 247 Statuses
@_lorenzkuhn
Lorenz Kuhn
3 years
How can we measure how uncertain LLMs are about their generations? In our spotlight at #iclr2023 with @yaringal & @sebfar, we introduce "semantic entropy", the entropy over meanings rather than sequences, and show that it reliably measures LM uncertainty 🧵
arxiv.org
We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation...
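A minimal sketch of the idea in the thread: sample several answers from the model, group them into meaning clusters, and take the entropy over clusters instead of over raw strings. The `means_same` callable here is a stand-in for the paper's bidirectional-entailment check, and the uniform weighting of samples is a simplification (the paper works with sequence probabilities):

```python
from math import log

def semantic_entropy(samples, means_same):
    """Entropy over meaning clusters rather than raw sequences.

    samples: list of generated answer strings sampled from the LM.
    means_same: callable(a, b) -> bool, True if a and b share a meaning
        (the paper uses bidirectional NLI entailment; any stand-in works here).
    """
    clusters = []  # each cluster holds semantically equivalent samples
    for s in samples:
        for c in clusters:
            if means_same(s, c[0]):
                c.append(s)
                break
        else:
            clusters.append([s])
    n = len(samples)
    # Probability mass of each meaning = fraction of samples in its cluster.
    return -sum((len(c) / n) * log(len(c) / n) for c in clusters)

# Toy check: three paraphrases of one answer plus one distinct answer.
answers = ["Paris", "It's Paris.", "The capital is Paris", "Lyon"]
same = lambda a, b: ("Paris" in a) == ("Paris" in b)  # toy equivalence test
print(round(semantic_entropy(answers, same), 3))
```

With a 3-vs-1 split the entropy is that of a (0.75, 0.25) distribution, noticeably lower than the naive sequence entropy, which would treat all four strings as distinct.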
@_lorenzkuhn
Lorenz Kuhn
18 days
It's been a pleasure working on this with @ahelkky @andresnds @clavera_i @MostafaRohani and many others!
@_lorenzkuhn
Lorenz Kuhn
18 days
Just two years ago, our smartest models could barely solve the easiest competitive programming problems. Last week, our latest reasoning models achieved a gold medal score at the International Olympiad in Informatics. Competitive programming is one of the cleanest examples of…
@SherylHsu02
Sheryl Hsu
19 days
1/n I'm thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold šŸ„‡šŸ„‡ in one of the world's top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! šŸ‘Øā€šŸ’»šŸ‘Øā€šŸ’»
@_lorenzkuhn
Lorenz Kuhn
1 month
RT @MilesKWang: IMO gold is a win for scaling ~nearly~ superhuman oversight on a fuzzy, hard-to-verify RL domain.
@_lorenzkuhn
Lorenz Kuhn
1 month
It was thrilling to watch AI compete against some of the best human competitive programmers at AtCoder World Finals Heuristics yesterday. Check out @andresnds's thread on how the AI solutions improved throughout the 10h contest. Congrats to @FakePsyho on 1st place!
@andresnds
Andre Saraiva
1 month
1/N Yesterday in Tokyo we @OpenAI ran a 10-hour live Humans vs AI exhibition at the AtCoder World Tour Finals Heuristic. We pointed an OpenAI reasoning model at the same brutal problem the finalists tackled: no human help, same rules, same clock. Buckle up. šŸ‘‡
@_lorenzkuhn
Lorenz Kuhn
1 month
RT @ahelkky: Congratulations @FakePsyho on a nail-biting performance! Great showings as well from @bminaiev, @andresnds, and @_lorenzkuhn r…
@_lorenzkuhn
Lorenz Kuhn
7 months
Two important points from our new technical report:
1. Scaling continues to work and the bitter lesson still holds.
2. Recent AI models are strong at reasoning tasks and are rapidly becoming stronger: 4o was released less than a year ago, o1 less than six months ago.
@ahelkky
Ahmed El-Kishky
7 months
11/ Since competitive programming is just one facet of coding, o3 contributors also evaluated models on software engineering tasks. While there's still a long way to go, it's clear that learning to reason through RL improves SWE capabilities.
@_lorenzkuhn
Lorenz Kuhn
1 year
i generally feel super grateful that i get to work with such exceptionally skilled and kind people on reasoning research. the sprint for IOI in particular was special though. IOI 2024 gold @ 10k submissions; 49th percentile of competitors under real contest conditions.
@markchen90
Mark Chen
1 year
As a coach for the US IOI team, I've been motivated for a long time to create models which can perform at the level of the most elite competitors in the world. Check out our research blog post - with enough samples, we achieve gold medal performance on this year's IOI and ~14/15.
@_lorenzkuhn
Lorenz Kuhn
1 year
RT @polynoamial: Today, I'm excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general r…
@_lorenzkuhn
Lorenz Kuhn
1 year
very excited about these models helping people solve hard problems and proud of the work we did. give the new models a try!
@OpenAI
OpenAI
1 year
We're releasing a preview of OpenAI o1, a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
@_lorenzkuhn
Lorenz Kuhn
1 year
RT @MillionInt: We trained a model and it is good in some things.
@_lorenzkuhn
Lorenz Kuhn
1 year
RT @LiamFedus: But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can't achieve arbitrarily high win rates on…
@_lorenzkuhn
Lorenz Kuhn
2 years
rainy day in sf.
@_lorenzkuhn
Lorenz Kuhn
2 years
RT @anndvision: new preprint. "ReLU to the Rescue: Improve your On-policy Actor-Critic with Positive Advantages". shockingly simple changes…
@_lorenzkuhn
Lorenz Kuhn
2 years
RT @seb_far: The Google DeepMind alignment team is looking for research scientists and research engineers to help us work towards safe AGI. …
@_lorenzkuhn
Lorenz Kuhn
2 years
@markchen90
Mark Chen
2 years
less is generally more for alignment, but not for capabilities.
@_lorenzkuhn
Lorenz Kuhn
2 years
RT @DeepMind: With more powerful AI systems comes more responsibility to identify novel capabilities in models šŸ”. Our new research looks a…
@_lorenzkuhn
Lorenz Kuhn
2 years
Also, finetuning on this scale barely affects the model performance on these benchmarks, see e.g. Llama 7B vs Alpaca 7B.
@_lorenzkuhn
Lorenz Kuhn
2 years
Eval results:
@_lorenzkuhn
Lorenz Kuhn
2 years
The OS/small model + finetuning approach might be good enough for many applications? How well does academic benchmark perf correlate with human preferences over generations in different settings? The self-instruct human eval might not be sensitive enough to what we care about?