
Lorenz Kuhn
@_lorenzkuhn
Followers
1K
Following
961
Media
44
Statuses
247
Reasoning Research @OpenAI | o1-preview through o3
Joined January 2014
How can we measure how uncertain LLMs are about their generations? In our spotlight at #iclr2023 with @yaringal & @sebfar, we introduce "semantic entropy", the entropy over meanings rather than sequences, and show that it reliably measures LM uncertainty 🧵
arxiv.org
We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation...
5
29
184
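As a rough illustration of "entropy over meanings rather than sequences", the sketch below groups sampled answers that mean the same thing and computes entropy over the groups instead of over the individual strings. The tweet does not say how meanings are compared, so `toy_same_meaning` is a hypothetical stand-in (it only compares the final word), and the answers and probabilities are made-up illustration data, not results from the paper.

```python
import math
from typing import Callable, List, Sequence


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical equivalence check: compares only the final word.

    A real system would need a proper meaning-equivalence test;
    this is an assumption made purely for illustration.
    """
    def key(s: str) -> str:
        words = s.lower().strip(" .!?").split()
        return words[-1] if words else ""
    return key(a) == key(b)


def cluster_by_meaning(
    answers: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[List[int]]:
    """Greedily group answer indices whose texts share a meaning."""
    clusters: List[List[int]] = []
    for i, answer in enumerate(answers):
        for cluster in clusters:
            # Compare against the first member of each existing cluster.
            if same_meaning(answers[cluster[0]], answer):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a normalized version of probs."""
    total = sum(probs)
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


def semantic_entropy(
    answers: Sequence[str],
    probs: Sequence[float],
    same_meaning: Callable[[str, str], bool] = toy_same_meaning,
) -> float:
    """Entropy over meaning clusters: sum probabilities within each cluster first."""
    clusters = cluster_by_meaning(answers, same_meaning)
    cluster_probs = [sum(probs[i] for i in c) for c in clusters]
    return entropy(cluster_probs)


# Made-up example: two phrasings of the same answer plus one different answer.
answers = ["Paris", "It's Paris.", "Rome"]
probs = [0.5, 0.3, 0.2]
print("sequence entropy:", entropy(probs))
print("semantic entropy:", semantic_entropy(answers, probs))
```

In this toy case the two "Paris" phrasings collapse into one meaning, so the semantic entropy comes out lower than the entropy over raw strings: rephrasing alone no longer counts as uncertainty.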
It's been a pleasure working on this with @ahelkky @andresnds @clavera_i @MostafaRohani and many others!
0
0
17
Just two years ago, our smartest models could barely solve the easiest competitive programming problems. Last week, our latest reasoning models achieved a gold medal score at the International Olympiads of Informatics. Competitive programming is one of the cleanest examples of.
1/n I'm thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold in one of the world's top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants!
12
8
149
RT @MilesKWang: IMO gold is a win for scaling ~nearly~ superhuman oversight on a fuzzy, hard-to-verify RL domain.
0
3
0
It was thrilling to watch AI compete against some of the best human competitive programmers at AtCoder World Finals Heuristics yesterday. Check out @andresnds's thread on how the AI solutions improved throughout the 10h contest. Congrats to @FakePsyho on 1st place!
1/N Yesterday in Tokyo we @OpenAI ran a 10-hour live Humans vs AI exhibition at the AtCoder World Tour Finals Heuristic. We pointed an OpenAI reasoning model at the same brutal problem the finalists tackled: no human help, same rules, same clock. Buckle up.
1
3
48
RT @ahelkky: Congratulations @FakePsyho on a nail-biting performance! Great showings as well from @bminaiev, @andresnds, and @_lorenzkuhn r…
0
4
0
Two important points from our new technical report: 1. Scaling continues to work and the bitter lesson still holds. 2. Recent AI models are strong at reasoning tasks and are rapidly becoming stronger: 4o was released less than a year ago, o1 less than six months ago.
11/ Since competitive programming is just one facet of coding, o3 contributors also evaluated models on software engineering tasks. While there's still a long way to go, it's clear that learning to reason through RL improves SWE capabilities.
0
0
7
i generally feel super grateful that i get to work with such exceptionally skilled and kind people on reasoning research. the sprint for IOI in particular was special though. IOI 2024 gold @ 10k submissions; 49th percentile of competitors under real contest conditions.
As a coach for the US IOI team, I've been motivated for a long time to create models which can perform at the level of the most elite competitors in the world. Check out our research blog post - with enough samples, we achieve gold medal performance on this year's IOI and ~14/15.
0
0
8
RT @polynoamial: Today, I'm excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general r…
0
2K
0
very excited about these models helping people solve hard problems and proud of the work we did. give the new models a try!
We're releasing a preview of OpenAI o1, a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
1
0
16
RT @LiamFedus: But the ELO can ultimately become bounded by the difficulty of the prompts (i.e. can't achieve arbitrarily high win rates on…
0
92
0
RT @ajeya_cotra: Excellent post by @JacobSteinhardt trying to forecast the abilities of models that could be trained in 2030: https://t.co/…
bounded-regret.ghost.io
GPT-4 surprised many people with its abilities at coding, creative brainstorming, letter-writing, and other skills. How can we be less surprised by developments in machine learning? In this post,...
0
8
0
RT @anndvision: new preprint. "ReLU to the Rescue: Improve your On-policy Actor-Critic with Positive Advantages". shockingly simple changes…
0
17
0
RT @seb_far: The Google DeepMind alignment team is looking for research scientists and research engineers to help us work towards safe AGI.…
0
3
0
RT @DeepMind: With more powerful AI systems comes more responsibility to identify novel capabilities in models. Our new research looks a…
0
169
0