
Ori Press
@ori_press
Followers: 430
Following: 106K
Media: 22
Statuses: 128
I'm on the industry job market, feel free to reach out! I yearn to deep learn
Joined December 2018
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark that challenges agents to optimize 100+ algorithms such as gzip compression, AES encryption, and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here! 🧵⬇️
RT @KLieret: What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it….
RT @richardcsuwandi: Introducing OpenEvolve x AlgoTune! Now you can run and benchmark evolutionary coding agents on 100+ algorithm optim….
The complete logs for every model are viewable here:
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
RT @OfirPress: We know that a bunch of teams are working on applying AlphaEvolve to AlgoTune, super excited to see some initial results! Th….
RT @brandondamos: Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code. 🚀 https://t….
RT @OfirPress: AlgoBench is extremely tough, with agents not finding substantial speedups on most tasks. But sometimes these agents do real….
Thanks to all the contributors who submitted tasks, and to @OfirPress for advising! Read the paper: Check out the code: (6/6)
github.com
AlgoTune is a benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each problem and is faster than existing implementations. - oripress/AlgoTune
For each algorithm, we give Gemini, Claude, o4-mini, and R1 a budget of $1 and have them iteratively develop code. Results are linked below. Models sometimes optimize code successfully, but they are not currently able to come up with novel algorithms. (2/6)
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II?. 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS….
RT @OfirPress: Completing games requires long context and complex visual processing- so we put a bunch of 90s games into an emulator and ma….
RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment….