Ori Press Profile
Ori Press

@ori_press

Followers
430
Following
106K
Media
22
Statuses
128

I'm on the industry job market, feel free to reach out! I yearn to deep learn

Joined December 2018
@ori_press
Ori Press
2 months
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️
6
60
159
@ori_press
Ori Press
7 days
RT @KLieret: What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it…
0
20
0
@ori_press
Ori Press
15 days
RT @richardcsuwandi: Introducing OpenEvolve x AlgoTune! Now you can run and benchmark evolutionary coding agents on 100+ algorithm optim…
0
20
0
@ori_press
Ori Press
16 days
The complete logs for every model are viewable here:
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
0
0
1
@ori_press
Ori Press
16 days
GPT-5 and GPT-5 mini results are now live on AlgoTune!
2
2
11
@ori_press
Ori Press
20 days
Just added Claude Opus 4.1 and gpt-oss-120b to the AlgoTune leaderboard. Excited to see if GPT-5 can break the 2 barrier!
0
2
17
@ori_press
Ori Press
21 days
RT @OfirPress: We know that a bunch of teams are working on applying AlphaEvolve to AlgoTune, super excited to see some initial results! Th…
0
4
0
@ori_press
Ori Press
23 days
We just benchmarked Qwen 3 Coder and GLM 4.5 on AlgoTune, and they manage to beat Claude Opus 4! We're excited to see if the models that will be released this week manage to make progress. Also: I just defended my PhD and I'm on the industry job market, my DMs are open :)
0
3
32
@ori_press
Ori Press
1 month
RT @OfirPress: Congrats to my brother Dr. Ori Press on passing his PhD defense! @ori_press
0
3
0
@ori_press
Ori Press
2 months
RT @brandondamos: Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code. 🚀 https://t…
0
37
0
@ori_press
Ori Press
2 months
RT @OfirPress: AlgoBench is extremely tough, with agents not finding substantial speedups on most tasks. But sometimes these agents do real…
0
27
0
@ori_press
Ori Press
2 months
Check out our website for agent traces and the code they ended up with for each algo. Our framework allows anyone to easily submit tasks they think would be interesting to optimize. (5/6)
1
0
8
@ori_press
Ori Press
2 months
The current best overall AlgoTune score is 1.76x, achieved by o4-mini. We think that a score of 100x is possible, as progress should be possible from many angles: rewriting existing Python code in Numba or Cython, implementing existing faster algos, or discovering new ones. (4/6)
1
1
9
@ori_press
Ori Press
2 months
We release an agent, AlgoTuner, that enables LMs to optimize code. Using our system, LMs can get feedback on how fast their code is, profile its runtime, and compare their code to the reference implementation. (3/6)
1
0
10
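A rough sketch of the kind of feedback described in (3/6): check a candidate against a reference implementation, time both, and report the speedup. This is not AlgoTuner's actual code. The names `best_time`, `speedup_report`, `reference_sum_squares`, and `candidate_sum_squares` are hypothetical, and the toy task (sum of squares) is only an assumption chosen to keep the example short.

```python
import time
from typing import Callable, Dict, Optional, Sequence


def best_time(fn: Callable[[int], int], arg: int, repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - start)
    return best


def speedup_report(
    reference: Callable[[int], int],
    candidate: Callable[[int], int],
    inputs: Sequence[int],
    repeats: int = 5,
) -> Dict[str, Optional[float]]:
    """Hypothetical feedback step: validate first, then compare runtimes."""
    # A candidate that disagrees with the reference gets no speedup credit.
    for x in inputs:
        if candidate(x) != reference(x):
            return {"valid": False, "speedup": None}
    ref_t = sum(best_time(reference, x, repeats) for x in inputs)
    cand_t = sum(best_time(candidate, x, repeats) for x in inputs)
    # Guard against a zero reading from a coarse timer.
    return {"valid": True, "speedup": ref_t / max(cand_t, 1e-12)}


def reference_sum_squares(n: int) -> int:
    """Toy reference: straightforward loop over 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate_sum_squares(n: int) -> int:
    """Toy candidate: closed-form formula for the same sum."""
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    report = speedup_report(
        reference_sum_squares, candidate_sum_squares, [1_000, 5_000]
    )
    print(report)
```

Taking the best of several runs reduces timing noise. A real harness would likely also isolate processes and use larger inputs.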
@ori_press
Ori Press
2 months
For each algo, we give Gemini, Claude, o4-mini, and R1 a budget of 1 dollar, and have them iteratively develop code. Results are at the link below. Models sometimes successfully optimize code, but are not currently able to come up with novel algos. (2/6)
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
1
0
10
@ori_press
Ori Press
3 months
RT @a1zhang: Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS…
0
77
0
@ori_press
Ori Press
4 months
RT @OfirPress: Completing games requires long context and complex visual processing, so we put a bunch of 90s games into an emulator and ma…
0
8
0
@ori_press
Ori Press
7 months
RT @KLieret: SWE-agent 1.0 is the open-source SOTA on SWE-bench Lite! Tons of new features: massively parallel runs; cloud-based deployment…
0
18
0