Ori Press Profile
Ori Press

@ori_press

Followers
449
Following
109K
Media
23
Statuses
133

PhD from @uni_tue. I'm on the industry job market, feel free to reach out! I yearn to deep learn

Joined December 2018
@ori_press
Ori Press
4 months
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here! 🧵⬇️
6
64
160
@jyangballin
John Yang
6 days
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
26
90
365
@ori_press
Ori Press
1 month
Check out the full trajectories here:
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
0
0
2
@ori_press
Ori Press
1 month
Claude Sonnet 4.5 manages to score 1.52x on AlgoTune, coming in just ahead of GLM 4.5. 3+ months in, newer models don't seem to be making meaningful gains on AlgoTune. Excited to see how this evolves! 🚀🚀
1
0
11
@OfirPress
Ofir Press
2 months
This has been a long time coming. One avenue for progress is to have LMs learn in virtual gym environments such as in SWE-gym, SWE-smith, or our new AlgoTune environments. These can be generated autonomously or crafted manually. Lots more to do here!
@steph_palazzolo
Stephanie Palazzolo
2 months
OpenAI's models are getting too smart for human contractors to teach them new things in certain domains like linguistics. One contractor I spoke with said they're struggling to come up with new tasks GPT-5 can't do. https://t.co/BWBrigqm1V
1
1
15
@ori_press
Ori Press
2 months
Amazing work by @AndyLin2001! Check it out:
@AndyLin2001
Haowei Lin
2 months
Excited to announce that I've adapted AlgoTune for both OpenHands and Terminal Bench! It's a fast, unbounded benchmark perfect for evaluating AI agents, offering a great alternative to slower suites like SWE/Kaggle tasks. Check it out: https://t.co/bmwOQUoTxa #Agent #Benchmark
0
2
5
@KLieret
Kilian Lieret
3 months
What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately. Read more in the SWE-bench blog 🧵
18
21
272
@richardcsuwandi
Richard C. Suwandi
3 months
Introducing OpenEvolve x AlgoTune! Now you can run and benchmark evolutionary coding agents on 100+ algorithm optimization tasks from https://t.co/sPqLhaZyGj
2
20
186
@ori_press
Ori Press
3 months
The complete logs for every model are viewable here:
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
0
0
1
@ori_press
Ori Press
3 months
GPT-5 and GPT-5 mini results are now live on AlgoTune!
2
2
11
@ori_press
Ori Press
3 months
Just added Claude Opus 4.1 and gpt-oss-120b to the AlgoTune leaderboard. Excited to see if GPT-5 can break the 2x barrier!
0
2
17
@OfirPress
Ofir Press
3 months
We know that a bunch of teams are working on applying AlphaEvolve to AlgoTune, super excited to see some initial results! This is going to get super interesting.
1
4
23
@ori_press
Ori Press
3 months
We just benchmarked Qwen 3 Coder and GLM 4.5 on AlgoTune, and they manage to beat Claude Opus 4! We're excited to see if the models that will be released this week manage to make progress. Also: I just defended my PhD and I'm on the industry job market, my DMs are open :)
0
3
31
@OfirPress
Ofir Press
4 months
Congrats to my brother Dr. Ori Press on passing his PhD defense! @ori_press
11
3
100
@brandondamos
Brandon Amos
4 months
Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code 🚀 https://t.co/bdR630y0dL 📚 https://t.co/vSnV3eUgVs 🤖 https://t.co/krJ7XDrJFA with @OfirPress @ori_press @PatrickKidger @b_stellato @ArmanZharmagam1 & many others 🧵
3
41
196
@OfirPress
Ofir Press
4 months
AlgoTune is extremely tough, with agents not finding substantial speedups on most tasks. But sometimes these agents do really cool things: here, the agent realized that it could solve this convex optimization problem with a scipy function, leading to an 81x speedup.
10
27
159
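The tweet above doesn't name the problem or the scipy routine the agent found, so here's a hypothetical stand-in for that kind of win: a hand-rolled projected-gradient loop for non-negative least squares replaced by a single call to `scipy.optimize.nnls`. The problem sizes and step counts are illustrative, not from AlgoTune.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_true = np.abs(rng.standard_normal(10))  # strictly non-negative ground truth
b = A @ x_true

# "Before": a slow, hand-written projected gradient loop.
def pgd_nnls(A, b, steps=5000):
    lr = 1.0 / np.linalg.norm(A, 2) ** 2  # step size from the largest singular value
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        # Gradient step on ||Ax - b||^2, then project onto x >= 0.
        x = np.maximum(0.0, x - lr * (A.T @ (A @ x - b)))
    return x

x_slow = pgd_nnls(A, b)

# "After": one library call replaces the whole loop.
x_fast, _residual = nnls(A, b)

print(np.allclose(x_slow, x_fast, atol=1e-3))
```

The speedup in cases like this comes from swapping thousands of interpreted Python iterations for a compiled, purpose-built solver.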
@ori_press
Ori Press
4 months
Check out our website, https://t.co/JqD76Du6lp, for agent traces and the code they ended up with for each algo. Our framework allows anyone to easily submit tasks they think would be interesting to optimize. (5/6)
1
0
8
@ori_press
Ori Press
4 months
The current best overall AlgoTune score is 1.76x, achieved by o4-mini. We think that a score of 100x is possible, as progress should be possible from many angles: rewriting existing Python code in Numba or Cython, implementing existing faster algos, or discovering new ones. (4/6)
1
1
9
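One of the angles named above, implementing an existing faster algorithm, can be sketched with a toy example. Fibonacci is not an AlgoTune task; this is purely illustrative of the "same output, better asymptotics" pattern: an exponential-time recursion replaced by O(log n) fast doubling.

```python
# Baseline: exponential-time recursion.
def fib_naive(n):
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

# Faster algorithm: fast doubling, O(log n), using the identities
#   F(2k)   = F(k) * (2*F(k+1) - F(k))
#   F(2k+1) = F(k)^2 + F(k+1)^2
def fib_fast(n):
    def go(k):  # returns (F(k), F(k+1))
        if k == 0:
            return (0, 1)
        a, b = go(k >> 1)
        c = a * (2 * b - a)      # F(2*(k//2))
        d = a * a + b * b        # F(2*(k//2) + 1)
        return (d, c + d) if k & 1 else (c, d)
    return go(n)[0]

print(fib_fast(100))  # instant; fib_naive(100) would never finish
```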
@ori_press
Ori Press
4 months
We release an agent, AlgoTuner, that enables LMs to optimize code. Using our system, LMs can get feedback on how fast their code is, profile its runtime, and compare their code to the reference implementation. (3/6)
1
0
10
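A minimal sketch of the feedback loop described above. This is not AlgoTuner's actual harness, just an assumed shape: check a candidate against a reference implementation for correctness, time both, and report the speedup. The de-duplication task here is a made-up example.

```python
import timeit

def reference(xs):
    # Baseline: O(n * u) order-preserving de-dup via list membership.
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out

def candidate(xs):
    # Agent's rewrite: O(n) using a set for membership tests.
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def score(task_input, repeats=5):
    # Reject wrong answers before timing anything.
    assert candidate(task_input) == reference(task_input), "wrong output"
    t_ref = min(timeit.repeat(lambda: reference(task_input), number=3, repeat=repeats))
    t_new = min(timeit.repeat(lambda: candidate(task_input), number=3, repeat=repeats))
    return t_ref / t_new  # speedup: >1 means the candidate is faster

data = list(range(500)) * 4
print(f"speedup: {score(data):.1f}x")
```

Feeding a score like this back to the LM each turn is what lets it iterate toward faster code.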
@ori_press
Ori Press
4 months
For each algo, we give Gemini, Claude, o4-mini, and R1 a budget of 1 dollar, and have them iteratively develop code. Results are at: https://t.co/aRrbkD3FHi Models sometimes successfully optimize code, but are not currently able to come up with novel algos (2/6)
algotune.io
Can Language Models Speed Up General-Purpose Numerical Programs?
1
0
10