Ori Press
@ori_press
Followers 449 · Following 109K · Media 23 · Statuses 133
PhD from @uni_tue. I'm on the industry job market, feel free to reach out! I yearn to deep learn
Joined December 2018
Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption, and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here! 🧵⬇️
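A minimal sketch of the scoring idea behind an AlgoTune-style task, with hypothetical stand-ins (the insertion-sort baseline, the `speedup` harness, and the task itself are illustrative, not from the benchmark): a candidate solver must reproduce the reference's output exactly and is scored by how much faster it runs.

```python
import timeit

# Hypothetical stand-ins for one AlgoTune-style task: the reference
# implementation defines the correct output, and a candidate is scored
# by how much faster it produces the *same* output.

def reference_solve(xs):
    # Deliberately slow quadratic insertion sort as the baseline.
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

def candidate_solve(xs):
    # Candidate submission: use the built-in sort instead.
    return sorted(xs)

def speedup(problem, number=5):
    # Correctness gate first, then a best-of-repeats timing ratio.
    assert candidate_solve(problem) == reference_solve(problem)
    t_ref = min(timeit.repeat(lambda: reference_solve(problem), number=number, repeat=3))
    t_cand = min(timeit.repeat(lambda: candidate_solve(problem), number=number, repeat=3))
    return t_ref / t_cand

print(f"speedup: {speedup(list(range(2000, 0, -1))):.0f}x")
```

The correctness check matters as much as the timing: a faster candidate that produces a different output scores nothing.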
New eval! Code duels for LMs ⚔️ Current evals test LMs on *tasks*: "fix this bug," "write a test." But we code to achieve *goals*: maximize revenue, cut costs, win users. Meet CodeClash: LMs compete via their codebases across multi-round tournaments to achieve high-level goals
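The multi-round setup described above can be sketched roughly as follows; everything here (`play_match`, the random metric, the agent names) is a hypothetical placeholder rather than CodeClash's actual API:

```python
import itertools
import random

# Hypothetical sketch of a round-robin, multi-round tournament. `play_match`
# is a placeholder for actually running two agents' codebases against each
# other and reading off a goal metric (revenue, users, ...).

def play_match(agent_a, agent_b, rng):
    # Placeholder outcome: each side's goal metric for one round.
    return rng.random(), rng.random()

def tournament(agents, rounds=3, seed=0):
    rng = random.Random(seed)
    scores = {a: 0.0 for a in agents}
    for _ in range(rounds):
        # Every pair of agents meets once per round.
        for a, b in itertools.combinations(agents, 2):
            sa, sb = play_match(a, b, rng)
            scores[a] += sa
            scores[b] += sb
    return scores

print(tournament(["lm_a", "lm_b", "lm_c"]))
```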
Check out the full trajectories here: algotune.io (Can Language Models Speed Up General-Purpose Numerical Programs?)
Claude Sonnet 4.5 manages to score 1.52x on AlgoTune, coming in just ahead of GLM 4.5. 3+ months in, newer models don't seem to be able to make meaningful gains on AlgoTune. Excited to see how this evolves!
This has been a long time coming. One avenue for progress is to have LMs learn in virtual gym environments such as in SWE-gym, SWE-smith, or our new AlgoTune environments. These can be generated autonomously or crafted manually. Lots more to do here!
OpenAI's models are getting too smart for human contractors to teach them new things in certain domains like linguistics. One contractor I spoke with said they're struggling to come up with new tasks GPT-5 can't do. https://t.co/BWBrigqm1V
Amazing work by @AndyLin2001! Check it out:
Excited to announce that I've adapted AlgoTune for both OpenHands and Terminal Bench! It's a fast, unbounded benchmark perfect for evaluating AI agents, offering a great alternative to slower suites like SWE/Kaggle tasks. Check it out: https://t.co/bmwOQUoTxa
#Agent #Benchmark
What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately. Read more in the SWE-bench blog 🧵
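A sketch of what per-turn switching might look like in a simplified agent loop; `query_model` is a placeholder for a real API call, and the loop structure is illustrative rather than mini-SWE-agent's actual implementation (only the model names come from the tweet):

```python
import random

# Hypothetical sketch: an agent loop that draws a fresh model on every turn.

MODELS = ["gpt-5", "claude-sonnet-4"]

def query_model(model, messages):
    # Placeholder: a real agent would call the provider's API here.
    return f"<{model} reply after {len(messages)} messages>"

def run_episode(task, max_turns=4, seed=0):
    rng = random.Random(seed)
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        model = rng.choice(MODELS)  # fresh draw every turn
        reply = query_model(model, messages)
        messages.append({"role": "assistant", "content": reply})
        # ...execute the proposed action, append the observation, check for exit...
    return messages

print(len(run_episode("fix the failing test")))
```

The interesting bit is that the conversation history is shared while the responder changes, so each model sees (and can correct) the other's moves.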
Introducing OpenEvolve x AlgoTune! Now you can run and benchmark evolutionary coding agents on 100+ algorithm optimization tasks from https://t.co/sPqLhaZyGj
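The core loop of an evolutionary coding agent can be sketched with a toy fitness function standing in for real code mutation plus benchmarking; nothing here is OpenEvolve's actual API.

```python
import random

# Toy (1+1) evolutionary loop of the kind evolutionary coding agents run.
# `fitness` stands in for "benchmark the mutated program"; lower is better.

def fitness(candidate):
    # Toy landscape with its optimum at 42.
    return abs(candidate - 42)

def evolve(generations=5000, seed=1):
    rng = random.Random(seed)
    parent = rng.randint(0, 1000)
    for _ in range(generations):
        child = parent + rng.choice([-5, -1, 1, 5])  # small random mutation
        if fitness(child) <= fitness(parent):        # keep the non-worse variant
            parent = child
    return parent

print(fitness(evolve()))
```

Real systems mutate code (often with an LM as the mutation operator) and use measured runtime as fitness, but the propose-then-keep-the-non-worse-variant skeleton is the same.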
The complete logs for every model are viewable here: algotune.io
Just added Claude Opus 4.1 and gpt-oss-120b to the AlgoTune leaderboard. Excited to see if GPT-5 can break the 2x barrier!
We know that a bunch of teams are working on applying AlphaEvolve to AlgoTune, super excited to see some initial results! This is going to get really interesting.
We just benchmarked Qwen 3 Coder and GLM 4.5 on AlgoTune, and they manage to beat Claude Opus 4! We're excited to see if the models released this week manage to make progress. Also: I just defended my PhD and I'm on the industry job market; my DMs are open :)
Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code: https://t.co/bdR630y0dL https://t.co/vSnV3eUgVs https://t.co/krJ7XDrJFA with @OfirPress @ori_press @PatrickKidger @b_stellato @ArmanZharmagam1 & many others 🧵
AlgoTune is extremely tough, with agents not finding substantial speedups on most tasks. But sometimes these agents do really cool things: here, the agent realized that it could solve this convex optimization problem with a scipy function, leading to an 81x speedup.
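An illustrative analogue of that kind of rewrite, on a toy convex problem rather than the actual AlgoTune task: a hand-rolled gradient-descent loop versus a single library call that lands on the same optimum.

```python
import random
import statistics

# Toy convex problem: min_x sum_i (x - a_i)^2. A hand-rolled iterative solver
# versus one library call, since the closed-form optimum is just the mean.

rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(10_000)]

def solve_slow(data, steps=200, lr=0.25):
    x = 0.0
    n = len(data)
    for _ in range(steps):
        grad = 2.0 * sum(x - a for a in data) / n  # d/dx of the mean squared objective
        x -= lr * grad
    return x

def solve_fast(data):
    return statistics.fmean(data)  # closed-form optimum in one call

print(abs(solve_slow(data) - solve_fast(data)) < 1e-9)
```

The agent's trick in the tweet was the same move one level up: recognize the problem class and hand it to an existing optimized routine (there, from scipy) instead of iterating.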
Thanks to all contributors who submitted tasks, as well as @OfirPress for advising! Read the paper: https://t.co/utmxpu2t2J Check out the code: https://t.co/s1ttWnfo2m (6/6)
github.com
AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is to write code that solves each problem and is faster than existing implementations. - ori...
Check out our website, https://t.co/JqD76Du6lp, for agent traces and the code they ended up with for each algo. Our framework lets anyone easily submit tasks they think would be interesting to optimize. (5/6)
The current best overall AlgoTune score is 1.76x, achieved by o4-mini. We think that a score of 100x is possible, as progress should be possible from many angles: rewriting existing Python code in Numba or Cython, implementing existing faster algos, or discovering new ones. (4/6)
We release an agent, AlgoTuner, that enables LMs to optimize code. Using our system, LMs can get feedback on how fast their code is, profile its runtime, and compare their code to the reference implementation. (3/6)
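The timing-and-profiling feedback loop can be sketched like this; `candidate` is a toy stand-in for model-written code, and the harness structure is illustrative, not AlgoTuner's actual implementation:

```python
import cProfile
import io
import pstats
import timeit

# Hypothetical sketch of the feedback such a harness can return to an LM:
# wall-clock timing plus a profile report of where the time goes.

def candidate(n):
    # Toy stand-in for model-written code under optimization.
    return sum(i * i for i in range(n))

def timing_feedback(fn, arg, number=20):
    # Best-of-3 wall-clock time for `number` calls.
    return min(timeit.repeat(lambda: fn(arg), number=number, repeat=3))

def profile_feedback(fn, arg):
    # cProfile report, top 5 entries by cumulative time, as text for the LM.
    prof = cProfile.Profile()
    prof.runcall(fn, arg)
    buf = io.StringIO()
    pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(5)
    return buf.getvalue()

print(f"best of 3: {timing_feedback(candidate, 100_000):.4f}s")
print("function calls" in profile_feedback(candidate, 100_000))
```

Feeding this text back to the model each turn is what turns "write fast code" into an iterative optimization loop.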
For each algo, we give Gemini, Claude, o4-mini, and R1 a budget of 1 dollar and have them iteratively develop code. Results are at: https://t.co/aRrbkD3FHi Models sometimes successfully optimize code, but are not currently able to come up with novel algos. (2/6)