Alex Dimakis
@AlexGDimakis
Followers: 22K · Following: 29K · Media: 239 · Statuses: 4K
Professor, UC Berkeley | Founder @bespokelabsai |
Berkeley, CA
Joined April 2009
I just donated to OpenReview. And I would encourage all those who care about open science to please put their money where their mouth is, and help.
OpenReview is a lifeline for progress in the AI research community, and it urgently needs our increased support. https://t.co/HJJNRcl9km In 2025 alone, OpenReview supported over 1,300 conferences and workshops, served 3.3 million active monthly users, handled over 278,000 paper
0
1
10
As much as I want to say written exams were invented by the Greeks, according to ChatGPT they go back to 600 CE China: the Imperial Examination System (科举, keju) • Began under the Sui dynasty (6th–7th century) • Fully developed by the Tang and Song dynasties • Used paper, brush, and
10
3
97
Critics say Israel is a liability—but facts show it’s America’s strategic shield. These 5 reasons reveal its vital role in defense, intel & stability. Watch the full video to see why Israel matters.
12
24
84
My final exam is today in Berkeley. Pen and paper, in person, all the students try to solve challenging problems. No machines. This ancient method of evaluating students is going to survive in the AI era.
89
167
3K
https://t.co/1mlEpXW6TV A Gemini model seems to be a better research-paper reviewer than most humans in the STOC 2026 experiment, at least as far as correctness is concerned.
research.google
1
5
60
Check out all these great research project releases announced on the last night of NeurIPS 2025, including OpenThoughts-Agent.
The final night of Laude Lounge at NeurIPS 2025 focused on stack-level progress in open frontier AI, featuring: Michael Ryan, @DSPyOSS
@etash_guha, @NeginRaoof_ , Ben Feuer, @ryanmart3n - OpenThoughts-Agent @LakshyAAAgrawal, GEPA @alexgshaw, Harbor @tyler_griggs_ , SkyRL
0
1
7
COLM 2026 is just around the corner! Mark your calendars for: 💡Abstract deadline: Thursday, March 26, 2026 📄Full paper submission deadline: Tuesday, March 31, 2026 Call for papers in thread (website coming soon).
4
23
175
🧵Tired of scrolling through your horribly long model traces in VSCode to figure out why your model failed? We made StringSight to fix this: an automated pipeline for analyzing your model outputs at scale. ➡️Demo: https://t.co/FJ4GAxPIkx ➡️Blog: https://t.co/3AyXBFBEmV
3
35
84
Congratulations to Adam Klivans and all the co-authors for winning the FOCS 2025 Test of Time Award! Their paper was a learning-theory breakthrough: it provided the first efficient algorithm for learning halfspaces when there is adversarial label noise, under distributional
Adam Klivans Wins Test of Time Award at FOCS 2025: https://t.co/Tj5WEy9SNn
0
1
59
Just finished evaluating GPT-5.2 (reasoning high) on Terminal-Bench 2.0. ~on par with Gemini 3.0 Pro and a few points behind Opus 4.5. I've been loving the Terminus-2-only leaderboard filter, 🔗 below!
2
2
18
Remember that result that RL improves math performance even with random rewards? Thankfully, Olmo 3 showed this was due to data contamination. It shows again, as Cameron says, the value of open data for scientific progress in AI.
Easy to miss because it's on the last page of the paper, but Olmo 3 RL-Zero has a really nice sub-section on RL with random rewards! Prior papers (Shao et al - "Spurious Rewards: Rethinking Training Signals in RLVR") show RLVR still improves performance on math problems even
8
19
215
The multiple answers mystery is the most surprising thing we stumbled on from OpenThoughts: Sampling multiple answers for the same question is better than having more questions, each answered once. To explain: Say you are creating a dataset of questions and answers to SFT a
13
26
214
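The budget trade-off in that tweet can be made concrete with a small sketch. This is a hypothetical illustration, not OpenThoughts code: `sample_answer` is a stub standing in for a teacher-model call sampled at nonzero temperature, and the budget split is invented for the example. The point is that for a fixed number of teacher completions, you can go wide (many questions, one answer each) or deep (fewer questions, several sampled answers each):

```python
import random

def sample_answer(question, rng):
    # Stub for a teacher-model call; in practice this would be an LLM
    # sampled at nonzero temperature, so repeated calls differ.
    return f"answer-to-{question}-v{rng.randint(0, 9)}"

def build_sft_dataset(questions, budget, answers_per_question, seed=0):
    """Spend a fixed annotation budget either wide (many questions, one
    answer each) or deep (fewer questions, several answers each)."""
    rng = random.Random(seed)
    n_questions = budget // answers_per_question
    dataset = []
    for q in questions[:n_questions]:
        for _ in range(answers_per_question):
            dataset.append({"question": q, "answer": sample_answer(q, rng)})
    return dataset

questions = [f"q{i}" for i in range(1000)]
wide = build_sft_dataset(questions, budget=128, answers_per_question=1)
deep = build_sft_dataset(questions, budget=128, answers_per_question=4)
# Both datasets cost 128 teacher completions; "wide" covers 128 questions
# once each, while "deep" covers only 32 questions, 4 samples each.
```

The surprising OpenThoughts finding is that, at equal cost, the "deep" configuration trains a better model than the "wide" one.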
I will be presenting GEPA at the FoRLM workshop @ NeurIPS (Foundations of Reasoning in Language Models)! Please drop by Upper Level Room 33ABC (San Diego) between 10-10:15 AM to hear about how prompt optimization can outperform reinforcement learning! https://t.co/aeFnyHO1VX
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
3
19
97
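The reflect-then-edit idea behind that comparison can be shown with a toy loop. To be clear, this is not GEPA's algorithm or API: GEPA operates on real LLM prompts and execution traces, whereas everything here (`TARGET`, `score`, `reflect`) is invented for the sketch. It only illustrates why reflection is sample-efficient: instead of mutating blindly over many rollouts, each step inspects which behaviors failed and patches exactly those:

```python
TARGET = {"cite sources", "be concise", "show steps"}  # toy graded behaviors

def score(prompt_keywords):
    # Toy stand-in for running rollouts and grading the outputs.
    return len(prompt_keywords & TARGET) / len(TARGET)

def reflect(prompt_keywords):
    # Reflective step: inspect *which* behaviors failed and patch the
    # prompt directly, instead of mutating it at random.
    missing = TARGET - prompt_keywords
    if not missing:
        return prompt_keywords
    return prompt_keywords | {sorted(missing)[0]}

def reflective_optimize(prompt_keywords, max_steps=10):
    steps = 0
    while score(prompt_keywords) < 1.0 and steps < max_steps:
        prompt_keywords = reflect(prompt_keywords)
        steps += 1
    return prompt_keywords, steps

best, steps = reflective_optimize(set())
# Reaches a perfect score in 3 steps, one per missing behavior; a blind
# mutation search over a large keyword pool would need far more trials.
```

The same contrast drives the tweet's claim: GRPO needs thousands of rollouts because its feedback is a scalar reward, while a reflective optimizer extracts much more signal from each trial.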
Important insights from Junyang Lin, tech lead of the Qwen team: “For the next generation model we are probably using this architecture.” Also: “imagine the agent running for 1-2 days and then it’s done and has built your app; memory and long context will be very important.”
10
49
493
Announcing our new project on how to train agents for TerminalBench: OpenThoughts-Agent. We curate SFT data and RL environments and open the full stack, yielding the best model of its size.
How can we make a better TerminalBench agent? Today, we are announcing the OpenThoughts-Agent project. OpenThoughts-Agent v1 is the first TerminalBench agent trained on fully open curated SFT and RL environments. OpenThinker-Agent-v1 is the strongest model of its size on
5
13
112
And this is how you present a poster. Masterful.
A better view and quality: https://t.co/czBeIJEqc5
0
3
19
Taking a step towards building a modular RL framework with our SkyRL project.
✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇 Blog: https://t.co/jDvM95F0Bq Code: https://t.co/CWlKue79JH
5
18
74
Measuring agents in production: valuable information on agents from the trenches
Thrilled to release our new paper MAP: Measuring Agents in Production ⚙️🚀 2025 is the year of agents… but do they actually work in the real world? Is it just hype? A group of 25 researchers from Berkeley, Stanford, UIUC, IBM, and Intesa Sanpaolo investigated what makes agents
3
7
31