
Siméon
@Simeon_Cps
Followers
9K
Following
25K
Media
501
Statuses
6K
Creating more common knowledge on AI risks, one tweet at a time. Founder in Paris. AI auditing, standardization & governance.
Joined May 2020
The wave that first hit protein folding scientists in 2019 is now coming for mathematicians.
the openai IMO news hit me pretty heavy this weekend. i'm still in the acute phase of the impact, i think. i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don't think i.
1
0
5
Google is crushing it. They got their gold in natural language with Gemini. It seems like they mostly caught up to OpenAI, in less than a year.
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team!
2
0
18
I'd like to read some commentary by people with relevant domain knowledge about these proofs. Maybe @davidad, @an_interstice or @FabienDRoger? Do they feel like the kind of proofs that are relying on a ton of knowledge and leveraging how knowledgeable these models are or do.
10/N If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model 😅).
1
0
7
Here we are, crushing benchmarks that characterize the top of human fluid intelligence.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
2
1
17
Meta’s risk management framework better than Google DeepMind’s 👀. That’s what we found in our updated ratings focused on risk management frameworks of companies that signed the Seoul Frontier Safety Commitments. Findings: 1. Anthropic is still in the lead but Anthropic’s RSP
5
8
75
pure alpha just dropped.
So excited to share that today @CSETGeorgetown and @emergingtechobs are launching an ✨updated✨ version of our chip supply chain explorer! We've got: 👉 New data 👉 New features 👉 New analysis. Links in thread
0
0
5
Moonshot, the Margin Slayers.
For a wide range of tasks, K2 is probably the cheapest model by far right now, in terms of actual costs per task. It is just cheap, it has no long-CoT, and it does not yap. This is very refreshing. Like the best of Anthropic models, but cheaper and even more to the point.
0
0
2
This could be up to 10x the compute that went into o3 post-training btw.
@teortaxesTex Do you know how much FLOP they spent on this post-training? I've heard that o3 was 1 OOM away from saturating the equivalent of compute OpenAI spends on pre-training. So that would still make it not too far from what DS can reach.
0
0
16
People used to think that AI was a software thing. As “pushing the AI frontier” looks increasingly like that, it’s gonna get clearer why @xAI has a surprisingly high chance of getting ahead: Elon is unparalleled in solving hard logistics problems.
0
0
2
Not saying "I told you so" but I told you so :)
Insane that Elon Musk has pulled it off again, absolutely crushing the AI wars with Grok 4. Summarizing the core announcements: Post-training RL spend == pretraining spend; $3/M input toks, $15/M output toks, 256k context, price 2x beyond 128k; #1 on Humanity’s Last Exam
3
0
10