Hassan Hayat 🔥 @TheSeaMouse X Profile

Hassan Hayat 🔥

@TheSeaMouse

Followers: 5K

Following: 158K

Media: 2K

Statuses: 12K

Aspiring Engineer @ General Cognition https://t.co/D4gDyw97gu

Austin, TX

Joined October 2011

Hassan Hayat 🔥

@TheSeaMouse

1 day

Imagine giving up on manufacturing semiconductors just as we are seeing the largest compute and infrastructure scale-outs in history.

SemiAnalysis

@SemiAnalysis_

1 day

Intel, the home of Moore's Law, for the first time in history, is evaluating if it will continue at the leading edge. From its 10-Q. "However, if we are unable to secure a significant external customer and meet important customer milestones for Intel 14A, we face the prospect

1

2

Hassan Hayat 🔥

@TheSeaMouse

6 days

A high-taste AI verifier (at scale) is the key to AGI.

0

2

Hassan Hayat 🔥

@TheSeaMouse

7 days

Alexander Wei

@alexwei_

7 days

1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

0

1

9

Hassan Hayat 🔥

@TheSeaMouse

7 days

This may be the breakthrough of the year. The model simulates the tools internally (no environment) and arrives at a solid answer after hours of thought. This flies in the face of LeCun's arguments.

Sheryl Hsu

@SherylHsu02

7 days

The model solves these problems without tools like Lean or coding; it just uses natural language, and it has only 4.5 hours. We see the model reason at a very high level: trying out different strategies, making observations from examples, and testing hypotheses.

0

2

Hassan Hayat 🔥

@TheSeaMouse

7 days

Superintelligence is within view.

Noam Brown

@polynoamial

7 days

Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵.

0

2

Hassan Hayat 🔥

@TheSeaMouse

7 days

Alexander Wei

@alexwei_

7 days

8/N Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.

1

0

6

Hassan Hayat 🔥

@TheSeaMouse

7 days

The agent is cooking

0

1

Hassan Hayat 🔥

@TheSeaMouse

9 days

We need a @FabrizioRomano of AI to keep track of all these transfers.

natasha mascarenhas

@nmasc_

9 days

Scoop: Boris Cherny and Cat Wu are back at Anthropic, two weeks after joining Cursor. 🤯🤯🤯

0

2

Hassan Hayat 🔥

@TheSeaMouse

10 days

RT @_jasonwei: New blog post about asymmetry of verification and "verifier's law": Asymmetry of verification–the i….

0

242

0

Hassan Hayat 🔥

@TheSeaMouse

14 days

This but with agents.

Pourcel Julien @ICML

@PourcelJulien

16 days

Introducing SOAR 🚀, a self-improving framework for prog synth that alternates between search and learning (accepted to #ICML!). It brings LLMs from just a few percent on ARC-AGI-1 up to 52%. We’re releasing the finetuned LLMs, a dataset of 5M generated programs and the code. 🧵

1

2

Hassan Hayat 🔥

@TheSeaMouse

14 days

First Kimi test failed. Will be testing more this weekend.

1

0

1

Hassan Hayat 🔥

@TheSeaMouse

16 days

A new Pandora's box has been opened: exponentially more synthetic data will be generated to sustain this level of improvement. Many quadrillions of tokens' worth of agentic code will be churned out by inference chips.

wh

@nrehiew_

16 days

xAI spent the same amount of compute on RL as Pretraining? That is insane

1

0

4

Hassan Hayat 🔥

@TheSeaMouse

22 days

Your language model deserves better than just {0,1} verifiers. You have a language model at your disposal. Why did it get the answer wrong? Along what dimensions? What did it get right?

1

0

1

Hassan Hayat 🔥

@TheSeaMouse

23 days

New superintelligence benchmark just dropped.

Suhail

@Suhail

24 days

PSA: there’s a guy named Soham Parekh (in India) who works at 3-4 startups at the same time. He’s been preying on YC companies and more. Beware. I fired this guy in his first week and told him to stop lying / scamming people. He hasn’t stopped a year later. No more excuses.

1

0

4

Hassan Hayat 🔥

@TheSeaMouse

29 days

New record: 23 minute query

1

0

7

Hassan Hayat 🔥

@TheSeaMouse

1 month

Such thinking. Much wow

1

0

3

Hassan Hayat 🔥

@TheSeaMouse

1 month

One thing about o3-pro: it is significantly more willing than regular o3 to return long outputs when asked. Vanilla o3 seems allergic to long outputs.

0

4

Hassan Hayat 🔥

@TheSeaMouse

1 month

After 10 minutes, you never know whether it's really processing the query and just forgot to update the summary, or whether the loading bar is stuck and nothing is being processed.

0

Hassan Hayat 🔥

@TheSeaMouse

1 month

Such a great model. Worth the wait

1

0

Hassan Hayat 🔥

@TheSeaMouse

2 months

Playing with o3-pro is an exercise in patience

2

0

7