hardmaru (@hardmaru)

347K Followers · 144K Following · 4K Media · 26K Statuses

Building Collective Intelligence @SakanaAILabs 🧠

Minato-ku, Tokyo
Joined November 2014
hardmaru (@hardmaru) · 11 months
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 👩‍🔬 It’s common for AI researchers to joke amongst themselves that “now all we need to do is figure out how to make AI write the papers for us!” but I think we’re now getting there!

Quoting Sakana AI (@SakanaAILabs) · 11 months:
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! From ideation, writing code, running experiments, and summarizing results, to writing entire papers and conducting peer review, The AI…

33 replies · 145 retweets · 760 likes
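For readers curious what that pipeline looks like end to end, here is a minimal sketch of the stages the announcement names. Everything below is hypothetical scaffolding, not Sakana AI's actual code: `llm` is a stub standing in for a real model call, and the prompts are illustrative only.

```python
# Hypothetical sketch of the stages named above; NOT the actual
# AI Scientist codebase. `llm` is a stub for a real model call.

def llm(prompt: str) -> str:
    return f"<model output for: {prompt[:48]}...>"  # placeholder

def ai_scientist_pipeline(topic: str) -> dict:
    idea    = llm(f"Propose a novel research idea about {topic}.")    # ideation
    code    = llm(f"Write experiment code to test this idea: {idea}") # writing code
    results = llm(f"Run and report the results of: {code}")           # experiments
    summary = llm(f"Summarize these results: {results}")              # summarizing
    paper   = llm(f"Write a full paper from: {idea} / {summary}")     # writing
    review  = llm(f"Peer-review this paper: {paper}")                 # peer review
    return {"paper": paper, "review": review}

print(ai_scientist_pipeline("test-time scaling for LLMs")["review"])
```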
hardmaru (@hardmaru) · 19 hours
RT @Techmeme: Tokyo-based Sakana AI details a new Monte Carlo tree search-based technique that lets multiple LLMs cooperate on a single tas…

0 replies · 11 retweets · 0 likes
hardmaru (@hardmaru) · 19 hours
RT @adcock_brett: Sakana AI dropped AB-MCTS, an algo that lets competing AI models work together, building on their strengths and errors, t…

0 replies · 17 retweets · 0 likes
hardmaru (@hardmaru) · 2 days
“AI doesn’t work the way you think it does”.

0 replies · 1 retweet · 17 likes
hardmaru (@hardmaru) · 3 days
Nice video discussing the recent thought-provoking paper “Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis” by @akarshkumar0101 @jeffclune @joelbot3000 @kenneth0stanley. A 15-minute @MLStreetTalk video ↓

Quoting Machine Learning Street Talk (@MLStreetTalk) · 3 days:
AI is so smart, why are its internals 'spaghetti'? We spoke with @kenneth0stanley and @akarshkumar0101 (MIT) about their new paper: Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis. Co-authors: @jeffclune @joelbot3000

10 replies · 19 retweets · 171 likes
hardmaru (@hardmaru) · 3 days
Shameless plug:

0 replies · 0 retweets · 15 likes
hardmaru (@hardmaru) · 3 days
Someone like Soham Parekh would never be able to pass Sakana AI’s hardcore job application process. 😎🔥

2 replies · 0 retweets · 23 likes
hardmaru (@hardmaru) · 3 days
Cluely should hire Soham Parekh. They deserve each other. 🙃

5 replies · 2 retweets · 128 likes
hardmaru (@hardmaru) · 5 days
The top open SWE models are all Qwen3-, Qwen2-, QwQ-, DeepSeek-V3- or R1-based.

1 reply · 0 retweets · 12 likes
hardmaru (@hardmaru) · 5 days
DeepSWE is a new state-of-the-art open-source software engineering model trained entirely using reinforcement learning, based on Qwen3-32B. Fantastic work from @togethercompute @Agentica_

Quoting Together AI (@togethercompute) · 5 days:
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWE-Bench Verified with test-time scaling (and 42.2% Pass@1), topping the SWE-Bench leaderboard for open-weight models. Built in…

11 replies · 42 retweets · 256 likes
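For reference, the "Pass@1" in that announcement is the standard pass@k metric for code generation. The unbiased estimator below comes from the HumanEval paper (Chen et al., 2021); it is not specific to DeepSWE, just the usual way these numbers are computed.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k (Chen et al., 2021): the probability that at
    least one of k samples, drawn without replacement from n attempts
    of which c are correct, solves the task."""
    if n - c < k:  # fewer than k incorrect attempts: success is certain
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 16 attempts, 7 correct:
print(pass_at_k(16, 7, 1))  # 0.4375 (= c / n when k = 1)
```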
hardmaru (@hardmaru) · 5 days
RT @SakanaAILabs: Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search. https://…

0 replies · 11 retweets · 0 likes
hardmaru (@hardmaru) · 6 days
Our paper: “Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search”.

2 replies · 3 retweets · 24 likes
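The title's core question, whether to go wider (sample a fresh answer) or deeper (refine a promising one), can be caricatured as a two-armed bandit. The toy below is only that caricature: the real AB-MCTS applies a Bayesian decision at every node of a search tree, and `generate`/`refine` here are random stubs standing in for LLM calls scored by a task evaluator in [0, 1].

```python
import random

# Toy sketch: "wider vs. deeper" as a Thompson-sampling bandit.
# NOT the actual AB-MCTS algorithm (which searches over a tree);
# generate/refine are stubs returning candidate scores in [0, 1].

def generate(task):                        # fresh LLM sample (stub)
    return random.random()

def refine(answer):                        # LLM revision of an answer (stub)
    return min(1.0, max(0.0, answer + random.uniform(-0.05, 0.15)))

def ab_search(task, budget=32):
    # Beta(1, 1) prior on "this action improves the current best"
    stats = {"wider": [1, 1], "deeper": [1, 1]}
    best = generate(task)
    for _ in range(budget - 1):
        action = max(stats, key=lambda a: random.betavariate(*stats[a]))
        cand = generate(task) if action == "wider" else refine(best)
        stats[action][0 if cand > best else 1] += 1  # posterior update
        best = max(best, cand)
    return best

print(ab_search("example task"))
```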
hardmaru (@hardmaru) · 6 days
RT @MLStreetTalk: Very interesting work by @SakanaAILabs - they have designed a MoE / novel test time inference framework inspired by MCTS…

0 replies · 26 retweets · 0 likes
hardmaru (@hardmaru) · 6 days
RT @fchollet: Impressive results from Sakana AI on ARC-AGI-2 with a new method for test-time-search and ensembling! Please be mindful when…

0 replies · 140 retweets · 0 likes
hardmaru (@hardmaru) · 6 days
The Multi-LLM AB-MCTS combination of o4-mini + Gemini-2.5-Pro + DeepSeek-R1-0528, current frontier AI models, achieves strong performance on the ARC-AGI-2 benchmark, outperforming individual models by a large margin. Implementation of AB-MCTS on GitHub:

3 replies · 7 retweets · 54 likes
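One way to read "combination" here: treat each model as an arm of a bandit and let the search decide who attempts the task next. The sketch below extends the Thompson-sampling toy from earlier to model selection; it is an assumption-laden caricature of Multi-LLM AB-MCTS, not the released implementation, and `ask` is a stub rather than any real API.

```python
import random

# Hypothetical caricature of Multi-LLM AB-MCTS model selection:
# each model keeps a Beta posterior over its success rate on this
# task, and Thompson sampling picks who attempts it next.

MODELS = ["o4-mini", "Gemini-2.5-Pro", "DeepSeek-R1-0528"]

def ask(model: str, task: str) -> bool:    # stub: did the model solve it?
    return random.random() < 0.1

def solve(task: str, budget: int = 30):
    posterior = {m: [1, 1] for m in MODELS}  # Beta(successes+1, failures+1)
    for _ in range(budget):
        model = max(MODELS, key=lambda m: random.betavariate(*posterior[m]))
        if ask(model, task):
            return model                      # which model cracked it
        posterior[model][1] += 1              # record the failure
    return None

print(solve("an ARC-AGI-2 style puzzle"))
```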
hardmaru (@hardmaru) · 7 days
Many ARC-AGI-2 examples that were unsolvable by any single LLM were solved by combining multiple LLMs. In some cases, an initially incorrect attempt by o4-mini is used by R1-0528 and Gemini-2.5-Pro as a hint to get to the correct solution. ARC-AGI-2 code:

2 replies · 7 retweets · 66 likes
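The mechanism described above, where one model's wrong answer becomes another model's hint, is easy to picture as prompt construction. A minimal sketch, assuming a hypothetical `call_model` helper rather than any real client library:

```python
# Minimal sketch of cross-model refinement: a failed attempt from one
# model is placed in another model's context as a hint. `call_model`
# is a hypothetical helper, not a real API.

def call_model(name: str, prompt: str) -> str:
    return f"<{name}'s answer>"               # placeholder

def refine_with_hint(task: str, failed: str, model: str) -> str:
    prompt = (
        f"Task:\n{task}\n\n"
        f"A previous attempt, judged incorrect:\n{failed}\n\n"
        "Explain what is wrong with it, then give a corrected answer."
    )
    return call_model(model, prompt)

task = "Solve this ARC-AGI-2 grid puzzle: ..."
attempt = call_model("o4-mini", task)          # initially incorrect attempt
fixed = refine_with_hint(task, attempt, "DeepSeek-R1-0528")
print(fixed)
```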
hardmaru (@hardmaru) · 7 days
RT @iwiwi: Go wider, deeper—or together? 🤔 Introducing AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a new inference-time framework…

0 replies · 63 retweets · 0 likes
hardmaru (@hardmaru) · 7 days
RT @SakanaAILabs: We’re excited to introduce AB-MCTS! Our new inference-time scaling algorithm enables collective intelligence for AI by a…

0 replies · 217 retweets · 0 likes
hardmaru (@hardmaru) · 7 days
Inference-Time Scaling and Collective Intelligence for Frontier AI. We developed AB-MCTS, a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, achieving promising initial results on the ARC-AGI-2 benchmark.

17 replies · 94 retweets · 537 likes