Piotr Piękos
@PiotrPiekosAI
Followers
235
Following
327
Media
11
Statuses
54
PhD student with @SchmidhuberAI at @KAUST. Interested in systematic generalization and reasoning.
Joined April 2012
If you’re not happy with the quality of the reviews you got, remember that with current trends the reviews at ICML will be even worse
0
0
2
Our Huxley-Gödel Machine learns to rewrite its own code, estimating its own long-term self-improvement potential. It generalizes to new tasks (SWE-Bench Lite), matching the best officially checked human-engineered agents. arXiv: 2510.21614 With @Wenyi_AI_Wang, @PiotrPiekosAI,
56
154
1K
Authors: Wenyi Wang, @PiotrPiekosAI, @nbl_ai, Firas Laakom, @Beastlyprime, @MatOstasze, @MingchenZhuge, @SchmidhuberAI arXiv: https://t.co/jpQ1JQjhO4 GitHub:
github.com: metauto-ai/HGM (🧬 The Huxley-Gödel Machine)
1
2
14
✅ HGM’s improvements transfer across datasets and LLMs. Optimized on SWE-Bench Verified with GPT-5 mini and evaluated on SWE-Bench Lite with GPT-5, HGM reaches human-level performance 👩‍💻🤖, matching the best officially checked human-engineered coding agent.
1
0
9
⚡️ HGM selects agents for self-improvement based on CMP estimates. On SWE-Bench Verified, HGM outperforms prior self-improving coding systems while using 🕐 less wall-clock time.
1
0
10
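A hypothetical sketch of the selection idea above: pick the next agent to self-modify by its estimated CMP rather than by its current benchmark score. The function name, the greedy argmax rule, and the example numbers are illustrative assumptions, not HGM's actual estimator or scheduling policy.

```python
# Hypothetical stand-in for CMP-guided selection (not HGM's actual policy):
# expand the agent whose clade is estimated to be most productive,
# instead of the agent with the best current score.
def select_agent_to_modify(cmp_estimates: dict[str, float]) -> str:
    """Return the id of the agent with the highest estimated CMP."""
    return max(cmp_estimates, key=cmp_estimates.get)


# Illustrative (made-up) numbers: "v2" has the best current score, but "v3"'s
# clade is estimated to be more productive, so "v3" is selected for self-modification.
current_scores = {"v1": 0.31, "v2": 0.45, "v3": 0.40}
cmp_estimates = {"v1": 0.33, "v2": 0.41, "v3": 0.47}
print(select_agent_to_modify(cmp_estimates))  # -> "v3"
```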
💡Introducing Clade Metaproductivity (CMP): A metric that measures an agent’s true potential for self-improvement via its descendants. We prove: under certain assumptions, access to true CMP lets a coding agent simulate the Gödel Machine - the optimal self-improving machine.🧩
1
0
11
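A minimal sketch of the clade intuition behind CMP, under stated assumptions: an agent's clade is the agent itself plus all of its self-modification descendants, and CMP aggregates measured performance over that clade. The tree representation and the use of a plain mean as the aggregator are illustrative choices, not the paper's exact definition.

```python
# Sketch only: clade = an agent plus all of its self-modification descendants;
# CMP aggregates performance over the clade (a mean is used here as a stand-in).
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AgentNode:
    performance: float                              # e.g. benchmark score of this agent version
    children: list["AgentNode"] = field(default_factory=list)


def clade(node: AgentNode) -> list[AgentNode]:
    """All agents in the clade rooted at `node` (the node and its descendants)."""
    out = [node]
    for child in node.children:
        out.extend(clade(child))
    return out


def cmp_estimate(node: AgentNode) -> float:
    """Aggregate performance over the clade; the true aggregator may differ."""
    return mean(a.performance for a in clade(node))


# Usage: a weak agent whose descendants improve can have a higher CMP
# than a stronger agent whose clade stagnates.
root = AgentNode(0.30, [AgentNode(0.35, [AgentNode(0.45)]), AgentNode(0.40)])
print(cmp_estimate(root))
```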
Most self-improving coding systems ⚙️ pick their better-performing agents to self-modify. But ❌ higher immediate scores ≠ long-term self-improvement potential. We call this the Metaproductivity–Performance Mismatch 📉
1
0
11
🚨 Time to let agents code themselves! Meet the Huxley-Gödel Machine (HGM), a game changer in coding agent development 🚨 [🤖 vs. 🧑‍💻] HGM evolves through self-rewrites and matches the best officially checked human-engineered agents on SWE-Bench Lite, despite being optimized on a different dataset.
1
11
60
Next time you complain about the bidding system, be grateful that you at least had a chance to bid.
0
0
2
[LG] Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing P Piękos, R Csordás, J Schmidhuber [KAUST & Stanford University] (2025) https://t.co/kR2NuFNa5P
0
4
20
Intuitive, reduces complexity, and measurably effective. That's what you're going for.
In MoSA, each attention head selects k tokens to process using Expert-Choice Routing. This creates a learnable, context-based sparse attention with reduced complexity from O(T^2) to O(T+k^2). The saved compute can be used to create a large set of specialized, sparse heads.
0
1
2
What if instead of a couple of dense attention heads, we use lots of sparse heads, each learning to select its own set of tokens to process? Introducing Mixture of Sparse Attention (MoSA)
1
4
19
More results can be found in the paper. Huge thanks to my amazing collaborators: @robert_csordas, @SchmidhuberAI Paper: https://t.co/20toWkvjyL Code:
github.com: user-friendly implementation of Mixture of Sparse Attention (MoSA). MoSA selects distinct tokens for each head with expert-choice routing, providing a content-based sparse attention mechanism.
0
0
1
MoSA’s advantages also hold on longer sequences. Preliminary results on lengths up to 8192 show that MoSA consistently maintains its advantage over the compared sparse baselines.
1
0
2
MoSA improves resource utilization in the practical setting as well. Even with a pure PyTorch implementation, when perplexity-matched to a dense baseline, it runs faster in wall-clock time, uses less memory, and needs a significantly smaller KV cache.
1
0
2
The sparsity induced by MoSA helps even on standard-length sequences, reducing perplexity by up to 27% in the IsoFLOP setting compared to dense attention. Moreover, MoSA is the only sparse attention variant we tested that reduces perplexity at standard lengths.
1
0
2
In MoSA, each attention head selects k tokens to process using Expert-Choice Routing. This creates a learnable, context-based sparse attention with reduced complexity from O(T^2) to O(T+k^2). The saved compute can be used to create a large set of specialized, sparse heads.
1
1
6
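A minimal, self-contained sketch of the mechanism described in the tweet above: one sparse head whose router scores all T tokens (O(T)), keeps its top-k, and attends only among those k tokens (O(k^2)). This is a simplified single-head, batchless, non-causal illustration with assumed names (MoSAHead, d_model, d_head, k); it is not the paper's or the repository's implementation.

```python
import torch
import torch.nn.functional as F


class MoSAHead(torch.nn.Module):
    """One sparse head: expert-choice token selection + attention among the selected tokens."""

    def __init__(self, d_model: int, d_head: int, k: int):
        super().__init__()
        self.k = k
        self.router = torch.nn.Linear(d_model, 1)    # expert-choice scores over tokens, O(T)
        self.q = torch.nn.Linear(d_model, d_head)
        self.kv = torch.nn.Linear(d_model, 2 * d_head)
        self.out = torch.nn.Linear(d_head, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, d_model); causal masking and batching omitted for brevity.
        scores = self.router(x).squeeze(-1)          # (T,) routing scores
        top = torch.topk(scores, self.k).indices     # this head picks its own k tokens
        sel = x[top]                                 # (k, d_model)
        q = self.q(sel)
        key, val = self.kv(sel).chunk(2, dim=-1)
        # Attention only among the k selected tokens: O(k^2) instead of O(T^2).
        attn = F.softmax(q @ key.T / q.shape[-1] ** 0.5, dim=-1)
        mixed = (attn @ val) * torch.sigmoid(scores[top]).unsqueeze(-1)  # keeps the router differentiable
        y = torch.zeros_like(x)
        y[top] = self.out(mixed)                     # scatter outputs back to the selected positions
        return y


# Usage: 16 tokens, this head processes only 4 of them.
head = MoSAHead(d_model=32, d_head=16, k=4)
print(head(torch.randn(16, 32)).shape)  # torch.Size([16, 32])
```

Because each head only attends over its own k tokens, the compute saved relative to a dense head can be spent on many such specialized sparse heads, which is the trade-off the thread describes.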
What if instead of a couple of dense attention heads, we use lots of sparse heads, each learning to select its own set of tokens to process? Introducing Mixture of Sparse Attention (MoSA)
1
4
19
One of my rejected ICML 2025 (@icmlconf) papers. Can anyone spot any criticism in the metareview? What a joke 🙃
26
25
518
Come visit our poster "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" on Thursday at 11 am in East Exhibit Hall A-C at #NeurIPS2024. With @PiotrPiekosAI, Kazuki Irie, and @SchmidhuberAI.
2
20
51