
Guillermo Barbadillo
@guille_bar
1K Followers · 601 Following · 63 Media · 494 Statuses
In a quest to understand intelligence
Pamplona, Spain
Joined February 2018
RT @M_IsForMachine: I honestly can't believe anyone would fall for this nonsense. But if you are willing to listen for a second to a real e…
0 replies · 197 retweets · 0 likes
RT @petergostev: I quite like how well @arcprize shows the distribution of GPT-5 variant capabilities, from 1.5% (GPT-5 Nano, Minimal) to 6…
0 replies · 30 retweets · 0 likes
Catastrophic forgetting is one of the biggest challenges in continual learning. Every continual learning technique aims to strike the best stability-plasticity tradeoff: retaining old knowledge while staying able to learn new tasks.
arxiv.org: To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual…
0 replies · 0 retweets · 4 likes
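To make the tradeoff concrete, here is a minimal sketch of one classic stability mechanism, an EWC-style (Elastic Weight Consolidation) quadratic penalty. The `fisher` and `old_params` dicts are assumed to have been computed after the previous task; all names are illustrative, not taken from the linked survey.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1000.0):
    """EWC-style penalty: anchor the weights that mattered for the
    previous task near their old values (stability), while leaving
    unimportant weights free to change (plasticity)."""
    loss = 0.0
    for name, param in model.named_parameters():
        # fisher[name] estimates how important each weight was for
        # the old task; important weights are pulled back harder.
        loss = loss + (fisher[name] * (param - old_params[name]) ** 2).sum()
    return lam * loss

# During training on a new task:
#   total_loss = task_loss + ewc_penalty(model, fisher, old_params)
```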
Glad to see that these experiments confirm the hunch I had when I read the paper a few weeks ago.
As far as I understand, this is another case of test-time training, since they use example pairs from both the training and evaluation sets. I'm not sure whether the hierarchical architecture is necessary, or whether we could get similar results with other models.
0 replies · 0 retweets · 2 likes
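For readers unfamiliar with the idea, here is a minimal sketch of test-time training on an ARC-style task, assuming a generic `model`, a `loss_fn`, and tokenized (input, output) demonstration pairs; every name here is illustrative, not taken from the HRM paper.

```python
import copy
import torch

def solve_with_ttt(model, loss_fn, demo_pairs, test_input,
                   steps=50, lr=1e-4):
    """Briefly fine-tune on a task's own demonstration pairs, then
    predict the test output with the adapted weights."""
    adapted = copy.deepcopy(model)  # keep the base model untouched
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in demo_pairs:  # the task's (input, output) examples
            opt.zero_grad()
            loss_fn(adapted(x), y).backward()
            opt.step()
    with torch.no_grad():
        return adapted(test_input)
```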
I believe the bottleneck for solving ARC-AGI is not the neural network architecture, but how the model is used at test time, to search, to learn, or both, in order to adapt to novel tasks.
We were able to reproduce the strong findings of the HRM paper on ARC-AGI-1. Further, we ran a series of ablation experiments to get to the bottom of what's behind it. Key findings: 1. The HRM model architecture itself (the centerpiece of the paper) is not an important factor.
1 reply · 0 retweets · 13 likes
RT @c_valenzuelab: Really nice demo of what @runwayml Aleph can do for complex changes in environments while adding accurate dynamic elemen…
0 replies · 158 retweets · 0 likes
This is the first team to score above 20% on the ARC25 challenge. Congratulations! We're still far from the 85% goal, but there's time left since the competition ends in November.
Turns out our youngest researcher was right! We crossed the 20% mark and are now at a high score of 21.67%, leading the 2025 @arcprize competition! For context, Grok-4 is currently at 16%, Claude Opus at 8.6%, and o3 at 6.5%… The sky is now officially the limit.
1 reply · 5 retweets · 23 likes
I knew a similar problem existed in image generation, but I mistakenly believed current LLMs could handle it better.
@rockdrigoma DALL·E and the like are trained on image captions, not "commands". But it doesn't get the negation anyway 😄
1 reply · 0 retweets · 1 like
As far as I understand, this is another case of test-time training, since they use example pairs from both the training and evaluation sets. I'm not sure whether the hierarchical architecture is necessary, or whether we could get similar results with other models.
Impressive work by @makingAGI and team. No pre-training or CoT, yet material performance on ARC-AGI. > With only 27 million parameters, HRM achieves exceptional performance on complex reasoning tasks using only 1000 training samples.
0 replies · 0 retweets · 16 likes
RT @corbtt: Big news: we've figured out how to make a *universal* reward function that lets you apply RL to any agent with: - no labeled da…
0 replies · 126 retweets · 0 likes
Nice paper that, in my opinion, goes in the right direction for solving ARC. It generates Python code to tackle the ARC tasks and combines search and learning in a virtuous cycle. I have summarized the results in the following plot.
Introducing SOAR 🚀, a self-improving framework for program synthesis that alternates between search and learning (accepted to #ICML!). It brings LLMs from just a few percent on ARC-AGI-1 up to 52%. We're releasing the finetuned LLMs, a dataset of 5M generated programs, and the code. 🧵
3 replies · 14 retweets · 88 likes
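The virtuous cycle is easy to picture in code. Below is a minimal sketch of one SOAR-style search-and-learn round, where `sample_programs`, `run_program`, and `finetune` are illustrative stand-ins, not the paper's actual API.

```python
def soar_round(llm, tasks, samples_per_task=64):
    """One round: search for programs with the LLM, verify them
    against the demonstrations, then learn from the hits."""
    solutions = []
    for task in tasks:
        for prog in sample_programs(llm, task, n=samples_per_task):
            # A candidate is verified only if it reproduces every
            # demonstration output exactly.
            if all(run_program(prog, x) == y for x, y in task.train_pairs):
                solutions.append((task, prog))
                break
    # Fine-tune on verified (task -> program) pairs so the next
    # search round samples from a stronger model.
    return finetune(llm, solutions), solutions
```

Each round strengthens the sampler, which raises the hit rate of the next search, which yields more training data: the virtuous cycle the tweet describes.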
RT @OriolVinyalsML: Hello Gemini 2.5 Flash-Lite! So fast, it codes *each screen* on the fly (Neural OS concept 👇). The frontier isn't alw…
0 replies · 282 retweets · 0 likes
RT @vitrupo: Anthropic co-founder Ben Mann says we'll know AI is transformative when it passes the "Economic Turing Test." Give an AI agen…
0 replies · 86 retweets · 0 likes