
Guillermo Barbadillo
@guille_bar
Followers
1K
Following
559
Media
58
Statuses
472
In a quest to understand intelligence. Talking about AI in Spanish on la TERTULia: https://t.co/SCEoGWzBYd
Pamplona, Spain
Joined February 2018
RT @OriolVinyalsML: Hello Gemini 2.5 Flash-Lite! So fast, it codes *each screen* on the fly (Neural OS concept 👇). The frontier isn't alw…
0
270
0
RT @vitrupo: Anthropic co-founder Ben Mann says we'll know AI is transformative when it passes the "Economic Turing Test." Give an AI agen…
0
89
0
ARC-AGI-3 will be interactive and similar in spirit to Animal-AI Olympics by Matthew Crosby.
Interactive Reasoning Benchmarks are the next step in frontier evaluations. Hear @GregKamradt share why measuring human-like intelligence requires multi-turn environments, including a sneak peek of ARC-AGI-3. Want to help us build interactive evaluations? We're hiring.
1
2
27
RT @Kyle_L_Wiggers: Google quietly released an app that lets you download and run AI models locally
0
4
0
RT @MetaPuppet: This is Plastic. Made with Veo3. Spoilers in the next post. Watch before reading
0
537
0
LLMs, RL, and rockets! 🚀 Cool paper showing how test-time reinforcement learning can optimize engineering problems when a continuous reward signal is available.
🚀 New paper: LLMs for Engineering: Teaching Models to Design High-Powered Rockets 🚀 We built an environment that lets models build high-powered rockets and show that, by using RL, models can surpass human designs!
0
0
10
After a month of competition, no team is on track to reach the 85% needed to win the ARC Grand Prize through linear progress. New ideas are needed to drive breakthroughs and reach the grand prize this year. @arcprize
11
35
228
The released version of o3 scores just 3% on ARC-AGI-2. Adaptation to novelty is still an unsolved problem in AI (and intelligence is all about adaptation to novelty).
o3 and o4-mini on ARC-AGI's Semi Private Evaluation.
* o3-medium scores 53% on ARC-AGI-1.
* o4-mini shows state-of-the-art efficiency.
* ARC-AGI-2 remains virtually unsolved (<3%).
Through analysis we highlight differences from o3-preview and other model behavior.
17
20
183
RT @interconnectsai: OpenAI's o3: Over-optimization is back and weirder than ever. Tools, true rewards, and a new direction for language mod…
0
14
0
RT @TheAhmadOsman: Microsoft just released the first natively trained 1-bit model: BitNet 2B. Trained on 4 Trillion tokens. Native 1.58-bi…
0
154
0
RT @PJaccetturo: What if Studio Ghibli directed Lord of the Rings? I spent $250 in Kling credits and 9 hours re-editing the Fellowship tra…
0
13K
0