
Mike Knoop
@mikeknoop
Followers: 23K
Following: 9K
Media: 249
Statuses: 4K
co-founder @ndea and @zapier @arcprize
sf bay area
Joined July 2009
Today we’re releasing our first public preview of ARC-AGI-3: the first three games. Version 3 is a big upgrade over v1 and v2, which are designed to challenge pure deep learning and static reasoning. In contrast, v3 challenges interactive reasoning (e.g. agents). The full version
RT @arcprize: Hiring - ARC Prize Backend Engineer. We're opening a role to build and operate the systems behind ARC Prize's next-gen AGI be…
RT @tbpn: We asked @wadefoster (Co-Founder & CEO, Zapier) how AI is changing the job market. “Capitalism is largely undefeated. We can see…
RT @mikeknoop: @deanwball This is a contender for top charts humanity has ever produced (other top contender is GDP/capita takeoff over the…
i've always found prediction benchmarks interesting (eg forecastbench, prophetarena, futurex) because they structurally guarantee you can't train on test. at what accuracy% do we see AI making consistently profitable market trades?
We just released 3 more ARC v3 games. You can play them now! These were our "private" games from the preview contest. We're sharing to give a sense of game diversity we're striving for.
ARC-AGI-3 Preview: +3 Games Released. We’ve opened 3 previously private holdout games from the Preview Agent Competition. Now 6 games are available to play online and via Agents API. Each game was selected to expand the novelty of ARC-AGI-3 public games. Can you beat them?
We got very useful game design feedback from this preview. We are now scaling up. Thank you to all who played the games and built agents!
ARC-AGI-3 Preview - 30-Day Learnings. 30 days ago we released a preview of our first Interactive Reasoning Benchmark. Our goal was to ship quick, learn from the community, and inform the next >100 games. Here’s what we learned after 100s of agents and >3,900 game plays:
RT @fchollet: We were able to reproduce the strong findings of the HRM paper on ARC-AGI-1. Further, we ran a series of ablation experiment…
The HRM paper went mega viral (millions of views) based on broad ARC-AGI claims. So we went deep to verify, ablate, and analyze how much the novel "hierarchical" architecture matters. TLDR not much, but the score verified, and we found a surprising reason why. Results below.
Analyzing the Hierarchical Reasoning Model by @makingAGI. We verified scores on hidden tasks, ran ablations, and found that performance comes from an unexpected source.
ARC-AGI Semi Private Scores:
* ARC-AGI-1: 32%
* ARC-AGI-2: 2%
Our 4 findings:
Google launched a $100k ARC-AGI "code golf" competition where the goal is to write (by hand) the shortest programs to solve each task. Starting to see some cool analysis from the community!
🧵(1/n) Now that I’ve already spent a week writing solvers for each of the 1st @arcprize train tasks, I’m taking a step back. Current score: ~830k (The metric is max(1, 2500-nb_byte) summed over all solvers). Sharing thoughts on induction solving and deep diving below ⬇️
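A rough sketch of the code golf scoring mentioned in the quoted thread, assuming each solver earns max(1, 2500 - byte count) and the total is the sum over solvers. A floor of 1 per task is the reading consistent with a total near 830k. The function names and the two sample solvers are hypothetical, not taken from the competition.

```python
def task_score(solver_source: str) -> int:
    """Score one solver under the assumed rule: shorter source scores higher, floor of 1."""
    n_bytes = len(solver_source.encode("utf-8"))
    return max(1, 2500 - n_bytes)


def total_score(solvers: list[str]) -> int:
    """Sum the per-task scores over all submitted solvers."""
    return sum(task_score(source) for source in solvers)


# Hypothetical solvers for illustration only (not real competition entries).
example_solvers = [
    "p=lambda g:g[::-1]",              # flip the grid vertically
    "p=lambda g:[r[::-1]for r in g]",  # flip the grid horizontally
]

if __name__ == "__main__":
    for source in example_solvers:
        print(len(source.encode("utf-8")), "bytes ->", task_score(source))
    print("total:", total_score(example_solvers))
```

Under this assumed rule, every byte shaved off a solver is worth exactly one point until the solver reaches 2,499 bytes or more, where the score bottoms out at 1.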
The must-watch update! A lot has changed at frontier of AI since we launched ARC Prize -- for the better.
"I've updated my AGI timeline." One year later, @dwarkesh_sp and @fchollet meet on camera again. Both of them have shifted their AGI timelines. They dive into AGI macroeconomics, the singularity, and ARC-AGI-3 preview.
We will need to teach AI the concept of task-relevant maturity.
I'm noticing that due to (I think?) a lot of benchmarkmaxxing on long horizon tasks, LLMs are becoming a little too agentic by default, a little beyond my average use case. For example in coding, the models now tend to reason for a fairly long time, they have an inclination to
This thread is why YC wins. They created a scalable way to identify and fund strong "outsider" founders.
Let’s be honest: the VC/startup world is confusing AF. Most people either don’t have the incentive to tell you how it really works, or they just don’t have the energy. Luckily, I’ve got some time this morning, so here’s the unfiltered breakdown.