Chase Brower @ChaseBrowe32432 X Profile

Chase Brower

@ChaseBrowe32432

Followers

2K

Following

3K

Media

721

Statuses

4K

software dev, working on AI stuff

https://t.co/vecNGHezcC

Joined June 2023


Chase Brower

@ChaseBrowe32432

13 days

Gemini 3 Pro (preview) scores 91% on VPCT (spatial reasoning). Uhhhh jesus christ

67

121

2K

Chase Brower

@ChaseBrowe32432

6 days

Opus 4.5 is only on the $100 plan... RIP

14

2

152

Chase Brower

@ChaseBrowe32432

6 days

Example problem. This benchmark tests basic visual physics reasoning. Gemini 3 has ~solved this, while Anthropic is still not yet in the game lol

3

0

23

Chase Brower

@ChaseBrowe32432

6 days

Claude 4.5 Opus scores 40% on VPCT (visual physics) 🗿

16

10

237

Chase Brower

@ChaseBrowe32432

8 days

You can see the redux problems (format-compatible with the existing vpct-runner) here: https://t.co/yg3n4Pgi3I

huggingface.co

0

1

7

Chase Brower

@ChaseBrowe32432

8 days

Next up, I have been working on, and will soon release, VPCT-2. VPCT-2 will be accompanied by better tooling, better metrics, and much more difficult/diverse problems (whose "time horizons" will be quantified!)

1

0

8

Chase Brower

@ChaseBrowe32432

8 days

This is an impressive result from Google. Gemini 3 Pro comes close to entirely solving this benchmark. Importantly, these problems are still very, very easy for a human. I would estimate a "time horizon" of about 3 seconds based on my sampled participants.

1

0

3

Chase Brower

@ChaseBrowe32432

8 days

Gemini 3's reasoning is generally sensible and relevant to the problem. However, it incorrectly assesses those lines on the right as extending far enough left to cause the ball to land in the 2nd bucket. Even in this failing case, the model is close to solving the problem.

1

0

3

Chase Brower

@ChaseBrowe32432

8 days

Here, the model reasons:

Based on a step-by-step analysis of the physics simulation in the image:

1. **Initial Drop:** The ball starts at the top center of the simulation. Gravity will pull it straight down.
2. **First Obstacle:** Directly below the ball is a long, slanted

1

0

3

Chase Brower

@ChaseBrowe32432

8 days

Next, I examine one of the remaining failure cases for Gemini 3 Pro, on vpct-1 problem #44:

1

0

4

Chase Brower

@ChaseBrowe32432

8 days

This is a great result! There was no overfit, intentional or otherwise (through e.g. leaking into general internet pretrain set). Gemini 3 Pro is indeed the strong visual reasoning model that it appears.

1

0

7

Chase Brower

@ChaseBrowe32432

8 days

I tested several top models from OpenAI and Google (as well as a baseline, GPT-4o Mini) at avg@5, and observed no statistically significant difference between original and redux performance for any model.

1

0

8
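For readers who want to try the comparison above, here is a minimal sketch of avg@5 scoring and a significance check between two problem sets. It is not the author's method: the thread does not say which test was used, so the pooled two-proportion z-test, the function names, and the example counts are all assumptions made for illustration. Pooling the five runs also treats every attempt as independent, which is only an approximation when the same problems are repeated.

```python
import math
from statistics import mean


def avg_at_k(run_accuracies):
    """Mean accuracy over k independent runs (avg@k)."""
    return mean(run_accuracies)


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Assumption: attempts are independent Bernoulli trials, which is only
    approximately true when the same problems are repeated across runs.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-correct or all-wrong: no measurable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 100 problems x 5 runs per problem set.
original_runs = [0.90, 0.92, 0.91, 0.90, 0.92]
print("avg@5 (original):", avg_at_k(original_runs))

z, p = two_proportion_z_test(correct_a=455, n_a=500, correct_b=450, n_b=500)
print("z =", round(z, 3), "p =", round(p, 3))
```

A p-value above the chosen threshold (0.05 is common) is consistent with "no statistically significant difference", though it does not prove the two sets are equally hard.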

Chase Brower

@ChaseBrowe32432

8 days

First, I produced a new set of 100 problems, VPCT-Redux, which looks slightly different from the original. Background color + horizontal position of the ball are randomized, and the buckets are now labeled. The overall problem difficulty remains unchanged.

1

0

5
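The randomization described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual VPCT-Redux generator: the `ProblemSpec` fields, the normalized coordinate range, the margin, and the default of three buckets are all assumptions, and the obstacle layout that sets the difficulty is left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class ProblemSpec:
    background_rgb: tuple  # randomized background color
    ball_x: float          # randomized horizontal start, normalized to [0, 1]
    bucket_labels: list    # visible labels drawn on the buckets


def make_spec(rng, num_buckets=3, margin=0.1):
    """Build one randomized problem spec (hypothetical fields and ranges)."""
    background = tuple(rng.randrange(256) for _ in range(3))
    # Keep the ball away from the image edges.
    ball_x = rng.uniform(margin, 1.0 - margin)
    labels = [str(i + 1) for i in range(num_buckets)]
    return ProblemSpec(background, ball_x, labels)


# A seeded generator keeps the 100-problem set reproducible.
rng = random.Random(0)
specs = [make_spec(rng) for _ in range(100)]
print(len(specs), specs[0].bucket_labels)
```

Varying only surface features like these while holding the physics fixed is what lets a redux set check for memorization of the original images.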

Chase Brower

@ChaseBrowe32432

8 days

VPCT-1 post-mortem! I examine the original benchmark, Gemini 3 Pro's recent score, and what this means for vision tasks. TL;DR: I observe no signs of overfit (very good!).

3

6

25

Daniel Kokotajlo

@DKokotajlo

8 days

Some people are unhappy with the AI 2027 title and our AI timelines. Let me quickly clarify: We’re not confident that: 1. AGI will happen in exactly 2027 (2027 is one of the most likely specific years though!) 2. It will take <1 yr to get from AGI to ASI 3. AGIs will definitely

121

92

1K

prerat

@prerat

10 days

never mind, "if anyone builds it, everyone dies" is a good title

roon

@tszzl

10 days

@DKokotajlo most people who hear about your idea will never read the website, never watch an interview. they will assume you are predicting AGI in 2027

5

10

502

Chase Brower

@ChaseBrowe32432

10 days

And if your default assumption is "people are so retarded that they will never actually read the blogpost"... I'd rather they misunderstand that AGI is coming in 2027 than misunderstand that it's definitely not. https://t.co/LLaYYOqtH1

roon

@tszzl

10 days

@DKokotajlo most people who hear about your idea will never read the website, never watch an interview. they will assume you are predicting AGI in 2027

1

0

45

Chase Brower

@ChaseBrowe32432

10 days

This sort of pikachu-facing over Daniel K's comment is embarrassing and obtuse. The idea that the original claim of AI 2027 was "AGI is going to happen definitely in 2027 and we'll all die if we don't do xyz" cannot possibly come from a sane reading of the blogpost. I have a