Chase Brower Profile
Chase Brower

@ChaseBrowe32432

Followers
2K
Following
3K
Media
721
Statuses
4K

software dev, working on AI stuff

Joined June 2023
@ChaseBrowe32432
Chase Brower
13 days
Gemini 3 Pro (preview) scores 91% on VPCT (spatial reasoning). Uhhhh jesus christ
67
121
2K
@ChaseBrowe32432
Chase Brower
6 days
Opus 4.5 is only on the $100 plan... RIP
14
2
152
@ChaseBrowe32432
Chase Brower
6 days
Example problem. This benchmark tests basic visual physics reasoning. Gemini 3 has ~solved this while Anthropic is still not yet in the game lol
3
0
23
@ChaseBrowe32432
Chase Brower
6 days
Claude 4.5 Opus scores 40% on VPCT (visual physics) 🗿
16
10
237
@ChaseBrowe32432
Chase Brower
8 days
You can see the redux problems (format-compatible with the existing vpct-runner) here: https://t.co/yg3n4Pgi3I
huggingface.co
0
1
7
@ChaseBrowe32432
Chase Brower
8 days
Next up, I have been working on, and will soon release, VPCT-2. VPCT-2 will be accompanied by better tooling, better metrics, and much more difficult/diverse problems (whose "time horizons" will be quantified!)
1
0
8
@ChaseBrowe32432
Chase Brower
8 days
This is an impressive result from Google. Gemini 3 Pro comes close to entirely solving this benchmark. Importantly, these problems are still very, very easy for a human. I would estimate a "time horizon" of about 3 seconds based on my sampled participants.
1
0
3
@ChaseBrowe32432
Chase Brower
8 days
Gemini 3's reasoning is generally sensible and relevant to the problem. However, it incorrectly assesses those lines on the right as extending far enough left to cause the ball to land in the 2nd bucket. Even in this failing case, the model is close to solving the problem.
1
0
3
@ChaseBrowe32432
Chase Brower
8 days
Here, the model reasons:
Based on a step-by-step analysis of the physics simulation in the image:
1. **Initial Drop:** The ball starts at the top center of the simulation. Gravity will pull it straight down.
2. **First Obstacle:** Directly below the ball is a long, slanted
1
0
3
@ChaseBrowe32432
Chase Brower
8 days
Next, I examine one of the remaining failure cases for Gemini 3 Pro, on vpct-1 problem #44:
1
0
4
@ChaseBrowe32432
Chase Brower
8 days
This is a great result! There was no overfit, intentional or otherwise (through e.g. leaking into general internet pretrain set). Gemini 3 Pro is indeed the strong visual reasoning model that it appears to be.
1
0
7
@ChaseBrowe32432
Chase Brower
8 days
I tested several top models from OpenAI and Google (as well as a baseline GPT-4o Mini), avg@5, and observed no statistically significant difference in performance for any model on the redux.
1
0
8
@ChaseBrowe32432
Chase Brower
8 days
First, I produced a new set of 100 problems, VPCT-Redux, which looks slightly different from the original. Background color + horizontal position of the ball are randomized, and the buckets are now labeled. The overall problem difficulty remains unchanged.
1
0
5
@ChaseBrowe32432
Chase Brower
8 days
VPCT-1 post-mortem! I examine the original benchmark, Gemini 3 Pro's recent score, and what this means for vision tasks. TL;DR: I observe no signs of overfit (very good!).
3
6
25
@DKokotajlo
Daniel Kokotajlo
8 days
Some people are unhappy with the AI 2027 title and our AI timelines. Let me quickly clarify: We’re not confident that: 1. AGI will happen in exactly 2027 (2027 is one of the most likely specific years though!) 2. It will take <1 yr to get from AGI to ASI 3. AGIs will definitely
121
92
1K
@prerat
prerat
10 days
never mind, "if anyone builds it, everyone dies" is a good title
@tszzl
roon
10 days
@DKokotajlo most people who hear about your idea will never read the website, never watch an interview. they will assume you are predicting AGI in 2027
5
10
502
@ChaseBrowe32432
Chase Brower
10 days
And if your default assumption is "people are so retarded that they will never actually read the blogpost"... I'd rather they misunderstand that AGI is coming in 2027 than misunderstand that it's definitely not. https://t.co/LLaYYOqtH1
@tszzl
roon
10 days
@DKokotajlo most people who hear about your idea will never read the website, never watch an interview. they will assume you are predicting AGI in 2027
1
0
45
@ChaseBrowe32432
Chase Brower
10 days
This sort of pikachu-facing over Daniel K's comment is embarrassing and obtuse. The idea that the original claim of AI 2027 was "AGI is going to happen definitely in 2027 and we'll all die if we don't do xyz" cannot possibly come from a sane reading of the blogpost. I have a
@tszzl
roon
10 days
@DKokotajlo “Our timelines were longer than 2027 when we published ai 2027” bro what
28
16
300
@ChaseBrowe32432
Chase Brower
12 days
GPT-5.1-Codex-Max (xhigh) scores 77.9%
not to be confused with:
- GPT-5.1-Codex-Max (high)
- GPT-5.1-Codex (high)
- GPT-5.1 (high)
- GPT-5-Codex (high)
17
15
373
@ChaseBrowe32432
Chase Brower
13 days
Will do a post-mortem + redux + new version of benchmark soon
0
0
62