caleb 🐮 @clbswrs X Profile

caleb 🐮

@clbswrs

Followers

238

Following

36K

Media

103

Statuses

973

I shall be one of those who make things beautiful. art, ML, 🏀, @pollilabs

USA

Joined April 2018

caleb 🐮

@clbswrs

23 hours

RT @jd_pressman: Me last night: "So what stands out to me about this model. Is that it doesn't do the thing language models normally do wh…

0

11

0

caleb 🐮

@clbswrs

23 hours

RT @stanfordNYC: @johncoogan Hey John, this also puts an interesting spin on California's ban on noncompetes. Traditionally, noncompetes ar…

0

1

0

caleb 🐮

@clbswrs

10 days

RT @teortaxesTex: @main_horse Yeah it's literally a V3/R1 with a slightly enlarged hidden dim, It matches in all other ways. Would be real…

0

1

0

caleb 🐮

@clbswrs

10 days

RT @teortaxesTex: Extremely bad if true, unprecedented on this level of actors involved: Huawei covertly upcycled Qwen 2.5-14B for their Pa…

0

21

0

caleb 🐮

@clbswrs

2 months

First case I have encountered where o3 one-shots a bug diagnosis where 2.5 Pro struggled. Two related issues w/ noncontiguous view aliasing and asynchronous GPU transfer memory racing; I struggled with these for a few hours. Impressed!

1

0

1

caleb 🐮

@clbswrs

2 months

but whatever you do, DON'T ask for base64 png!! you will see that poor 2.5 flash spent 9mins and 64k tokens trying to return a base64 png in regular token space and feel really bad. or maybe this would work as intended if u used API key instead of ADC who knows.

0

caleb 🐮

@clbswrs

2 months

@OfficialLoganK is this accurate? or just some kind of failing on my part wrt understanding google cloud console? will write up the sweeps later in the week! not completely straightforward to get the translation layer to kick in.

1

0

caleb 🐮

@clbswrs

2 months

2.5 Flash image segmentation requires you to trick a hidden paligemma tokenizer to kick in to translate segmentation tokens into base64 png mask. 1584-run grid search via google-genai python SDK reveals that this is only possible if using an AI studio API key-- not via ADC?

1

0

caleb 🐮

@clbswrs

4 months

RT @iam_johnw: Nah tahaad pettiford is a lottery pick just off this game

0

20

0

caleb 🐮

@clbswrs

4 months

this shit is insane. lol

0

1

caleb 🐮

@clbswrs

4 months

Huge Z believer over here.

Brett Usher

@UsherNBA

4 months

Zion Williamson 22-10-12 and the numbers don’t even do it justice; he is SO good

0

caleb 🐮

@clbswrs

4 months

RT @charliermarsh: I built a prototype that infers the appropriate CUDA version at runtime and installs the latest compatible PyTorch versi…

0

59

0

caleb 🐮

@clbswrs

5 months

RT @stereolabgroop: 61 shows becomes 62 with the addition of a second show at Metro Chicago on Friday, October 10th - tickets on sale now.…

0

36

0

caleb 🐮

@clbswrs

5 months

War damn eagle 🦅

0

caleb 🐮

@clbswrs

5 months

and what does o1-pro 'reasoning time' correlate with? almost exclusively response length. the reasoning is sprinkled into the response, nothing to indicate crazy upfront token- or latent-space scratchpad work. smells like top-k.

0

caleb 🐮

@clbswrs

5 months

but still, interesting to observe the failure cases-- what breaks reasoning at higher inference compute?
• 2-3 qualitatively different tasks
• reasoning-heavy outputs from other models (e.g. sonnet critical review of o1pro proposal)

1

0

caleb 🐮

@clbswrs

5 months

of course this analogy could be totally off. a good rebuttal from the DeepSeekMath paper: "RL enhances the model’s overall performance by rendering the output distribution more robust, in other words, it seems that the improvement is attributed to boosting the correct response.

1

0

caleb 🐮

@clbswrs

5 months

inherently dynamically unstable, would in fact flip over on itself without any (instruction tuning), yet more capable in certain dimensions precisely because of its instability.

1

0

caleb 🐮

@clbswrs

5 months

which is why o1 pro, even more so than other reasoners, is kind of frustrating. very sensitive to prompting. very sensitive to initial conditions. you don't feel the strong attractors that you do with broad-and-robust sonnet; instead it feels like flying an F-22 with the flight.

1

0