caleb

@clbswrs

Followers
238
Following
36K
Media
103
Statuses
973

I shall be one of those who make things beautiful. art, ML, 🏀, @pollilabs

USA
Joined April 2018
@clbswrs
caleb
23 hours
RT @jd_pressman: Me last night: "So what stands out to me about this model. Is that it doesn't do the thing language models normally do wh…
0
11
0
@clbswrs
caleb
23 hours
RT @stanfordNYC: @johncoogan Hey John, this also puts an interesting spin on California's ban on noncompetes. Traditionally, noncompetes ar…
0
1
0
@clbswrs
caleb
10 days
RT @teortaxesTex: @main_horse Yeah it's literally a V3/R1 with a slightly enlarged hidden dim. It matches in all other ways. Would be real…
0
1
0
@clbswrs
caleb
10 days
RT @teortaxesTex: Extremely bad if true, unprecedented on this level of actors involved: Huawei covertly upcycled Qwen 2.5-14B for their Pa…
0
21
0
@clbswrs
caleb
2 months
0
0
0
@clbswrs
caleb
2 months
First case I have encountered where o3 one-shots a bug diagnosis where 2.5 Pro struggled. Two related issues w/ noncontiguous view aliasing and asynchronous GPU transfer memory racing; I struggled with these for a few hours. Impressed!
1
0
1
@clbswrs
caleb
2 months
but whatever you do, DON'T ask for base64 png!! you will see that poor 2.5 flash spent 9mins and 64k tokens trying to return a base64 png in regular token space and feel really bad. or maybe this would work as intended if u used API key instead of ADC who knows.
0
0
0
@clbswrs
caleb
2 months
@OfficialLoganK is this accurate? or just some kind of failing on my part wrt understanding google cloud console? will write up the sweeps later in the week! not completely straightforward to get the translation layer to kick in.
1
0
0
@clbswrs
caleb
2 months
2.5 Flash image segmentation requires you to trick a hidden paligemma tokenizer into kicking in to translate segmentation tokens into a base64 png mask. 1584-run grid search via google-genai python SDK reveals that this is only possible if using an AI studio API key-- not via ADC?
1
0
0
@clbswrs
caleb
4 months
RT @iam_johnw: Nah tahaad pettiford is a lottery pick just off this game
0
20
0
@clbswrs
caleb
4 months
this shit is insane. lol
0
0
1
@clbswrs
caleb
4 months
Huge Z believer over here.
@UsherNBA
Brett Usher
4 months
Zion Williamson 22-10-12 and the numbers don't even do it justice; he is SO good
0
0
0
@clbswrs
caleb
4 months
RT @charliermarsh: I built a prototype that infers the appropriate CUDA version at runtime and installs the latest compatible PyTorch versi…
0
59
0
@clbswrs
caleb
5 months
RT @stereolabgroop: 61 shows becomes 62 with the addition of a second show at Metro Chicago on Friday, October 10th - tickets on sale now.…
0
36
0
@clbswrs
caleb
5 months
War damn eagle 🦅
0
0
0
@clbswrs
caleb
5 months
and what does o1-pro 'reasoning time' correlate with? almost exclusively response length. the reasoning is sprinkled into the response, nothing to indicate crazy upfront token- or latent-space scratchpad work. smells like top-k.
0
0
0
@clbswrs
caleb
5 months
but still, interesting to observe the failure cases-- what breaks reasoning at higher inference compute?
• 2-3 qualitatively different tasks
• reasoning-heavy outputs from other models (e.g. sonnet critical review of o1pro proposal)
1
0
0
@clbswrs
caleb
5 months
of course this analogy could be totally off. a good rebuttal from the DeepSeekMath paper: "RL enhances the model's overall performance by rendering the output distribution more robust, in other words, it seems that the improvement is attributed to boosting the correct response.
1
0
0
@clbswrs
caleb
5 months
inherently dynamically unstable, would in fact flip over on itself without any (instruction tuning), yet more capable in certain dimensions precisely because of its instability.
1
0
0
@clbswrs
caleb
5 months
which is why o1 pro, even moreso than other reasoners, is kind of frustrating. very sensitive to prompting. very sensitive to initial conditions. you don't feel the strong attractors that you do with broad-and-robust sonnet; instead it feels like flying an F-22 with the flight.
1
0
0