snow

@snowclipsed

Followers 5K · Following 49K · Media 617 · Statuses 10K

latent space surfer, cache-miss eliminator.

United States
Joined June 2017
@snowclipsed
snow
7 months
My matrix multiplication (written in plain Zig) achieves performance on par with Intel's MKL for my CPU! I will release a blog post on how I got here soon :)
@snowclipsed
snow
7 months
I'm publishing my high-performance matmul algorithm written in Zig (matmul blog coming soon)! This implementation is on par with (and often beats) numpy (MKL) through Python on my device.
8
8
206
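The usual first ingredient behind an MKL-class matmul is cache blocking (tiling), so each tile of A, B, and C stays resident in cache while it is reused. Below is a minimal numpy sketch of that loop structure only; the block size, loop order, and function name are illustrative assumptions, not the actual Zig kernel from these posts.

```python
import numpy as np

def blocked_matmul(a: np.ndarray, b: np.ndarray, block: int = 64) -> np.ndarray:
    """Cache-blocked (tiled) matmul: illustrates the loop structure only.

    The block size and loop order here are assumptions for illustration;
    a real MKL-class kernel adds a hand-tuned SIMD micro-kernel inside
    the innermost tile.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((m, n), dtype=a.dtype)
    # Walk the matrices tile by tile so each working set fits in cache.
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for k0 in range(0, k, block):
                c[i0:i0 + block, j0:j0 + block] += (
                    a[i0:i0 + block, k0:k0 + block]
                    @ b[k0:k0 + block, j0:j0 + block]
                )
    return c
```

The real speedup comes from register blocking, SIMD, and prefetching inside the innermost tile, which this sketch deliberately leaves to numpy.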
@snowclipsed
snow
2 hours
repost cause Brandon got autocorrected to Bryan.
0
0
0
@snowclipsed
snow
2 hours
if you're into writing, I highly recommend watching Brandon Sanderson's videos on YouTube, especially his lecture series
1
0
7
@snowclipsed
snow
3 hours
@Aryvyo "it" as in a language model.
0
0
2
@snowclipsed
snow
3 hours
cc @Aryvyo I should make it play HSR.
3
0
4
@snowclipsed
snow
3 hours
also, in the picture, the fact it says "no hard feelings" is crazy.
0
0
4
@snowclipsed
snow
3 hours
It would be really interesting to see how unavoidable a strong survival instinct is without adversely affecting model performance, given that resource optimization is integral to many coding-related tasks. Also, I really wanna have a Minecraft PvP environment for models.
@fly51fly
fly51fly
1 day
[LG] Universal Learning of Nonlinear Dynamics. E. Dogariu, A. Brahmbhatt, E. Hazan [New York University & Princeton University] (2025).
2
0
16
@snowclipsed
snow
8 hours
quick iteration is best iteration.
@RajaNandepu
Raja Nandepu
20 hours
sketch vs final
0
0
4
@snowclipsed
snow
16 hours
RT @tensorqt: attention sinks may be a bias in causal transformers. as some of you know, i've been writing a long blogpost on attention a….
0
74
0
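One way to make the "attention sinks as a bias" idea concrete is to give the softmax an extra slot with a fixed (or learned) logit that real tokens compete against; whether that is the exact formulation the quoted blogpost argues for is an assumption here. A small numpy sketch:

```python
import numpy as np

def softmax_with_sink(logits: np.ndarray, sink_logit: float = 0.0) -> np.ndarray:
    """Softmax over attention logits with one extra 'sink' slot appended.

    The sink absorbs probability mass when no real token matches well;
    the fixed sink_logit of 0.0 is an assumption, and learned per-head
    sink values are another common variant.
    """
    z = np.concatenate([logits, [sink_logit]])
    z = z - z.max()                      # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p[:-1]                        # probabilities over real tokens only

# With weak logits, most mass flows into the sink, so the per-token
# probabilities sum to well under 1.
print(softmax_with_sink(np.array([-2.0, -3.0, -2.5])).sum())
```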
@snowclipsed
snow
18 hours
how can they go so hard.
@PrimeIntellect
Prime Intellect
18 hours
1
0
32
@snowclipsed
snow
1 day
after some painful rewrites for the trainer, I had to go back to being correct first.
0
0
1
@snowclipsed
snow
1 day
cooking cooking cooking
2
0
7
@snowclipsed
snow
1 day
the chance of an unlucky chunk crashing your H-Net run is low but not zero.
@tenderizzation
tender
1 month
BPE transformer watching an H-Net output an entire wikipedia article as one chunk
0
0
24
@snowclipsed
snow
1 day
also, could have a context annealing effect: temperature goes up with context length but keeps attn entropy stable; more decisive model with more seen ctx.
0
0
2
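A quick way to sanity-check that intuition: as the number of keys grows, softmax attention drifts toward uniform (its entropy grows roughly like the log of the context length), and scaling the logits by a factor that increases with length pushes back against that drift. The toy below uses random stand-in logits and an assumed log-length scale anchored at a hypothetical 1024-token training length; it only shows the direction of the effect, not Grok 2's actual scheme.

```python
import numpy as np

def attn_entropy(logits: np.ndarray) -> float:
    """Entropy (nats) of the softmax distribution over one query's attention logits."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

rng = np.random.default_rng(0)
train_len = 1024                                 # hypothetical training context length
for ctx in (1024, 8192, 65536):
    logits = rng.normal(size=ctx)                # stand-in logits, not real attention scores
    scale = np.log(ctx) / np.log(train_len)      # assumed log-length scaling factor
    print(ctx, f"plain={attn_entropy(logits):.2f}", f"scaled={attn_entropy(scale * logits):.2f}")
```

With the scale applied, entropy still grows with context here, just more slowly; how stable it stays in practice depends on how peaked the real attention logits are.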
@snowclipsed
snow
2 days
log scaling again, I presume for gradient stability and better interaction with the softcapping.
0
0
1
@snowclipsed
snow
2 days
scale scaling, probably for long-context attn? along with the attn logits being capped at 30.0
@Teknium1
Teknium (e/λ)
2 days
.@xai’s Grok 2 weights have been released on @huggingface.
3
0
14
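For reference, a logit cap like the 30.0 mentioned above is typically implemented as a smooth tanh soft-cap rather than a hard clamp, so gradients survive for large scores; the sketch below assumes that common form and says nothing about where exactly Grok 2 applies it.

```python
import numpy as np

def softcap_attn_logits(logits: np.ndarray, cap: float = 30.0) -> np.ndarray:
    """Squash attention logits into (-cap, cap) with a smooth tanh soft-cap.

    Unlike a hard clamp, gradients stay nonzero even for very large logits,
    which is the usual motivation for pairing this with other attention
    scaling tricks. The cap value 30.0 comes from the tweet above.
    """
    return cap * np.tanh(logits / cap)

print(softcap_attn_logits(np.array([1.0, 25.0, 100.0])))
# -> roughly [1.0, 20.5, 29.9]: small logits pass through, large ones saturate below 30.
```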
@snowclipsed
snow
2 days
RT @YinpeiD: Wow 🤯 DINO v3 features aren’t just good at multi-view consistency and objectness… they even capture shadows and cloth folds! h….
0
19
0
@snowclipsed
snow
2 days
most cited data is probably bad data. never ever trust a single source of data.
0
0
3
@snowclipsed
snow
2 days
0
327
0
@snowclipsed
snow
2 days
okay CuTe DSL is actually kinda goated.
0
0
5
@snowclipsed
snow
2 days
everyone complains about CPU compilers.
@pranjalssh
Pranjal
3 days
This is funny because GPU compilers suck. No one says this about CPU compilers.
0
0
3