cloud

@cloud11665

Followers
11K
Following
12K
Media
435
Statuses
3K

SIMD enjoyer, tensor rotator, LLM inference optimizoor | Technical Staff @ https://t.co/gQXVxhjcOm

SF ↔️ Tokyo ↔️ Poland
Joined July 2017
@cloud11665
cloud
21 hours
Using o3 for math was the first time an LLM humbled me. I spent nearly 2 hours on a complex combinatoric counting problem and it one-shotted it and then double checked it in python using a Jupyter notebook in its chain of thought…
1
0
29
@cloud11665
cloud
2 days
🤨
Tweet media one
2
0
8
@cloud11665
cloud
3 days
Should I stay anon or should I pivot to a face account?
5
0
10
@cloud11665
cloud
3 days
I feel like pipeline parallel is somehow more ensouled than tensor parallel despite being slower. Sharding must feel very weird for the gpu. But then again with PP you have bubbles where nothing ever happens - Weird!
0
0
8
@cloud11665
cloud
3 days
Tweet media one
1
2
24
@cloud11665
cloud
4 days
RT @tenderizzation: this has been a longstanding issue, and predates the dawn of neoclouds. one solution is to literally have all your nod…
0
2
0
@cloud11665
cloud
4 days
As ICML 2025 is approaching, it's time to reheat this banger
8
12
165
@cloud11665
cloud
9 days
The more you buy the more you save.
0
0
14
@cloud11665
cloud
9 days
My Dyson fan is doing a bad job at being a fan. Sad. I have to get a standard box fan now to route the cold air from the room I installed the AC unit in.
2
0
14
@cloud11665
cloud
10 days
A very good talk by @foonathan. tldr: Differences in branch prediction optimization are very architecture + platform dependent.
1
5
18
@cloud11665
cloud
10 days
RT @tszzl: infinity is poison, scale is inhuman. you worship coldness having never known warmth.
0
34
0
@cloud11665
cloud
11 days
If she doesn't like your mvp she will not like your end product.
0
0
10
@cloud11665
cloud
13 days
Day 1 after returning to Europe from SF I bought and installed an AC. Holy Fuck. I've been coping for so many years.
10
0
105
@cloud11665
cloud
13 days
unexpected but appreciated SemiAnalysis x Miffy collab
Tweet media one
@SemiAnalysis_
SemiAnalysis
13 days
NVIDIA Tensor Core Evolution. From Volta To Blackwell. Amdahl’s Law, Strong Scaling. Asynchronous Execution. Blackwell, Hopper, Ampere, Turing, Volta.
0
0
23
@cloud11665
cloud
13 days
I miss this city already
Tweet media one
18
2
154
@cloud11665
cloud
15 days
I’ll continue writing tests and I’ll scale it up to 4K numbers also in int16 and int32.
0
0
10
@cloud11665
cloud
15 days
uh. There is no way it will continue scaling like that. right? 512 elements is 8 out of 16 avx512 registers - next stop is 1024 elements, but then there are still many more unnamed registers and the cpu is fantastic at register renaming
Tweet media one
@cloud11665
cloud
17 days
Did you know that it is possible to sort 64 uint8 numbers in less than 64 instructions? I am playing around with SIMD (avx512) bitonic sorting networks
Tweet media one
4
2
35
@cloud11665
cloud
16 days
The mi355x gets 10PFLOPS in fp6, that’s 10,000,000,000,000,000 operations in a second
Tweet media one
4
4
71
@cloud11665
cloud
17 days
I expect the simd version to plateau in the 128 / 256 item range unless I get blessed by the pipeline and register renaming cpu Gods - which is quite possible since it's all branchless.
0
0
18
@cloud11665
cloud
17 days
Did you know that it is possible to sort 64 uint8 numbers in less than 64 instructions? I am playing around with SIMD (avx512) bitonic sorting networks
Tweet media one
15
11
415
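
For readers unfamiliar with the bitonic sorting networks mentioned in the posts, here is a minimal scalar sketch in Python. It is not the author's AVX-512 code, and the function names `bitonic_layers` and `bitonic_sort` are hypothetical, chosen only for illustration. It shows the property that makes the technique SIMD-friendly: the sequence of compare-exchange pairs is fixed in advance and does not depend on the data, so there are no data-dependent branches.

```python
import random


def bitonic_layers(n):
    """Return the compare-exchange layers of a bitonic network for n = 2**k inputs.

    Each layer is a list of (i, j, ascending) tuples with i < j. All pairs
    within one layer touch disjoint positions, so they are independent.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    layers = []
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # distance between compared elements
            layer = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    layer.append((i, partner, ascending))
            layers.append(layer)
            j //= 2
        k *= 2
    return layers


def bitonic_sort(values):
    """Sort a power-of-two-length sequence by applying the fixed network."""
    out = list(values)
    for layer in bitonic_layers(len(out)):
        for i, j, ascending in layer:
            lo, hi = min(out[i], out[j]), max(out[i], out[j])
            out[i], out[j] = (lo, hi) if ascending else (hi, lo)
    return out


if __name__ == "__main__":
    data = [random.randrange(256) for _ in range(64)]  # 64 uint8-range values
    assert bitonic_sort(data) == sorted(data)
    print(len(bitonic_layers(64)), "layers")
```

For 64 inputs the network has 21 layers of 32 independent compare-exchanges each. In a vectorized version, each layer would presumably map to a small number of shuffle, min, and max instructions over a single 64-byte register, but the exact instruction sequence is not given in the posts.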