the tiny corp
@__tinygrad__
Followers: 62K
Following: 578
Media: 461
Statuses: 3K
We make tinygrad; sell tinybox for the GPU middle class. Our mission is to commoditize the petaflop.
San Diego
Joined June 2023
Everything at maximum! Audible but happily coexisting in our office during the long burn. 0 AERs. 0 stuttering in the PCIe. The Blackwell tinybox is solid.
Why are my MI350X GPUs drawing 2 kW at idle? Between the 4 machines this is costing us $3,000 per month!
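As a rough sanity check on those figures, the electricity rate they imply can be backed out. This is only a sketch under two assumptions that the post does not state: that the 2 kW idle draw is per machine, and that a month is about 730 hours. The function name is hypothetical.

```python
# Back out the electricity rate implied by the post's numbers.
# Assumptions (not stated in the post): 2 kW is the idle draw PER MACHINE,
# and a month averages 730 hours (8760 h / 12).
HOURS_PER_MONTH = 730

def implied_rate_usd_per_kwh(monthly_cost_usd: float,
                             idle_kw_per_machine: float,
                             machines: int) -> float:
    """Return the $/kWh rate that would produce the given monthly bill."""
    energy_kwh = idle_kw_per_machine * machines * HOURS_PER_MONTH
    return monthly_cost_usd / energy_kwh

rate = implied_rate_usd_per_kwh(3000.0, 2.0, 4)
print(f"implied rate: ${rate:.2f}/kWh")
```

Under those assumptions the four machines burn 5,840 kWh per month at idle, which works out to roughly $0.51 per kWh.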
Final temps at saturation after 15 minutes were 72C, 80C, 71C, and 76C. We're still working on the fan policy and card layout, since the coolers are different from what we have worked with before. But the shipping machine will be *at least* this good.
Here it is in the huggingface/gpu-fryer. 2522W at full power, no Max-Q around here!
While we wait for the gpu-fryer, here's mmapeak. **3.1 PFLOPS** across the cards, fp16 -> fp32. This is where the lack of the 5090's nerfing really shines: it's more than double the raw FLOPS of a tinybox green v2!
All our Blackwell boxes will be shipping with our latest RAID array. **55.3 GB/s** of benchmarked read bandwidth, which is faster than the RAM on most cell phones.
The torch GEMM on a single card is 438 TFLOPS BF16 -> FP32. That puts the machine at 1.75 PFLOPS of real GEMM performance.
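The machine-level figure follows from multiplying the single-card result by the card count. A minimal sketch of that arithmetic, assuming four cards (the post does not state the count here; four is inferred from the four temperatures reported for the box):

```python
# Reproduce the machine-level GEMM figure from the per-card number.
PER_CARD_TFLOPS = 438.0  # torch GEMM, BF16 -> FP32, single card (from the post)
NUM_CARDS = 4            # assumption: inferred, not stated in this post

def aggregate_pflops(per_card_tflops: float, num_cards: int) -> float:
    """Sum per-card throughput and convert TFLOPS to PFLOPS."""
    return per_card_tflops * num_cards / 1000.0

total = aggregate_pflops(PER_CARD_TFLOPS, NUM_CARDS)
print(f"{total:.3f} PFLOPS")
```

This gives 1.752 PFLOPS, which matches the quoted 1.75. It assumes perfect scaling across cards, so it is an upper bound for any workload that needs inter-card communication.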
We got sick of using vendor tools for bandwidth tests, so we wrote a universal one in tinygrad. The GPUs are connected at full PCIe 5.0 x16.
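For context on what "full PCIe 5.0 x16" allows, here is a sketch of the theoretical per-direction ceiling. This is not the tinygrad tool mentioned above, only the link-rate arithmetic: PCIe 5.0 signals at 32 GT/s per lane with 128b/130b encoding. The function name is hypothetical, and measured bandwidth will come in lower because of packet and protocol overhead.

```python
# Theoretical per-direction ceiling of a PCIe 5.0 x16 link.
GT_PER_S_PER_LANE = 32.0         # PCIe 5.0 raw signaling rate per lane
LANES = 16
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding

def pcie_gbytes_per_s(gt_per_s: float, lanes: int, efficiency: float) -> float:
    """Convert raw transfers per second into usable GB/s in one direction."""
    usable_gbits = gt_per_s * lanes * efficiency
    return usable_gbits / 8  # 8 bits per byte

peak = pcie_gbytes_per_s(GT_PER_S_PER_LANE, LANES, ENCODING_EFFICIENCY)
print(f"{peak:.1f} GB/s per direction")
```

The result is about 63 GB/s in each direction, so a bandwidth test on a correctly trained link should land somewhat below that figure.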
Playing with TinyGrad kernel gen on stream rn. Ilya podcast later
Tinygrad now does hardware video decoding on NVIDIA without an NVCUVID dependency. Only dependency is the NVIDIA driver, although originally the goal was to bypass even that. Been a long time coming, very cool. https://t.co/w6l5QJ4cmG
github.com
You like pytorch? You like micrograd? You love tinygrad! ❤️ - tinygrad/tinygrad
Alternatively, add a feature to output the raw SQTT stream, not something that's processed potentially incorrectly. But adding features is dev resources, open source is one button. Should we spend time on figuring this out, or on lowering Strix Halo idle power?
This is a weekly reminder that rocprof-trace-decoder is closed source, and we're pretty sure there are bugs in the VALU stall times. We only have so many dev resources, and it would be nice not to spend them rewriting this in Rust, a 4 week project...all the GPU gens are different!
For anything off the beaten path, tinygrad is often faster than torch!