rohan

@global__void

Followers
22
Following
630
Media
11
Statuses
85

engineering && writing kernels

smem
Joined May 2025
@gdb
Greg Brockman
24 days
inference is perhaps the most valuable emerging software category. as models get smarter and more economically valuable, compute will increasingly be spent drawing samples from the models. if you'd like to work on inference at openai, reach out — gdb@openai.com. include a
104
127
2K
@msalexandriae
LAVISH
5 months
What a privilege to be tired from the work you prayed for. What a privilege to feel overwhelmed by growth you used to dream about. What a privilege to be challenged by a life you created on purpose. What a privilege to outgrow things you used to settle for.
81
14K
51K
@maharshii
maharshi
3 months
triton, gluon, cutedsl, hopper, blackwell, tensorcores, layouts, composition, local_tile, partitionS, partitionD, wgmma, tcgen05, TMA, block scaling, coalesced access, ampere, ada lovelace, cutlass, cublas, cudnn, flash attention, gemm, sgemm, fp16, bf16, mxfp8, nvfp4, int4,
38
113
2K
@global__void
rohan
3 months
half of my model runtime is spent on sampling and the other half of the time it OOMs on Colab 🥀
0
0
1
@global__void
rohan
3 months
this makes so much sense wow
@_Mira___Mira_
Mira
3 months
Since floating point addition isn't associative, you could use it as an activation function. Especially with fp8, fp4... with fp4 I feel like you wouldn't need an activation function.
0
0
1
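A quick sketch of the non-associativity the quoted post refers to. It uses Python's built-in 64-bit floats as an assumption, not the fp8/fp4 formats mentioned above, so it only shows that grouping changes the rounded result. It says nothing about whether this would work as an activation function.

```python
# Floating-point addition is not associative: regrouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 rounds first, then 0.3 is added
right = a + (b + c)  # 0.2 + 0.3 rounds first, then 0.1 is added

print(left == right)  # False
print(left, right)

# The effect is more dramatic when magnitudes differ: the small term is absorbed.
big, small = 1e16, 1.0
big_left = (big + -big) + small   # the big terms cancel first, so small survives
big_right = big + (-big + small)  # small is lost when added to -big

print(big_left, big_right)
```

Lower-precision formats have far fewer mantissa bits, so the same rounding effect would show up at much smaller magnitudes.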
@global__void
rohan
3 months
I really do think about videos like this a lot, there's prolly so many stories like this between teams that built great products.
@0xmitsurii
mitsuri
3 months
20 years ago Youtube founders were depressed cause they only had 40 uploads.
1
0
2
@global__void
rohan
3 months
the era has just begun
@alterego_io
alterego
3 months
Introducing Alterego: the world’s first near-telepathic wearable that enables silent communication at the speed of thought. Alterego makes AI an extension of the human mind. We’ve made several breakthroughs since our work started at MIT. We’re announcing those today.
0
0
1
@elliotarledge
Elliot Arledge
3 months
CUDA 13.0 just dropped. I compressed their 26 page pdf into a thread:
12
123
1K
@global__void
rohan
3 months
Sisyphus simply had to enjoy the journey to break the eternal curse.
@naval
Naval
3 months
The purest reason to make something is not to make money and not even to make the thing. It’s to have the experience of making the thing - and no one can take that from you.
0
0
0
@global__void
rohan
3 months
🥹
0
0
1
@global__void
rohan
3 months
he also has an amazing youtube channel, one of the first ever videos I watched on training was from him:
@saurabhtwq
saurabh
3 months
damn this blog is so good.
0
0
0
@aaditsh
Aadit Sheth
4 months
This guy literally dropped the best life advice you'll ever hear
17
211
2K
@global__void
rohan
4 months
just when I think I'm beginning to understand CuTe layouts I'll see another one that shatters me
0
0
1
@global__void
rohan
4 months
CUDA warps make the sweet parallelism that we love possible, but what about divergence? In this blog I talk about how it has been handled over the years. If you do give it a read, I would love to hear what you have to say! https://t.co/IQf8yJalQ9
medium.com
As I’ve been learning CUDA and exploring its execution model, one concept that initially stood out was the ideal of warp-level lockstep…
0
0
0
@global__void
rohan
5 months
https://t.co/yI2gudaCg7 very nice read!
0
0
0
@global__void
rohan
5 months
cooperative groups you have rocked my world
0
0
0
@global__void
rohan
5 months
If you had access to 2 H100s what would you do?
0
0
1
@global__void
rohan
5 months
been a while but we stay cooking! (there's so much to learn ahhhhh)
0
0
2
@global__void
rohan
5 months
huhhh
@cis_female
sophia
5 months
> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms https://t.co/KpZjwSAkrM
0
0
1
@global__void
rohan
5 months
I used to think warp-level execution was guaranteed to be lockstep, and many sources (LLMs included) made it seem that way. That was the case before Volta, but it's no longer guaranteed. I'm enlightened, but that means I have to unlearn some notions ⚰️
0
0
1