rohan @global__void X Profile

rohan

@global__void

Followers

22

Following

630

Media

11

Statuses

85

engineering && writing kernels

smem

Joined May 2025

Greg Brockman

@gdb

24 days

inference is perhaps the most valuable emerging software category. as models get smarter and more economically valuable, compute will increasingly be spent drawing samples from the models. if you'd like to work on inference at openai, reach out — gdb@openai.com. include a

104

127

2K

LAVISH

@msalexandriae

5 months

What a privilege to be tired from the work you prayed for. What a privilege to feel overwhelmed by growth you used to dream about. What a privilege to be challenged by a life you created on purpose. What a privilege to outgrow things you used to settle for.

81

14K

51K

maharshi

@maharshii

3 months

triton, gluon, cutedsl, hopper, blackwell, tensorcores, layouts, composition, local_tile, partitionS, partitionD, wgmma, tcgen05, TMA, block scaling, coalesced access, ampere, ada lovelace, cutlass, cublas, cudnn, flash attention, gemm, sgemm, fp16, bf16, mxfp8, nvfp4, int4,

38

113

2K

rohan

@global__void

3 months

half of my model runtime is spent on sampling and the other half of the time it OOMs on Colab 🥀

0

1

rohan

@global__void

3 months

this makes so much sense wow

Mira

@_Mira___Mira_

3 months

Since floating point addition isn't associative, you could use it as an activation function. Especially with fp8, fp4... with fp4 I feel like you wouldn't need an activation function.

0

1
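A small sketch of the non-associativity the quoted post relies on, using ordinary Python floats (IEEE 754 double precision) rather than fp8 or fp4. The helper names `left_assoc` and `right_assoc` are made up for illustration; the point is only that the grouping of additions changes the rounded result.

```python
def left_assoc(a: float, b: float, c: float) -> float:
    """Add as (a + b) + c, rounding after each addition."""
    return (a + b) + c


def right_assoc(a: float, b: float, c: float) -> float:
    """Add as a + (b + c), rounding after each addition."""
    return a + (b + c)


if __name__ == "__main__":
    # Small values: the two groupings differ in the last bit.
    print(left_assoc(0.1, 0.2, 0.3), right_assoc(0.1, 0.2, 0.3))

    # Large cancellation: the 1.0 is absorbed when it is added to -1e16 first.
    print(left_assoc(1e16, -1e16, 1.0), right_assoc(1e16, -1e16, 1.0))
```

Lower-precision formats have far fewer representable values, so this rounding effect is coarser there; the sketch does not model fp8 or fp4 themselves.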

rohan

@global__void

3 months

I really do think about videos like this a lot. There are prolly so many stories like this from teams that built great products.

mitsuri

@0xmitsurii

3 months

20 years ago Youtube founders were depressed cause they only had 40 uploads.

1

0

2

rohan

@global__void

3 months

the era has just begun

alterego

@alterego_io

3 months

Introducing Alterego: the world’s first near-telepathic wearable that enables silent communication at the speed of thought. Alterego makes AI an extension of the human mind. We’ve made several breakthroughs since our work started at MIT. We’re announcing those today.

0

1

Elliot Arledge

@elliotarledge

3 months

CUDA 13.0 just dropped. I compressed their 26 page pdf into a thread:

12

123

1K

rohan

@global__void

3 months

Sisyphus simply had to enjoy the journey to break the eternal curse.

Naval

@naval

3 months

The purest reason to make something is not to make money and not even to make the thing. It’s to have the experience of making the thing - and no one can take that from you.

0

rohan

@global__void

3 months

🥹

0

1

rohan

@global__void

3 months

he also has an amazing youtube channel, one of the first ever videos I watched on training was from him:

saurabh

@saurabhtwq

3 months

damn, this blog is so good.

0

Aadit Sheth

@aaditsh

4 months

This guy literally dropped the best life advice you'll ever hear

17

211

2K

rohan

@global__void

4 months

just when I think I'm beginning to understand CuTe layouts I'll see another one that shatters me

0

1

rohan

@global__void

4 months

CUDA warps make the sweet parallelism that we love possible, but what about divergence? In this blog I talk about how it has been handled over the years. If you do give it a read, I would love to hear what you have to say! https://t.co/IQf8yJalQ9

medium.com

As I’ve been learning CUDA and exploring its execution model, one concept that initially stood out was the ideal of warp-level lockstep…

0

rohan

@global__void

5 months

https://t.co/yI2gudaCg7 very nice read!

0

rohan

@global__void

5 months

cooperative groups you have rocked my world

0

rohan

@global__void

5 months

If you had access to 2 H100s, what would you do?

0

1

rohan

@global__void

5 months

been a while but we stay cooking! (there's so much to learn ahhhhh)

0

2

rohan

@global__void

5 months

huhhh

sophia

@cis_female

5 months

> fp8 is 100 tflops faster when the kernel name has "cutlass" in it kms https://t.co/KpZjwSAkrM

0

1

rohan

@global__void

5 months

I used to think warp-level execution was guaranteed to be lockstep, and many sources (LLMs included) made it seem that way. That was the case before Volta, but it is no longer guaranteed. I'm enlightened, but that means I have to unlearn some notions ⚰️

0

1