
Vijay
@__tensorcore__
Followers: 2K
Following: 8K
Media: 59
Statuses: 1K
MLIR, CUTLASS, Tensor Core arch @NVIDIA. Mechanic @hpcgarage. Exercise of any 1st amendment rights are for none other than myself.
Joined July 2015
Another 🔥 blog about CUTLASS from @colfaxintl, this time focusing on the gory details of block-scaled MXFP and NVFP data types and Blackwell kernels for them.
RT @tri_dao: We've been thinking about what the "ideal" architecture should look like in the era where inference is driving AI progress. GT….
RT @__tensorcore__: 🚨🔥 CUTLASS 4.0 is released 🔥🚨. pip install nvidia-cutlass-dsl. 4.0 marks a major shift for CUTLASS: towards native GPU….
RT @jinaycodes: Introducing soarXiv ✈️, the most beautiful way to explore human knowledge. Take any paper's URL and replace arxiv with soar….
RT @elliotarledge: timelapse #58 (14.5 hrs): - used cutlass python DSL to increase elementwise add/mul memory throughput (from pytorch 500….
RT @tri_dao: I love Cutlass, and this new Python DSL looks very well-designed. Will for sure accelerate kernel dev + exploring new ideas in….
RT @__tensorcore__: We believe low level access to hardware is extremely important. High level generators rob away the freedom of programme….
RT @memorypaladin: Most exciting addition in CUDA 12.9 for me is CUDA_LOG_FILE. You can finally get error strings to describe the error you….