GPU MODE

@GPU_MODE

Followers 6K · Following 203 · Media 21 · Statuses 164

Your favorite GPU community

Joined September 2024
@will_z65038
William Zhang
8 hours
Really grateful to @GPU_MODE for the opportunity to talk about my recent Tiny TPU project: 🧵 https://t.co/9hqftJY82z.
1
1
11
@a1zhang
Alex L Zhang
8 days
btw today at 3pm PST (in ~4 hours) we're having Vicki Wang from NVIDIA give a @GPU_MODE talk on CuTe DSL, its features, and how to make the most of it. If you're currently competing in the NVFP4 Blackwell competition this will be very helpful, but it's open to anyone!
3
21
203
@NVIDIAAIDev
NVIDIA AI Developer
10 days
Kernel Challenge #2 is LIVE. In partnership with @GPU_MODE, we've dropped the next high-performance task: NVFP4 GEMM. Prove your kernel optimization skills and push the limits of low-precision computing. 📝 Problem: NVFP4 GEMM 🗓️ Deadline: Dec 19 🔗 Join Now:
3
17
141
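For anyone wondering what the task actually computes: NVFP4 packs 4-bit E2M1 elements with a per-16-element FP8 (E4M3) block scale plus a per-tensor FP32 scale. Below is a rough PyTorch sketch of that math; the function names, argument shapes, and layouts are assumptions for illustration, not the competition's actual reference or API.

```python
# Rough reference for the math an NVFP4 GEMM reproduces. Assumptions, not the
# competition's data format: `elems` holds E2M1 values already decoded to fp32,
# `block_scales` holds the per-16-element FP8 (E4M3) scales decoded to fp32,
# and `tensor_scale` is the per-tensor fp32 scale.
import torch

def dequant_nvfp4(elems, block_scales, tensor_scale, block=16):
    # elems: (rows, K), block_scales: (rows, K // block)
    scales = block_scales.repeat_interleave(block, dim=-1)  # broadcast to (rows, K)
    return elems * scales * tensor_scale

def nvfp4_gemm_reference(a, a_bs, a_ts, b, b_bs, b_ts):
    # a: (M, K), b: (N, K); result: (M, N) in fp32
    return dequant_nvfp4(a, a_bs, a_ts) @ dequant_nvfp4(b, b_bs, b_ts).T
```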
@jackcookjack
Jack Cook
13 days
Training LLMs with NVFP4 is hard because FP4 has so few values that I can fit them all in this post: ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}. But what if I told you that reducing this range even further could actually unlock better training + quantization performance? Introducing Four
6
38
235
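For readers unfamiliar with the format: FP4 here means E2M1 (1 sign bit, 2 exponent bits with bias 1, 1 mantissa bit). A minimal, library-free Python sketch (the helper name is ours) that enumerates all 16 bit patterns and reproduces the value set quoted in the tweet:

```python
# Sanity check of the FP4 (E2M1) value set: enumerating every 4-bit pattern
# yields exactly +/-{0, 0.5, 1, 1.5, 2, 3, 4, 6}.
def e2m1_value(bits: int) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11   # 2 exponent bits, bias 1
    man = bits & 0b1           # 1 mantissa bit
    if exp == 0:               # subnormal: no implicit leading 1
        mag = man * 0.5 * 2.0 ** (1 - 1)
    else:                      # normal: implicit leading 1
        mag = (1.0 + man * 0.5) * 2.0 ** (exp - 1)
    return sign * mag

values = sorted({e2m1_value(b) for b in range(16)})
print(values)
# [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```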
@RyanNeverWrong
Ryan Rong
12 days
Had a good time competing in @GPU_MODE's first hackathon for NVFP4 GEMV. This was especially fun since I'm taking @kayvonf 's CS149 this quarter, and it was nice applying some of the concepts we learned in class. I made a fork of the popcorn-cli tool to turn off the terminal user
2
5
136
@GPU_MODE
GPU MODE
13 days
▓▓▓░░░░░░░░░ 25% We just concluded the GEMV problem for the Blackwell NVFP4 competition. And we've started on a new GEMM problem. You can still sign up and be eligible for prizes per problem and the grand prize. glhf!
1
7
91
@m_sirovatka
Matej Sirovatka
15 days
After 3 weeks, we have concluded our first problem of the @GPU_MODE x @nvidia competition, NVFP4 GEMV. Thanks to everyone who has participated, we have collected over 40k submissions from >200 users. Congrats to the winners and good luck with the next problem, NVFP4 GEMM 🔥
7
15
181
@tqchenml
Tianqi Chen
17 days
CuteDSL 4.3.1 is here 🚀 Major host overhead optimization (10-40µs down to ~2µs in hot loops), streamlined PyTorch interop (pass torch.Tensors directly, no more conversions needed), and export for use in more languages and envs. All powered by the Apache tvm-ffi ABI
9
63
327
@tonymongkolsmai
Tony Mongkolsmai
24 days
Today we are releasing our first public beta of Nsight Python! The goal is to simplify the life of a Python developer by providing a pythonic way to analyze your kernel code! Check it out, provide feedback! Nsight Python — nsight-python
10
49
344
@mobicham
mobicham
30 days
My Triton version for the NVFP4 gemv kernel competition @GPU_MODE 🧵 https://t.co/4u3hAFIlpS
gist.github.com
6
13
151
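The actual kernel lives in the linked gist. Purely for orientation, here is a minimal plain-fp32 GEMV in Triton (one program per output row); it is not the NVFP4-dequantizing kernel from the gist, is not tuned for Blackwell, and the function names and block size are our own choices.

```python
# Minimal fp32 GEMV sketch: each program instance accumulates A[row, :] @ x in
# chunks of BLOCK_N. The competition's NVFP4 kernels additionally decode 4-bit
# values and apply block scales before the multiply-accumulate.
import torch
import triton
import triton.language as tl

@triton.jit
def gemv_kernel(A_ptr, x_ptr, y_ptr, N, stride_am, BLOCK_N: tl.constexpr):
    row = tl.program_id(0)
    acc = tl.zeros((BLOCK_N,), dtype=tl.float32)
    for n0 in range(0, N, BLOCK_N):
        offs = n0 + tl.arange(0, BLOCK_N)
        mask = offs < N
        a = tl.load(A_ptr + row * stride_am + offs, mask=mask, other=0.0)
        x = tl.load(x_ptr + offs, mask=mask, other=0.0)
        acc += a * x
    tl.store(y_ptr + row, tl.sum(acc, axis=0))

def gemv(A: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    # A: (M, N) fp32 CUDA tensor, x: contiguous (N,) fp32 CUDA tensor
    M, N = A.shape
    y = torch.empty(M, device=A.device, dtype=torch.float32)
    gemv_kernel[(M,)](A, x, y, N, A.stride(0), BLOCK_N=256)
    return y
```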
@GPU_MODE
GPU MODE
1 month
Saturday, November 15, at 12:00 PM PST we have a special talk lined up. Paulius Micikevicius, arguably the man most responsible for the efficiency revolution in GPUs through low-bit dtypes and sparsity, will be speaking. Cohosting with @cHHillee
2
14
106
@simran_s_arora
Simran Arora
1 month
AI has been built on one vendor’s stack for too long. AMD’s GPUs now offer state-of-the-art peak compute and memory bandwidth — but the lack of mature software / the “CUDA moat” keeps that power locked away. Time to break it and ride into our multi-silicon future. 🌊 It's been a
13
97
581
@GPU_MODE
GPU MODE
1 month
1,000 registrations so far!
@NVIDIAAIDev
NVIDIA AI Developer
1 month
Ready, Set, Go! 🏎️ Create something amazing at our Blackwell NVFP4 Kernel Hackathon with @GPU_MODE. 🎊 🏆 Compete in a 4-part performance challenge to optimize low-level kernels on NVIDIA Blackwell hardware. 🥇 3 winners per challenge will receive top-tier NVIDIA hardware.
1
9
171
@j4orz
j4orz
1 month
updates to https://t.co/asKuaIFf5P. working on the runtime and eager kernels now. picograd is taking longer than other "hobby" autograds i've seen. but our plan is to be the *definitive* resource on building your own pytorch. we agree with @karpathy that course building is a
0
2
23
@GPU_MODE
GPU MODE
1 month
The most intuitive explanation of floats I've ever come across, courtesy of @fabynou https://t.co/XNiZNZTNlf
20
188
2K
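The linked picture isn't reproduced here, but the usual way to build that intuition is to pull a float apart into its sign, exponent, and mantissa bits. A small self-contained sketch (our own helper, not taken from the linked explanation):

```python
# Decompose an IEEE 754 binary32 float into sign / exponent / mantissa
# (1 sign bit, 8 exponent bits with bias 127, 23 mantissa bits) and rebuild
# its value. Handles normal numbers only (exponent field not 0 or 255).
import struct

def decompose_f32(x: float):
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    man = bits & ((1 << 23) - 1)
    # normal numbers: value = (-1)^sign * (1 + man/2^23) * 2^(exp - 127)
    value = (-1) ** sign * (1 + man / 2**23) * 2.0 ** (exp - 127)
    return sign, exp, man, value

print(decompose_f32(3.5))  # (0, 128, 6291456, 3.5)
```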
@GPU_MODE
GPU MODE
1 month
Congrats again to the winners of the GPU MODE IRL #2 hackathon held in the beautiful @Accel office. A giant thank you to @nebiusai, the only neocloud that could pull off a Blackwell hackathon for us. 1st Place – Symmetric Minds: Enabled multi-GPU expert-parallel MoE inference in
2
9
129
@charles_irl
Charles 🎉 Frye
1 month
Surprising properties of low-precision floating point numbers are in the news again! These numerical formats are ubiquitous in large NNs but new to most programmers. So I worked with @klyap_ last week to put together this little visualizer: https://t.co/ntHlazyDmK.
8
23
204
@haoailab
Hao AI Lab
1 month
🔥 New Blog: “Disaggregated Inference: 18 Months Later” 18 months in LLM inference feels like a new Moore’s Law cycle – but this time not just 2x per year: 💸 Serving cost ↓10–100x 🚀 Throughput ↑10x ⚡ Latency ↓5x A big reason? Disaggregated Inference. From DistServe, our
hao-ai-lab.github.io
Eighteen months ago, our lab introduced DistServe with a simple bet: split LLM inference into prefill and decode, and scale them independently on separate compute pools. Today, almost every product...
7
48
175
@code_star
Cody Blakeney
1 month
If you are a student and you want a career, follow @GPU_MODE, attend events, and do the competitions. You WILL be employable.
@m_sirovatka
Matej Sirovatka
1 month
It's that time of the year again and we're coming with another @GPU_MODE competition! This time in collaboration with @nvidia. Focused on NVFP4 and B200 GPUs (thanks to @sestercegroup), we'll release 4 problems over the following 3 months: 1. NVFP4 Batched GEMV
5
6
198