Nicholas Wilt
@CUDAHandbook
Followers
3K
Following
2K
Media
79
Statuses
2K
Nicholas Wilt was on the inception team for CUDA, wrote The CUDA Handbook, and writes at https://t.co/YkR71W07I7
Joined April 2013
My latest article is on the Curiously Recurring Template Pattern (CRTP), a C++ idiom I’m using to make it easier to explore optimized limit order book implementations. Link in reply
2
0
21
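For anyone who hasn't met CRTP before, here is a minimal sketch of the idiom, assuming hypothetical names (OrderBookBase, VectorBook) rather than anything from the article: the base class template static_casts to the derived class, so each order book variant shares one interface with compile-time dispatch instead of virtual calls.

```cpp
// Minimal CRTP sketch; OrderBookBase/VectorBook are illustrative names only.
#include <cstdint>
#include <cstdio>

template <typename Derived>
struct OrderBookBase {
    // Compile-time dispatch: no vtable, calls resolve statically.
    void add_order(std::uint64_t id, double px, int qty) {
        static_cast<Derived*>(this)->add_order_impl(id, px, qty);
    }
    double best_bid() const {
        return static_cast<const Derived*>(this)->best_bid_impl();
    }
};

struct VectorBook : OrderBookBase<VectorBook> {
    double best_ = 0.0;
    void add_order_impl(std::uint64_t, double px, int) {
        if (px > best_) best_ = px;      // toy logic: remember the highest bid
    }
    double best_bid_impl() const { return best_; }
};

int main() {
    VectorBook book;
    book.add_order(1, 101.25, 10);
    std::printf("best bid = %.2f\n", book.best_bid());
    return 0;
}
```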
In building the driver API for CUDA, we did a similar exercise to what Lockheed Martin did for the F-35. We used ANSI C, so banning exception handling was implicit. We did single return Most Of The Time(tm). One departure: we definitely alloc (and fragment) after initialization!
This...is Programming Like a Fighter Pilot. A single unhandled exception destroyed a $500 million rocket in seconds. The F-35 wasn't going to make the same mistake. By carefully slicing C++, engineers created one of the strictest coding standards ever written.
2
3
84
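A rough sketch of what that style looks like in practice (the status enum and function below are made up for illustration, not the actual CUDA driver API): error codes instead of exceptions, one cleanup path, one exit point.

```cpp
// Illustrative only: not the CUDA driver API, just the single-return,
// error-code style described above, written as exception-free C-style C++.
#include <cstdlib>
#include <cstring>

typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_INVALID_VALUE,
    STATUS_OUT_OF_MEMORY
} status_t;

status_t copy_buffer(void **out, const void *src, size_t bytes)
{
    status_t status = STATUS_INVALID_VALUE;
    void *dst = NULL;

    if (out != NULL && src != NULL) {
        dst = std::malloc(bytes);
        if (dst == NULL) {
            status = STATUS_OUT_OF_MEMORY;
        } else {
            std::memcpy(dst, src, bytes);
            *out = dst;
            dst = NULL;                 // ownership transferred to the caller
            status = STATUS_SUCCESS;
        }
    }
    std::free(dst);                     // no-op on success or invalid input
    return status;                      // single exit point
}
```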
I call these “telepathy questions,” because they are testing the candidate’s ability to read the interviewer’s mind. I’ve often failed telepathy questions because I proposed a solution far outside what the interviewer expected. Soooo many stories.
This is what a bad interviewer looks like: they want to see THE answer they thought of and reject all other ones. A discussion on the pros and cons of loops vs recursive calls would have been in order. (My 2 cents: loops are more resource efficient + don’t risk stack overflow!)
0
0
4
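A toy illustration of the tradeoff the quoted tweet mentions (not taken from either interview): both functions compute the same sum, but the recursive one burns a stack frame per step and can overflow for large n, while the loop runs in constant stack space.

```cpp
// Toy example of loop vs. recursion; neither is from the interview in question.
#include <cstdint>
#include <cstdio>

std::uint64_t sum_recursive(std::uint64_t n) {
    return n == 0 ? 0 : n + sum_recursive(n - 1);    // O(n) stack depth
}

std::uint64_t sum_iterative(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 1; i <= n; ++i) acc += i; // O(1) stack depth
    return acc;
}

int main() {
    std::printf("%llu %llu\n",
                (unsigned long long)sum_recursive(1000),
                (unsigned long long)sum_iterative(1000));
    return 0;
}
```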
with black&white TV. A triumph of applied science. So the designers of image and video codecs knew this when they designed digital equivalents; YUV 4:2:0 formats dedicate 1/4 as much space to each of U and V, so the chroma planes of a decoded video frame together take only half as many bytes as the luma plane.
0
0
3
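To make the space math concrete, a small sketch of plane sizing for a 4:2:0 layout (I420/NV12 style; the 1080p resolution is just an example): the two chroma planes together come to half the size of the luma plane.

```cpp
// Sketch of 4:2:0 plane sizing; resolution chosen arbitrarily for illustration.
#include <cstddef>
#include <cstdio>

int main() {
    const int w = 1920, h = 1080;
    const std::size_t y_bytes = (std::size_t)w * h;              // full-resolution luma
    const std::size_t u_bytes = (std::size_t)(w / 2) * (h / 2);  // quarter-resolution chroma
    const std::size_t v_bytes = u_bytes;
    std::printf("Y=%zu  U+V=%zu  total=%zu bytes per frame\n",
                y_bytes, u_bytes + v_bytes, y_bytes + u_bytes + v_bytes);
    return 0;
}
```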
There is a lot of magic hiding in our imaging technology. Early 20th Century scientists discovered that our eyes are much more sensitive to noise in intensity than in color, and color TV used this fact to implement an analog compression scheme that was also backward compatible 1/x
What you see here is a super cool and important magic number matrix you don't know about. This is the standard JPEG quantization matrix. It makes compression significantly more efficient by exploiting specifics of human vision (we see lower spatial frequencies better).
1
0
18
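The matrix in the quoted image isn't reproduced here, but the way any quantization table gets applied is easy to sketch: divide each 8x8 block of DCT coefficients element-wise by the table and round, so positions with large table entries (the high spatial frequencies we resolve poorly) collapse toward zero. The table below is a made-up placeholder, not the actual JPEG Annex K values.

```cpp
// Quantization step sketch; the table values are placeholders, not the
// standard JPEG luminance table.
#include <cmath>
#include <cstdio>

int main() {
    float dct[8][8];    // pretend these came from a forward DCT of one block
    int   quant[8][8];  // placeholder table: step size grows with frequency
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c) {
            dct[r][c]   = 100.0f / (1 + r + c);  // toy coefficient magnitudes
            quant[r][c] = 10 + 6 * (r + c);
        }

    int out[8][8];
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            out[r][c] = (int)std::lround(dct[r][c] / quant[r][c]);

    // The DC term survives; the high-frequency corner quantizes to zero.
    std::printf("DC -> %d, highest-frequency corner -> %d\n", out[0][0], out[7][7]);
    return 0;
}
```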
I’ve done my share of death marching, and I will say people definitely need to sleep to do their best work.
Extremely bearish on xAI. Working 36h with no sleep will be worse than working 8 hours, sleeping 8 hours, then working 8 hours again. Even if you churn out a bunch of code, it will be shit and others will need to fix it for you.
0
0
44
Violating copyright by screenshotting paywalled content is not “the lord’s work.” I hope all 77k people who enjoyed this thread have subscribed to the Substack.
@nothinglikeamad Doing the lord’s work. That all of it yeah??
0
0
14
With apologies to Mark Twain, SRAM wants you to know that reports of its death have been greatly exaggerated. New Substack post in the reply.
4
0
12
Interval training is your friend. Go to a 400m track. Warm up, then run a complete circuit around the track As Fast As You Can(tm), then walk/jog around the track to your starting point. Repeat 4-6x. Intervals are hands-down the most time-efficient way to build cardio.
engineers & founders, please share your advice for getting fit and staying fit while spending 10-12hr/day working on the computer.
3
1
47
lol The HIP ecosystem, such as it is, would like a word. @SpectralCom does it better though—no need for intermediate source files.
This isn't why. Trying to "compile" CUDA for AMD is nonsense; NVIDIA loves when people try. CUDA will never be fast on AMD (how do you compile if the shared memory / tensor cores are a different size?). It's the wrong layer to do this at.
2
1
31
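For the curious, the portability trick is mostly mechanical, since HIP's runtime API mirrors CUDA's name for name. A hedged sketch (USE_HIP is a hypothetical build flag for this example; the hip* calls themselves are real HIP entry points):

```cuda
// One source file, two toolchains: build with nvcc by default, or define
// USE_HIP and build with hipcc. The macro layer here is illustrative only.
#include <cstdio>
#ifdef USE_HIP
  #include <hip/hip_runtime.h>
  #define gpuMalloc             hipMalloc
  #define gpuDeviceSynchronize  hipDeviceSynchronize
  #define gpuFree               hipFree
#else
  #include <cuda_runtime.h>
  #define gpuMalloc             cudaMalloc
  #define gpuDeviceSynchronize  cudaDeviceSynchronize
  #define gpuFree               cudaFree
#endif

// Same kernel source for both targets.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    gpuMalloc((void**)&x, n * sizeof(float));
    gpuMalloc((void**)&y, n * sizeof(float));
    // Buffers left uninitialized; this only demonstrates the build/launch path.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    gpuDeviceSynchronize();
    gpuFree(x);
    gpuFree(y);
    std::printf("launched saxpy on %d elements\n", n);
    return 0;
}
```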
Great thread. CPU overhead has always been a point of emphasis for CUDA. GPUs are too big and expensive to let them just sit there, starving. Asynchronous operation is important.
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
1
6
109
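Not code from the Modal/SGLang post, just the standard CUDA pattern behind "never block the GPU": pinned host memory plus a stream, so copies and kernels are enqueued asynchronously and the host synchronizes once at the end instead of stalling after every call.

```cuda
// Sketch of asynchronous copy + launch on a stream to keep the GPU fed.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h;                                   // pinned host buffer: needed for truly async copies
    cudaMallocHost((void**)&h, bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float* d;
    cudaMalloc((void**)&d, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three operations are enqueued; none of them blocks the host.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

    // Host work could overlap here; synchronize only when the result is needed.
    cudaStreamSynchronize(stream);
    std::printf("h[0] = %f\n", h[0]);

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```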
Nancy Kress won the Hugo with a novella on this topic, Beggars In Spain. There’s a twist in the premise that I won’t spoil here, but yes, this work anticipated eugenics for the ultra wealthy. @DavidBrin debuted TASAT (There’s A Story About That) in 2017. https://t.co/EyMMiiDRql
0
0
2
It is a surprisingly uncommon perspective among hardware companies, and easy to see why. If you make $$ selling new hardware, why invest in software that makes older hardware harder to displace? But platform development requires investments that are sometimes counterintuitive.
0
0
5
Does cuTile only work on Blackwell? If the answer is No, then its benefits will find their way to customers who bought Hopper and possibly even Ampere GPUs. NVIDIA has known for decades that the best tech companies have to displace their own tech, or someone else will. 2/x
1
0
12
The point about software cannot be overstated. For its entire history afaik, NVIDIA has continued to support and invest in the software for older hardware—performance optimizations as well as bug fixes. For AI workloads, that does mean performance improves for existing chips. 1/x
$NVDA basically answering Burry: “The A100s we shipped six years ago are still running at full utilization today, now powered by a much stronger software stack.”
2
1
28
A visit to this booth is like time travel to the future.
Most people come to Booth #6552 for a free Scaley plushie. Some come to see the same, untouched CUDA code running on both AMD and NVIDIA GPUs. We don't judge your priorities. Just come say hi. #SC25 #HPC #HardwareFreedom #CUDA #AMD #NVIDIA @Supercomputing
0
3
18
From Cormen et al.’s algorithms text to Hacker’s Delight, these are some of my favorite reference works. /fin https://t.co/X5MsjWFO0K
0
0
8