Nicholas Wilt

@CUDAHandbook

Followers
3K
Following
2K
Media
79
Statuses
2K

Nicholas Wilt was on the inception team for CUDA, wrote The CUDA Handbook, and writes at https://t.co/YkR71W07I7

Joined April 2013
@CUDAHandbook
Nicholas Wilt
12 hours
My latest article is on the Curiously Recurring Template Pattern (CRTP), a C++ idiom I’m using to make it easier to explore optimized limit order book implementations. Link in reply
2
0
21
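For readers unfamiliar with CRTP, here is a minimal sketch of the idiom applied to an order book. It is not the article's code: `OrderBookBase`, `MapBook`, `addImpl`, `bestBidImpl`, and `bestAfterTwoAdds` are hypothetical names chosen for illustration, and only the bid side is modeled.

```cpp
#include <cstdint>
#include <map>

// Hypothetical CRTP base: the public interface forwards to the derived
// class at compile time, so no virtual dispatch is involved.
template <typename Derived>
class OrderBookBase {
public:
    void add(std::int64_t price, std::int64_t qty) {
        static_cast<Derived*>(this)->addImpl(price, qty);
    }
    std::int64_t bestBid() const {
        return static_cast<const Derived*>(this)->bestBidImpl();
    }
};

// One candidate implementation (an assumption for this sketch):
// bid quantities kept in an ordered map keyed by price.
class MapBook : public OrderBookBase<MapBook> {
    friend class OrderBookBase<MapBook>;
    std::map<std::int64_t, std::int64_t> bids_;

    void addImpl(std::int64_t price, std::int64_t qty) { bids_[price] += qty; }

    // Returns the highest bid price, or 0 when the book is empty.
    std::int64_t bestBidImpl() const {
        return bids_.empty() ? 0 : bids_.rbegin()->first;
    }
};

// Generic driver: works with any book that derives from OrderBookBase,
// which is what makes swapping implementations cheap to try.
template <typename Book>
std::int64_t bestAfterTwoAdds(OrderBookBase<Book>& book) {
    book.add(100, 5);
    book.add(101, 2);
    return book.bestBid();
}
```

Calling `bestAfterTwoAdds` on a `MapBook` returns 101. A second implementation (say, a flat array of price levels) could be dropped in by deriving from `OrderBookBase` the same way, without touching the driver.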
@CUDAHandbook
Nicholas Wilt
7 days
In building the driver API for CUDA, we did a similar exercise to what Lockheed-Martin did for the F-35. We used ANSI C, so banning exception handling was implicit. We did single return Most Of The Time(tm). One departure: we definitely alloc (and fragment) after initialization!
@lauriewired
LaurieWired
9 days
This...is Programming Like a Fighter Pilot. A single unhandled exception destroyed a $500 million rocket in seconds. The F-35 wasn't going to make the same mistake. By carefully slicing C++, engineers created one of the strictest coding standards ever written.
2
3
84
@CUDAHandbook
Nicholas Wilt
9 days
I call these “telepathy questions,” because they are testing the candidate’s ability to read the interviewer’s mind. I’ve often failed telepathy questions because I proposed a solution that was so far outside the interviewer’s expectation. Soooo many stories.
@GergelyOrosz
Gergely Orosz
10 days
This is what a bad interviewer looks like: they want to see THE answer they thought of and reject all other ones. A discussion on pros and cons of loops vs recursive calls would have been in order. (My 2 cents: loops are more resource efficient + don’t risk stack overflow!)
0
0
4
@CUDAHandbook
Nicholas Wilt
9 days
with black&white TV. A triumph of applied science. So the designers of image and video codecs knew this when they designed digital equivalents; YUV formats dedicate 1/4 as much space to each of U and V, so the chrominance in each decoded video frame takes only half as much space as the luminance.
0
0
3
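The plane-size arithmetic can be sketched as follows, assuming the common 8-bit planar 4:2:0 layout, where chroma is subsampled by two in both directions. The names `PlaneSizes`, `yuv420PlaneSizes`, and `totalBytes` are made up for this example, and even frame dimensions are assumed.

```cpp
#include <cstddef>

// Byte counts of the three planes of one frame.
struct PlaneSizes {
    std::size_t y;  // luminance
    std::size_t u;  // chrominance (U)
    std::size_t v;  // chrominance (V)
};

// 8-bit planar 4:2:0: U and V are each sampled at half the width and
// half the height of Y. Assumes width and height are even.
constexpr PlaneSizes yuv420PlaneSizes(std::size_t width, std::size_t height) {
    return {width * height,
            (width / 2) * (height / 2),
            (width / 2) * (height / 2)};
}

// Total bytes for one frame: Y + U + V.
constexpr std::size_t totalBytes(const PlaneSizes& p) {
    return p.y + p.u + p.v;
}
```

For a 1920x1080 frame this gives 2,073,600 bytes of Y and 518,400 bytes each of U and V, for 3,110,400 bytes total. U and V are each a quarter of Y, so chroma together is half the size of luma, or one third of the frame.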
@CUDAHandbook
Nicholas Wilt
9 days
There is a lot of magic hiding in our imaging technology. Early 20th Century scientists discovered that our eyes are much more sensitive to noise in intensity versus color, and color TV used this fact to implement an analog compression scheme that also was backward compatible 1/x
@endermanch
Enderman
10 days
What you see here is a super cool and important magic number matrix you don't know about. This is the standard JPEG quantization matrix. It makes compression significantly more efficient utilizing specifics of human eyes (we see lower frequencies better).
1
0
18
@CUDAHandbook
Nicholas Wilt
11 days
👀
@SebAaltonen
Sebastian Aaltonen
11 days
My "No Graphics API" blog post is almost ready. Going to ask some industry insiders to proofread it later this week. It's between 20-30 pages (depending on screen size). Wife (PhD lecturer/researcher) said: That's not a blog post, that's an article :)
0
0
0
@CUDAHandbook
Nicholas Wilt
14 days
I’ve done my share of death marching, and I will say people definitely need to sleep to do their best work.
@redtachyon
Ariel
15 days
Extremely bearish on xAI. Working 36h with no sleep will be worse than working 8 hours, sleeping 8 hours, then working 8 hours again. Even if you churn out a bunch of code, it will be shit and others will need to fix it for you.
0
0
44
@CUDAHandbook
Nicholas Wilt
16 days
Violating copyright by screenshotting paywalled content is not “the lord’s work.” I hope all 77k people who enjoyed this thread have subscribed to the Substack.
@justinKLOCZKO
justin kloczko
16 days
@nothinglikeamad Doing the lord’s work. That all of it yeah??
0
0
14
@CUDAHandbook
Nicholas Wilt
17 days
With apologies to Mark Twain, SRAM wants you to know that its death has been greatly exaggerated. New Substack in the reply.
4
0
12
@CUDAHandbook
Nicholas Wilt
17 days
Interval training is your friend. Go to a 400m track. Warm up, then run a complete circuit around the track As Fast As You Can(tm), then walk/jog around the track to your starting point. Repeat 4-6x. Intervals are hands-down the most time-efficient way to build cardio.
@zeroxjackson
jackson ⁖
18 days
engineers & founders, please share your advice for getting fit and staying fit while spending 10-12hr/day working on the computer.
3
1
47
@CUDAHandbook
Nicholas Wilt
18 days
lol The HIP ecosystem, such as it is, would like a word. @SpectralCom does it better though—no need for intermediate source files.
@__tinygrad__
the tiny corp
18 days
This isn't why. Trying to "compile" CUDA for AMD is nonsense; NVIDIA loves when people try. CUDA will never be fast on AMD (how do you compile if the shared memory / tensor cores are a different size?). It's the wrong layer to do this at.
2
1
31
@CUDAHandbook
Nicholas Wilt
18 days
Great thread. CPU overhead has always been a point of emphasis for CUDA. GPUs are too big and expensive to let them just sit there, starving. Asynchronous operation is important.
@charles_irl
Charles 🎉 Frye
24 days
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
1
6
109
@CUDAHandbook
Nicholas Wilt
21 days
Nancy Kress won the Hugo with a novella on this topic, Beggars In Spain. There’s a twist in the premise that I won’t spoil here, but yes, this work anticipated eugenics for the ultra wealthy. @DavidBrin debuted TASAT (There’s A Story About That) in 2017. https://t.co/EyMMiiDRql
david-brin.medium.com
Here’s a Major Announcement of a project that’s been in my thoughts for a long time. A way that you — yes you — can be part of an action…
@simpsoka
Kath Korevec
22 days
So… eugenics is profitable now?
0
0
2
@CUDAHandbook
Nicholas Wilt
23 days
It is a surprisingly uncommon perspective among hardware companies, and easy to see why. If you make $$ selling new hardware, why invest in software that makes older hardware harder to displace? But platform development requires investments that are sometimes counterintuitive.
0
0
5
@CUDAHandbook
Nicholas Wilt
23 days
Does cuTile only work on Blackwell? If the answer is No, then its benefits will find their way to customers who bought Hopper and possibly even Ampere GPUs. NVIDIA has known for decades that the best tech companies have to displace their own tech, or someone else will. 2/x
1
0
12
@CUDAHandbook
Nicholas Wilt
23 days
The point about software cannot be overstated. For its entire history afaik, NVIDIA has continued to support and invest in the software for older hardware—performance optimizations as well as bug fixes. For AI workloads, that does mean performance improves for existing chips. 1/x
@wallstengine
Wall St Engine
23 days
$NVDA basically answering Burry: “The A100s we shipped six years ago are still running at full utilization today, now powered by a much stronger software stack.”
2
1
28
@HotAisle
Hot Aisle
24 days
A visit to this booth is like time travel to the future.
@SpectralCom
Spectral Compute
24 days
Most people come to Booth #6552 for a free Scaley plushie. Some come to see the same, untouched CUDA code running on both AMD and NVIDIA GPUs. We don't judge your priorities. Just come say hi. #SC25 #HPC #HardwareFreedom #CUDA #AMD #NVIDIA @Supercomputing
0
3
18
@CUDAHandbook
Nicholas Wilt
24 days
From Cormen et al.’s algorithms text, to Hacker’s Delight, these are some of my favorite reference works. /fin https://t.co/X5MsjWFO0K
0
0
8