Nicholas Wilt
@CUDAHandbook
Followers
3K
Following
2K
Media
79
Statuses
2K
Nicholas Wilt was on the inception team for CUDA, wrote The CUDA Handbook, and writes at https://t.co/YkR71W07I7
Joined April 2013
My latest article is on the Curiously Recurring Template Pattern (CRTP), a C++ idiom I’m using to make it easier to explore optimized limit order book implementations. Link in reply
2
0
21
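For anyone who hasn't met CRTP before, here is a minimal sketch of the idiom, assuming hypothetical names (OrderBookBase, VectorBook) rather than anything from the article: the base class template static_casts to the derived class, so each order book variant shares one interface with compile-time dispatch instead of virtual calls.

```cpp
// Minimal CRTP sketch; OrderBookBase/VectorBook are illustrative names only.
#include <cstdint>
#include <cstdio>

template <typename Derived>
struct OrderBookBase {
    // Compile-time dispatch: no vtable, calls resolve statically.
    void add_order(std::uint64_t id, double px, int qty) {
        static_cast<Derived*>(this)->add_order_impl(id, px, qty);
    }
    double best_bid() const {
        return static_cast<const Derived*>(this)->best_bid_impl();
    }
};

struct VectorBook : OrderBookBase<VectorBook> {
    double best_ = 0.0;
    void add_order_impl(std::uint64_t, double px, int) {
        if (px > best_) best_ = px;      // toy logic: remember the highest bid
    }
    double best_bid_impl() const { return best_; }
};

int main() {
    VectorBook book;
    book.add_order(1, 101.25, 10);
    std::printf("best bid = %.2f\n", book.best_bid());
    return 0;
}
```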
In building the driver API for CUDA, we did a similar exercise to what Lockheed Martin did for the F-35. We used ANSI C, so banning exception handling was implicit. We did single return Most Of The Time(tm). One departure: we definitely alloc (and fragment) after initialization!
This...is Programming Like a Fighter Pilot. A single unhandled exception destroyed a $500 million rocket in seconds. The F-35 wasn't going to make the same mistake. By carefully slicing C++, engineers created one of the strictest coding standards ever written.
2
3
84
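A rough sketch of what that style looks like in practice (the status enum and function below are made up for illustration, not the actual CUDA driver API): error codes instead of exceptions, one cleanup path, one exit point.

```cpp
// Illustrative only: not the CUDA driver API, just the single-return,
// error-code style described above, written as exception-free C-style C++.
#include <cstdlib>
#include <cstring>

typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_INVALID_VALUE,
    STATUS_OUT_OF_MEMORY
} status_t;

status_t copy_buffer(void **out, const void *src, size_t bytes)
{
    status_t status = STATUS_INVALID_VALUE;
    void *dst = NULL;

    if (out != NULL && src != NULL) {
        dst = std::malloc(bytes);
        if (dst == NULL) {
            status = STATUS_OUT_OF_MEMORY;
        } else {
            std::memcpy(dst, src, bytes);
            *out = dst;
            dst = NULL;                 // ownership transferred to the caller
            status = STATUS_SUCCESS;
        }
    }
    std::free(dst);                     // no-op on success or invalid input
    return status;                      // single exit point
}
```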
I call these “telepathy questions,” because they are testing the candidate’s ability to read the interviewer’s mind. I’ve often failed telepathy questions because I proposed a solution far outside what the interviewer expected. Soooo many stories.
This is what a bad interviewer looks like: they want to see THE answer they thought of and reject all other ones. A discussion on the pros and cons of loops vs recursive calls would have been in order. (My 2 cents: loops are more resource efficient + don’t risk stack overflow!)
0
0
4
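A toy illustration of the tradeoff the quoted tweet mentions (not taken from either interview): both functions compute the same sum, but the recursive one burns a stack frame per step and can overflow for large n, while the loop runs in constant stack space.

```cpp
// Toy example of loop vs. recursion; neither is from the interview in question.
#include <cstdint>
#include <cstdio>

std::uint64_t sum_recursive(std::uint64_t n) {
    return n == 0 ? 0 : n + sum_recursive(n - 1);    // O(n) stack depth
}

std::uint64_t sum_iterative(std::uint64_t n) {
    std::uint64_t acc = 0;
    for (std::uint64_t i = 1; i <= n; ++i) acc += i; // O(1) stack depth
    return acc;
}

int main() {
    std::printf("%llu %llu\n",
                (unsigned long long)sum_recursive(1000),
                (unsigned long long)sum_iterative(1000));
    return 0;
}
```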
with black&white TV. A triumph of applied science. So the designers of image and video codecs knew this when they designed digital equivalents; YUV 4:2:0 formats dedicate 1/4 as much space to each of U and V, so the chroma planes of a decoded video frame together take only half as many bytes as the luma plane.
0
0
3
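To make the space math concrete, a small sketch of plane sizing for a 4:2:0 layout (I420/NV12 style; the 1080p resolution is just an example): the two chroma planes together come to half the size of the luma plane.

```cpp
// Sketch of 4:2:0 plane sizing; resolution chosen arbitrarily for illustration.
#include <cstddef>
#include <cstdio>

int main() {
    const int w = 1920, h = 1080;
    const std::size_t y_bytes = (std::size_t)w * h;              // full-resolution luma
    const std::size_t u_bytes = (std::size_t)(w / 2) * (h / 2);  // quarter-resolution chroma
    const std::size_t v_bytes = u_bytes;
    std::printf("Y=%zu  U+V=%zu  total=%zu bytes per frame\n",
                y_bytes, u_bytes + v_bytes, y_bytes + u_bytes + v_bytes);
    return 0;
}
```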
There is a lot of magic hiding in our imaging technology. Early 20th Century scientists discovered that our eyes are much more sensitive to noise in intensity than in color, and color TV used this fact to implement an analog compression scheme that was also backward compatible 1/x
What you see here is a super cool and important magic number matrix you don't know about. This is the standard JPEG quantization matrix. It makes compression significantly more efficient by exploiting specifics of human vision (we see lower spatial frequencies better).
1
0
18
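The matrix in the quoted image isn't reproduced here, but the way any quantization table gets applied is easy to sketch: divide each 8x8 block of DCT coefficients element-wise by the table and round, so positions with large table entries (the high spatial frequencies we resolve poorly) collapse toward zero. The table below is a made-up placeholder, not the actual JPEG Annex K values.

```cpp
// Quantization step sketch; the table values are placeholders, not the
// standard JPEG luminance table.
#include <cmath>
#include <cstdio>

int main() {
    float dct[8][8];    // pretend these came from a forward DCT of one block
    int   quant[8][8];  // placeholder table: step size grows with frequency
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c) {
            dct[r][c]   = 100.0f / (1 + r + c);  // toy coefficient magnitudes
            quant[r][c] = 10 + 6 * (r + c);
        }

    int out[8][8];
    for (int r = 0; r < 8; ++r)
        for (int c = 0; c < 8; ++c)
            out[r][c] = (int)std::lround(dct[r][c] / quant[r][c]);

    // The DC term survives; the high-frequency corner quantizes to zero.
    std::printf("DC -> %d, highest-frequency corner -> %d\n", out[0][0], out[7][7]);
    return 0;
}
```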
I’ve done my share of death marching, and I will say people definitely need to sleep to do their best work.
Extremely bearish on xAI. Working 36h with no sleep will be worse than working 8 hours, sleeping 8 hours, then working 8 hours again. Even if you churn out a bunch of code, it will be shit and others will need to fix it for you.
0
0
44
Violating copyright by screenshotting paywalled content is not “the lord’s work.” I hope all 77k people who enjoyed this thread have subscribed to the Substack.
@nothinglikeamad Doing the lord’s work. That all of it yeah??
0
0
14
With apologies to Mark Twain, SRAM wants you to know that reports of its death have been greatly exaggerated. New Substack post in the reply.
4
0
12
Interval training is your friend. Go to a 400m track. Warm up, then run a complete circuit around the track As Fast As You Can(tm), then walk/jog around the track to your starting point. Repeat 4-6x. Intervals are hands-down the most time-efficient way to build cardio.
engineers & founders, please share your advice for getting fit and staying fit while spending 10-12hr/day working on the computer.
3
1
47
lol The HIP ecosystem, such as it is, would like a word. @SpectralCom does it better though—no need for intermediate source files.
This isn't why. Trying to "compile" CUDA for AMD is nonsense; NVIDIA loves when people try. CUDA will never be fast on AMD (how do you compile if the shared memory / tensor cores are a different size?). It's the wrong layer to do this at.
2
1
31
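For the curious, the portability trick is mostly mechanical, since HIP's runtime API mirrors CUDA's name for name. A hedged sketch (USE_HIP is a hypothetical build flag for this example; the hip* calls themselves are real HIP entry points):

```cuda
// One source file, two toolchains: build with nvcc by default, or define
// USE_HIP and build with hipcc. The macro layer here is illustrative only.
#include <cstdio>
#ifdef USE_HIP
  #include <hip/hip_runtime.h>
  #define gpuMalloc             hipMalloc
  #define gpuDeviceSynchronize  hipDeviceSynchronize
  #define gpuFree               hipFree
#else
  #include <cuda_runtime.h>
  #define gpuMalloc             cudaMalloc
  #define gpuDeviceSynchronize  cudaDeviceSynchronize
  #define gpuFree               cudaFree
#endif

// Same kernel source for both targets.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    gpuMalloc((void**)&x, n * sizeof(float));
    gpuMalloc((void**)&y, n * sizeof(float));
    // Buffers left uninitialized; this only demonstrates the build/launch path.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    gpuDeviceSynchronize();
    gpuFree(x);
    gpuFree(y);
    std::printf("launched saxpy on %d elements\n", n);
    return 0;
}
```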
Great thread. CPU overhead has always been a point of emphasis for CUDA. GPUs are too big and expensive to let them just sit there, starving. Asynchronous operation is important.
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
1
6
109
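Not code from the Modal/SGLang post, just the standard CUDA pattern behind "never block the GPU": pinned host memory plus a stream, so copies and kernels are enqueued asynchronously and the host synchronizes once at the end instead of stalling after every call.

```cuda
// Sketch of asynchronous copy + launch on a stream to keep the GPU fed.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* h;                                   // pinned host buffer: needed for truly async copies
    cudaMallocHost((void**)&h, bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float* d;
    cudaMalloc((void**)&d, bytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three operations are enqueued; none of them blocks the host.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

    // Host work could overlap here; synchronize only when the result is needed.
    cudaStreamSynchronize(stream);
    std::printf("h[0] = %f\n", h[0]);

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```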
Nancy Kress won the Hugo with a novella on this topic, Beggars In Spain. There’s a twist in the premise that I won’t spoil here, but yes, this work anticipated eugenics for the ultra wealthy. @DavidBrin debuted TASAT (There’s A Story About That) in 2017. https://t.co/EyMMiiDRql
0
0
2
It is a surprisingly uncommon perspective among hardware companies, and easy to see why. If you make $$ selling new hardware, why invest in software that makes older hardware harder to displace? But platform development requires investments that are sometimes counterintuitive.
0
0
5
Does cuTile only work on Blackwell? If the answer is No, then its benefits will find their way to customers who bought Hopper and possibly even Ampere GPUs. NVIDIA has known for decades that the best tech companies have to displace their own tech, or someone else will. 2/x
1
0
12
The point about software cannot be overstated. For its entire history afaik, NVIDIA has continued to support and invest in the software for older hardware—performance optimizations as well as bug fixes. For AI workloads, that does mean performance improves for existing chips. 1/x
$NVDA basically answering Burry: “The A100s we shipped six years ago are still running at full utilization today, now powered by a much stronger software stack.”
2
1
28
A visit to this booth is like time travel to the future.
Most people come to Booth #6552 for a free Scaley plushie. Some come to see the same, untouched CUDA code running on both AMD and NVIDIA GPUs. We don't judge your priorities. Just come say hi. #SC25 #HPC #HardwareFreedom #CUDA #AMD #NVIDIA @Supercomputing
0
3
18
From Cormen et al.’s algorithms text to Hacker’s Delight, these are some of my favorite reference works. /fin https://t.co/X5MsjWFO0K
0
0
8