
Elliot Arledge
@elliotarledge
Followers: 17K
Following: 14K
Media: 561
Statuses: 7K
21 | instructor @freecodecamp BTC: bc1qwsl2t7p75xqcs0a09hdy8hqkq05lr63hruryck ETH: 0x3b7956a6AF63eB14bB0cBf1D45FD5e9445e88558
Joined November 2022
I’m thrilled to finally have the CUDA course out! This is one of the most difficult things I've accomplished to date. Would love to hear your thoughts on the course! Thank you for the inspiration: @Nottlespike, @karpathy, @AdrianDittmann.
95
296
2K
timelapse #19 (13.5 hrs):.- made - diablo 4 w/ elon musk.- upgraded a discord bot.- reset and upgraded my raspberry pi
50
89
2K
timelapse #21 (12 hrs):.- leetcode practice for xAI coding test.- updated mnist-cuda (find repo in pinned) by adding a new CUDA training script w/ an extra hidden layer and a feature to show generalization.- notes on LLM training datasets, architectures, training config, etc
34
82
2K
Timelapse #3.> Slowly turning nocturnal.> Figured out a lot of issues with prev code.> getting more fluent w/ gdb.> practicing neovim motions.> improved structure in codebase (pushing to master soon).> MNIST MLP training run in raw C and python code
33
42
1K
timelapse #4.>applying intuition on 6D grid/block/threading indexing to the construction of fast matmuls.>studying techniques that get us to cuBLAS (SOTA) matmuls.>parsed ~half the CUDA C/C++ programming guide.>met with consulting client.>studied core kv cache optimization
18
53
980
Timelapse #4.> finished deep learning ecosystem lecture material (check pinned).> reproducible training runs in naive cuda kernels w/ visual eval function.> refactored > intro notes pushed to > coordinated tasks with contributors
18
49
960
timelapse #35 (17 hrs):.- trained a theoretical reasoning chatML expert for later MoE merge.- broke apart multi-head latent attention.- sorted through HQQ, HQQ+, int8 optimizer quantization.- touched up on kv-cache optimization w/ paged attention.- planned for new years party in
22
25
931
timelapse #12.> C/C++ review lectures recorded.> "gentle intro to gpus" section recorded.> major updates to cuda course github.> communicated w/ potential contributors
13
40
811
timelapse #31 (10.5 hrs):.- completed entire CUDA tensor core lecture in one go (coming soon).- picked up a new client.- further planning for my trip to SF.- studied for cmpt 200 final.- done philosophy ethics final exam today.- shovelled the driveway (off camera)
18
34
803
timelapse #9.> atomic locks + mutex locks.> nvtx profiling.> fp32/tf32 in cublas.> cublasLt vs cublas comparison.> sorted out loss function issues in C
16
23
728
for those of you scouting for "AI projects" to work on in your free time, i figure i would share the list of projects im currently doing to get a sense of how i carefully pick out problems:. 1. write a training run in CUDA for some neural net you find cool. i started off w/ a.
10
62
696
timelapse #11.> 80% of c/c++ examples and lecture notes prepped.> finished cuda setup recordings (windows & linux).> didn't push hard enough today so ill have to make it up tomorrow
9
23
616
timelapse #18 (11 hrs):.- touching up course resources before upload.- switched from ubuntu to pop os.- setup everything on pop (loving it :D).- switched from notion to remnote + google keep for todo list.- studied for english test
13
19
600
timelapse #25 (14 hrs).- today was a homework day.- finished all assignments and labs for CMPT 200, PHIL 250, ENGL 104, ECON 102, PHYS 224.- chilled out w/ some diablo grinding.- self reflection to set the tone for tmrw
20
28
561
timelapse #33 (14 hrs):.- first day hacking @_TheResidency / @TheR_Labs w/ @Nottlespike, @marmikch, @Retis_Labs .- generating experts as an attempt to reproduce the magical fairy dust behind deepseek-v3 (sub 100B).- GGUF wen?.- deep dive on model merging w/ mergekit & della
20
15
463
timelapse #23 (14 hrs).- finished studying + annotating differential transformer paper.- got rid of chair to see what its like standing up (raised my desk with bricks).- xAI leetcode practice.- refreshed myself on the modern transformer architecture (way more clear than before).-
13
16
456
timelapse #8.> longest working streak this week (+4 hours off camera).> dug into pytorch internals to find the torch.matmul c++ implementation.> my cublas settings 1.08x faster than pytorch internals (will include pytorch extension as part of the course).> in depth kernel
14
13
447
timelapse #17 (20 hrs).> finished the first version of my minecraft python wrapper using forge: > now i can create training data for minecraft agents directly on my macbook from ANYWHERE and have everything automatically save to my server at home
9
23
435
timelapse #34 (18 hrs):.- pizza from scratch using deepseek-v3.- used dark magic on 20 models.- used `—commit-crimes` mergekit to combine models we probably weren’t supposed to.- shifting MoE landscape .- ran first leaderboard benchmark.- broke apart della paper for merging
14
16
422
TIMELAPSE #50 (12 hrs):.- meeting with neuralink recruiter to work on neural decoders and other BCI software problems.- break towards the end talking with Nvidia researchers over dinner.- templating and styling has been addressed and partially completed in my book.- recapping on
23
10
394
timelapse #6.> started learning about coalesced memory access.> compared cublas matmul with naive GPU, naive GPU, tiled, tiled (w/ registers).> figured out and resolved weird issue with column major and row major problem in cublas matmul.> gained intuition for mem access patterns
6
6
379
timelapse #22 (12 hrs):.- pre-screen prep (leetcode easy/medium).- tinkered w/ neural net examples using @__tinygrad__ .- studied thermodynamics.- organized a stack of AI arxiv papers
14
9
359
timelapse #30 (12.5 hrs):.- planning for my stay @_TheResidency in SF (dm if you wanna grab a coffee or just hangout between dec 26th - jan 12th).- reverse engineered the fastKAN neural net arch (may write GPU kernels for it).- studied for physics and philosophy final exams
23
14
349
I had a very intriguing conversation with @hyhieu226 at @xai this morning. Here are some things I learned about the ecosystem of GPUs and performance engineering (a thread):
7
9
319
Not including @3blue1brown as the author was really messed up of me. Sometimes I don’t think before posting, and this was a wake up call. I’ve taken the video down. I’ve always been a great admirer of your intuition and how you present it in your videos. While I’ll continue.
@elliotarledge Bro. .
16
3
318
this was the most insane week thus far (and its only thursday):.> Grok 3 from @xai .> we got automatic CUDA kernel generation from @SakanaAILabs.> ultra-scale playbook explaining all the advanced distributed training/inference solutions to make H100s sing, projected into one work
7
22
305
timelapse #53 (12.5 hrs).- learning about stupidly fast inference with SGLang.- more text-to-image finetuning for another contract.- got my whoop bracelet setup.- further tinkering and alpha version of auto generating cuda kernels to speed up pytorch operations.- approaching the
8
7
298
timelapse #28 (12 hrs):.- caught up on all uni homework.- assembled and got my new 1x3090 server running (soon to be 2x3090).- supercharged by new desk and macbook pro m4 max.- will be in san francisco on dec 27th to jan 11th (shoot me a dm if you wanna grab a coffee or hangout.
22
18
276
timelapse #40 (11 hrs):.- organized resources to make my cuda book top notch.- speedran new karpathy vid in 30 mins.- confirmed info for flying to finland where ill be shipping at @shipfr8 .- got some reps in.- speedran steam engine power in factorio with @blyzedog @CallMeOuta .-
16
5
275
timelapse #7.> sketched up intuitive diagrams for visualizing key algorithms.> built new ergonomic chair.> completely revamped all notes and removed noise (check pinned).> useful engagement in spaces with followers and engineers on tech X.> no coding today (will fix for tmrw)
14
11
239
timelapse #29 (6 hrs):.- planned entire tutorial on spinning up fast tensor core GEMM kernels - coming soon.- reverse engineered the KAN architecture and started taking apart the fastKAN (~3x faster KAN architecture) in naive pytorch - see - used my water
13
11
232
timelapse #49 (10 hrs).- built mistral-ocr for my mcp server and general cli tooling (pip install mistral-oc & mistral-ocr image.png -o output.txt).- spent today getting up to par and kickstarting my own collection of mcp servers in cursor only (perplexity, groq, emails).- lots
11
6
231
timelapse #42 (13 hrs):.- prepping minimal version of flash attn with tensor core ops.- more book structure planning.- visited aalto university.- hockey for an hour.- had to read up more on ampere vs hopper tensor core support
5
6
209
timelapse #10:.> recorded first two parts of cuda course.> configuring docker for simulated cuda setup (windows next).> banging my head against the wall trying to find 256-bit load and store instructions for vectorized mem access on ampere gpus
8
15
198
timelapse #52 (12 hrs): .> mostly contract model fine tuning today.> got bored, so played around with hyperloops in minecraft.> cleaned my room once again and it feels great.> brainstorming some ways to really get ahead of the game.> tinkered enough with tensor core hardware
17
10
197
timelapse #26 (12.5 hrs):.- complete refactor of minecraft mod (10x better performance overall after making the switch to fabric). - got most of agent inputs working (except for mouse presses and holds).- setup secure crypto wallet.- the conversation space with @AdrianDittmann
10
8
172
timelapse #24 (12 hrs):.- self reflection in journal.- solid start on philosophy essay (university course).- added a bunch of features and cleared out clutter for minecraft agent mod.- spaces with @AdrianDittmann
12
8
156
timelapse #43:.- 10 vs 10 scrimmage at the hockey rink.- built wrapper around latex compiler for my book (then decided typst might be a better option).- started playing with gemini for the first time in a while.- getting triggered at half precision BS with tensor core flash
5
1
148
if you want to understand how my only coding bottleneck is speed of thought, dedicate 2 mins of your time to checking this out. using @GroqInc and @superwhisperapp
9
5
149
timelapse #37 (14 hrs):.> coworking @_TheResidency with @Nottlespike, @jaeyun_ha, and @heyashleybee .> got minerl working on macOS and gained a ton of dopamine from nn architecture design sprint (training minecraft AIs).> drafted tutorials for @nebiusai
8
4
126