2wl

@2wlearning

Followers: 1,132
Following: 403
Media: 145
Statuses: 393

Documenting my progress learning ML every day. 2 more weeks

residual stream
Joined September 2023
Pinned Tweet
@2wlearning
2wl
2 months
Current roadmap
3 Month Goal: Beat the strongest zero-shot submission in the AI Mathematical Olympiad ()
4 Month Goal: Solve my first tinygrad bounty.
6 Month Goal: Implement LLM pre-training and inference from scratch (CUDA)
13
2
180
@2wlearning
2wl
1 month
This is your counterparty when you open robinhood. They reverse engineered binary register mappings to matmul 15% faster. gg
Tweet media one
8
34
533
@2wlearning
2wl
1 month
And they knew all this in 2018! 2018!!!
Tweet media one
5
11
297
@2wlearning
2wl
1 month
i thought it was snake oil self-help motivational bs, but is being cracked really all mindset? i feel like i got a little glimpse today. it really has nothing to do with knowledge or ability? bro why doesn't everyone do this
19
3
227
@2wlearning
2wl
2 months
Another day, another layernorm paper. This time on the effect of projecting the transformer key vectors onto a hyperplane.
Tweet media one
3
8
149
@2wlearning
2wl
2 months
arXiv: Long-context LLMs Struggle with Long In-context Learning "So the context extension papers started using subprime retrieval evaluations. Gotta keep that citation ring churning. Whenever you hear 'Needle In A Haystack', think 'Shit.'"
Tweet media one
7
7
119
@2wlearning
2wl
2 months
You chose the other activation functions because they're curvier didn't you? You animal. Look at how soulful ReLU is.
Tweet media one
4
11
111
@2wlearning
2wl
1 month
Day 28 progress report Almost 1 month. I need to go faster than this. Distracted with twitter. Need focus + grit. I understand RoPE now. kipply's blog on compute/memory bound transformer math was good. Played around with tinygrad and peeked at the code. A little vim practice.
Tweet media one
7
3
113
@2wlearning
2wl
1 month
You're given one shot to stand on the stage of human history, and your grand plan is to add a 'chat with gpt' feature to the customer support page?
9
6
108
@2wlearning
2wl
1 month
i realized today that anthropic's claude is named after claude shannon ... ... was this something that everyone just knew?
16
2
95
@2wlearning
2wl
1 month
always reach for the lowest layer of the stack that you can, that way you only get filtered by your own skill issues, rather than someone else's skill issues.
2
4
81
@2wlearning
2wl
1 month
Day 29 progress report (moe edition) Made my own scuffed version of llama3 but with default nn.MultiheadAttention 🤡 and RoPE from lucidrains. I'll fix it and get model loading working (random params rn). Also SMoE learning today! Forensically analyzed hf's impl of mixtral lol
Tweet media one
2
0
69
@2wlearning
2wl
1 month
Day 30 progress report 𝕬𝖈𝖙𝖎𝖛𝖆𝖙𝖊 𝕭𝖊𝖆𝖘𝖙 𝕸𝖔𝖉𝖊. Monstrous progress today: insane reading session diving deep into math and papers. Matrix calculus & momentum SGD numerical analysis alone was an 8 hour brutal death march that pushed me way outside of my comfort zone.
Tweet media one
3
1
64
@2wlearning
2wl
1 month
NEVER SETTLE. BURN YOUR SHIPS. MY BACKUP PLAN? IT'S THE STREETS. 35 BLACK, WE'RE GOING ALL IN.
6
1
59
@2wlearning
2wl
1 month
i blinked and lost 4 hours to twitter ._. ... ... ... idk i think my goal of trying to follow every anime pfp was too ambitious
9
1
57
@2wlearning
2wl
2 months
Hey, real talk for a sec. I honestly felt in such a bad place so many times yesterday and today. I completely choked facing a *real* problem, and fell into self-doubt, trivializing & dismissing my learning. Thank you so so much for supporting me & keeping me accountable. I mean it
Tweet media one
11
0
52
@2wlearning
2wl
1 month
Tweet media one
@nfloat16
.
2 months
activation functions drama from:
Tweet media one
1
1
28
1
0
51
@2wlearning
2wl
2 months
Day 17 progress report Absolutely massive progress today. Marathoned the entire fastai course (part 1) in a single day (2x speed + skipping stuff I knew). Planning to speedrun the book tomorrow. Hopefully finish replicating Karpathy's gpt2 notebook too? 🔥A C C E L E R A T E🔥
4
1
51
@2wlearning
2wl
2 months
I went from spending too much time watching anime to learn ML, to spending too much time learning ML to watch anime. Rewarded myself with some Sentai Daishikkaku after my big push yesterday. Lowkey 🔥
Tweet media one
3
0
51
@2wlearning
2wl
2 months
Day 24 progress report Massive push today, implemented the entire forward pass of GPT-2 by studying PyTorch like crazy and consolidating my knowledge. I REFUSE TO BACK DOWN. Activating anime protagonist powers tomorrow, the rest of GPT-2 is coming no matter what. WE WIN THESE.
Tweet media one
3
0
51
@2wlearning
2wl
2 months
Day 25 progress report Today, I finished GPT-2. It's over. やった!!! 🎉 (;´༎ຶٹ༎ຶ`) 🎉
5
0
50
@2wlearning
2wl
1 month
how does one get good at vim, like so good you still want to use it even when you're not in a terminal.
21
1
47
@2wlearning
2wl
2 months
When you're paying thousands of dollars per semester to learn about Java design patterns and UML.
Tweet media one
4
1
45
@2wlearning
2wl
2 months
Be like @ChinmayKak. If you're curious about something or want to know what resources I used, you can just dm me or reply on any of my tweets. I don't really mind.
Tweet media one
3
0
47
@2wlearning
2wl
2 months
I could've saved hours of struggling if I had seen The Illustrated Transformer. bruh.
2
2
43
@2wlearning
2wl
1 month
lol
Tweet media one
6
0
43
@2wlearning
2wl
2 months
Day 23 progress report Doing too much theory really atrophies practical skills. Hitting resistance with GPT-2, mostly with this escapist/anxious mentality that's weighed me down in the past. Even tho I know everything needed, I got a headache and started doubting myself. (1/2)
Tweet media one
6
0
38
@2wlearning
2wl
1 month
Day 31 progress report More progress in the fundamentals: Bayesian inference, entropy & KL divergence, the rest of the statistics notebooks (remember it took two weeks for the first half, and now the rest in 1 day, albeit skimming). Hoping to completely fill all knowledge gaps.
Tweet media one
3
1
38
@2wlearning
2wl
1 month
Day 34 progress report A little bit of learning about GANs and some more PyTorch, rummaged the reference library for textbooks but they were outdated. Watching MIT OCW videos to try and cram DSA (skipped every class to study ML). Down in the dumps, but I live to fight another day
2
1
37
@2wlearning
2wl
1 month
Day 32 progress report Learning more about Kullback-Leibler and Jensen-Shannon divergence. Finished parts of the fastai book I skipped (100% done). Another irl deep learning reading group session on linear transformer convergence (interesting but I missed most bc train delay).
Tweet media one
1
1
37
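The two divergences mentioned in this update are small enough to write out directly. A minimal NumPy sketch (my own illustration, not code from the reading group) for discrete distributions:

```python
import numpy as np

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); asymmetric, unbounded,
    # and only terms with p_i > 0 contribute (0 * log 0 := 0).
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    # Jensen-Shannon divergence: symmetrize KL through the mixture
    # m = (p + q) / 2; always finite and bounded above by log 2.
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(kl(p, q))            # log 2 ~= 0.693 for this pair
print(js(p, q), js(q, p))  # equal: JS is symmetric
```

Note that kl(q, p) would be infinite here (q puts mass where p has none), which is exactly the asymmetry JS smooths over.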
@2wlearning
2wl
2 months
REJECT THE ESCAPIST MINDSET. REJECT THE COMFORTABLE MENTALITY. fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear. IT CANNOT DEFINE ME.
2
2
37
@2wlearning
2wl
2 months
You know what? I'm so tired of it. Of everything being fake: toy projects that sit abandoned on github because it didn't solve a real problem, ideas that never get followed through, hype vaporware that never materializes. I WANT TO BUILD SOMETHING REAL. I WANT TO WIN. NOTICE ME.
Tweet media one
2
0
35
@2wlearning
2wl
1 month
Day 35 progress report In a better spot compared to yesterday. I calmed down a little and focused on my strategy. Up to lesson 11 of fastai part 2, I'll pause it here. I cleared all my bookmarks and moved to Zotero. Levelled up my understanding. Tomorrow, I'll finish llama 3.
2
1
34
@2wlearning
2wl
28 days
Day 45 progress report Reading on audio spectrogram transformers today. Pretty surprising that this approach works as well as it does.
5
1
33
@2wlearning
2wl
1 month
Day 36 progress report Model loading seems within reach, but I want to implement the modules myself rather than reusing. It's a bit embarrassing to share this side of me; I've been kinda pathetic today. I watched lots of anime instead of working. Yet I haven't lost hope. No, I
Tweet media one
2
1
30
@2wlearning
2wl
1 month
Day 39 progress report A little more today. I can see the light at the top of the hole, I'm going to climb out. That's my promise. fastai lesson 12 to 14 seemed to be mostly revision of stuff I already learned.
Tweet media one
2
2
31
@2wlearning
2wl
1 month
Day 33 progress report Another round of einops/pytorch practice because I still struggle with broadcasting. Learned about rejection sampling, inverse transform sampling (took an embarrassingly long time), monte carlo integration. Cleared out other stuff from my reading list.
Tweet media one
2
1
28
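Inverse transform sampling from this update fits in a few lines. A minimal NumPy sketch (my own illustration, using the exponential distribution as the worked case): if U is Uniform(0, 1) and F is a target CDF, then F⁻¹(U) is distributed according to F.

```python
import numpy as np

def sample_exponential(rate, size, rng):
    # For Exponential(rate): F(x) = 1 - exp(-rate * x),
    # so the inverse CDF is F^{-1}(u) = -log(1 - u) / rate.
    # Feeding uniform draws through F^{-1} yields exponential samples.
    u = rng.random(size)
    return -np.log1p(-u) / rate  # log1p(-u) = log(1 - u), numerically stable

rng = np.random.default_rng(0)
samples = sample_exponential(rate=2.0, size=100_000, rng=rng)
print(samples.mean())  # should be close to 1/rate = 0.5
```

The same recipe works for any distribution whose CDF you can invert in closed form; when you can't, that's where rejection sampling comes in.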
@2wlearning
2wl
1 month
frens
Tweet media one
6
0
30
@2wlearning
2wl
2 months
Day 11 A bit more statistics today - reading up on collinearity and PCA. Turns out the orthonormalization stuff from school actually had a use 🤯 My messed up sleep schedule is finally catching up to me, going to start improving my lifestyle habits so I can learn and focus more
0
0
28
@2wlearning
2wl
1 month
Day 38 progress report A tiny step again with the code, day after day little after little. Despair can't keep me down, the weight of multiple conflicting tasks can't keep me down. Tomorrow is going to be a good day, I can feel it.
3
1
30
@2wlearning
2wl
1 month
OKAY SINCE IT'S ALMOST BEEN 1 MONTH... I am now committing to only use twitter during the time I post my daily progress updates, and for at most 20mins. Please hold me accountable. Thank you :)
2
0
30
@2wlearning
2wl
1 month
How to get me to follow you Step 1. anime pfp
5
1
29
@2wlearning
2wl
2 months
Layernorm bending polytope boundaries in 3D space. Same kinda vibe as those 'invert a sphere' topology problems.
2
0
29
@2wlearning
2wl
2 months
I realized something far too late. You have to choose authenticity over fitting in. Even if people cringe at you or feel disappointed by your true self. You can't hide behind being 'quiet' to avoid criticism of your real interests. You'll never find others sharing your values.
4
1
26
@2wlearning
2wl
2 months
Day 21 progress report I fully understand layernorm both algebraically and geometrically now. Managed to graph the 2d activation plane under bias transform with GeoGebra. Feels euphoric. I won. I got sidetracked theorymaxxing on residual superposition so GPT-2 is tomorrow now.
Tweet media one
1
0
28
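The geometric claim in this update checks out in a few lines. A minimal NumPy sketch (my own illustration, not the GeoGebra construction from the tweet): subtracting the mean is exactly an orthogonal projection onto the hyperplane normal to the all-ones vector.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # Mean-subtraction projects x onto the hyperplane where components
    # sum to zero; dividing by the std then rescales within that plane.
    centered = x - x.mean()
    return centered / np.sqrt(centered.var() + eps)

x = np.array([2.0, -1.0, 4.0, 7.0])
ones = np.ones_like(x)

# Explicit orthogonal projection onto the hyperplane orthogonal to `ones`
projected = x - (x @ ones / (ones @ ones)) * ones

print(np.allclose(x - x.mean(), projected))  # True: same operation
```

So the "algebraic" mean-subtraction and the "geometric" hyperplane projection are literally the same map, which is presumably the euphoria in question.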
@2wlearning
2wl
2 months
Day 22 progress report Ok, I jinxed GPT-2 yesterday and proceeded to fall into another massive theory rabbithole. Today I read abt the polytope lens for the ReLU activation function, as well as NNs folding in n-dimensional space. I started reading on DNN tropical geometry (wip)
Tweet media one
0
0
26
@2wlearning
2wl
1 month
When you flinch in the face of adversity, you decide your fate before the battle even begins. You have to crawl through the depths of hell without hesitating a second, throwing yourself through the fire over and over. It's the only way to get stronger.
2
2
27
@2wlearning
2wl
28 days
Day 44 progress report I did a bit of coding today, not as much as I would have liked but my poor sleep schedule has really caught up to me and I have to give it my all for the upcoming assignments + exams brutal combo...
Tweet media one
1
1
25
@2wlearning
2wl
2 months
Day 9 7 hours of effort on surpassing the SOTA for LLM competitive math 🔥🔥🔥. Switched llama-cpp-python for using llama.cpp directly, still kinda bad but I have ideas (C++ arc soon!) My train got cancelled (someone jumped on the tracks) so no statistics today :(
Tweet media one
2
0
25
@2wlearning
2wl
1 month
Day 41 progress report More vision transformer reading today. The win-win paper is pretty cool, idk why but i love stupid tricks like this.
Tweet media one
1
1
25
@2wlearning
2wl
2 months
Day 26 progress report Chill day, was mostly resting but did a little PyTorch practice and read about norm growth in transformer residual streams (pretty interesting) Also spent time contemplating next steps: I'm going to start implementing papers in PyTorch & grinding quantity
Tweet media one
5
0
23
@2wlearning
2wl
29 days
Day 43 progress report Spent today working on a large-scale LLM-assisted translation project. But man, cmd-r+ really is a translation powerhouse. An open source model trading blows with GPT-4o is just insane.

Tweet media one
1
1
23
@2wlearning
2wl
1 month
Day 37 progress report A little more incremental progress today. I'll keep moving.
1
1
21
@2wlearning
2wl
1 month
Day 40 progress report Some light reading on vision transformers and sub-quadratic attention today. Took a peek at some of the unsloth code (triton) and some cutlass examples, I think I'll save it for later. So much work to multiply a matrix rip
Tweet media one
0
0
22
@2wlearning
2wl
2 months
Day 19 progress report Decided to focus entirely on transformers today. Collated tons of information from different places then mentally pieced everything together and root caused all my misunderstandings for ~6hrs, imagining all the matmuls in my head. Diving even deeper tmrw
2
0
20
@2wlearning
2wl
2 months
when you realize we're in the beta world line
Tweet media one
3
0
21
@2wlearning
2wl
1 month
youtube comments coming in clutch. way better explanation than the eleuther one
Tweet media one
1
0
21
@2wlearning
2wl
2 months
It's quite disorienting feeling a momentary sense of clarity after learning how neural networks and transformers work, only to realize that your first-order understanding is merely a 2D slice of the hyperdimensional manifold of complex emergent behaviors.
1
0
20
@2wlearning
2wl
1 month
Listen up, I have a confession to make. This may come as a shock to some of you, and no doubt some of you may feel disappointed by this, but I'm not actually captain Murrue Ramius from Gundam Seed. I know, I know. I'm sorry to everyone feeling betrayed by this revelation.
4
0
21
@2wlearning
2wl
2 months
Day 18 progress report Great progress today. Skim read about 75% of the fastai book. Lots of interesting training tidbits. Never going to blunder time checking twitter in the morning again. Plan for tomorrow: Finish the book and get started on everyone's favorite TRANSFORMERS
1
0
21
@2wlearning
2wl
2 months
Day 10 Progress: Math/LLM noob grind Accidentally wasted 1 submission because I don't know how Kaggle works. Oops. Planning on trying exllama (batched inference :o) or modifying llama.cpp next. Learned regression tests (R^2 ftw) Also, seems at least 1 or 2 people noticed me haha
Tweet media one
1
0
21
@2wlearning
2wl
1 month
Resources on divergence metrics if you're interested (quite applicable to LLMs)
1
0
19
@2wlearning
2wl
1 month
Day 42 progress report Read a bit about hyperparameter optimization today. Arguably the most important aspect of building high performance models nowadays (shockingly, perhaps even more than data depending on your perspective).
Tweet media one
0
1
19
@2wlearning
2wl
2 months
Tweet media one
1
0
20
@2wlearning
2wl
1 month
It's joever
Tweet media one
0
0
19
@2wlearning
2wl
2 months
Day 27 progress report Learning about modern transformer stuff, so RoPE, KV-cache, RMSnorm (rescales the norm without any hyperdimensional shenanigans, boo). The block diagonal part of RoPE still confuses me. I want to try implementing llama or mistral in tinygrad tomorrow.
Tweet media one
2
1
18
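The RMSnorm aside in this update ("rescales the norm without any hyperdimensional shenanigans") is easy to make concrete. A minimal NumPy sketch (my own illustration; the learned gain vector follows the usual formulation):

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    # Unlike layernorm, there is no mean-subtraction: the vector is only
    # rescaled to (roughly) unit root-mean-square, then a learned
    # per-dimension gain is applied.
    rms = np.sqrt(np.mean(x * x) + eps)
    return gain * x / rms

x = np.array([3.0, -4.0, 0.0, 0.0])
y = rmsnorm(x, gain=np.ones_like(x))
print(np.sqrt(np.mean(y * y)))  # ~1.0: unit RMS, but the mean is untouched
```

Because the centering step is gone, there is no hyperplane projection here, only a radial rescale, which is exactly the "boo" in the tweet.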
@2wlearning
2wl
2 months
Am I missing something? Following the formula, in the attention pattern the queries are the rows and the keys are the columns. This means the causal mask must be upper triangular. But if you watch 3b1b or Karpathy's transformer videos, the causal mask is lower triangular? wat
Tweet media one
4
0
19
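The convention in this question can be checked directly. A minimal NumPy sketch (my own illustration, assuming scores[i, j] = q_i · k_j with queries indexing rows): causality forbids query i from attending to keys j > i, so the *forbidden* region is the strictly upper triangle, and the surviving attention pattern is lower triangular, consistent with the tril masks in the 3b1b/Karpathy videos.

```python
import numpy as np

T = 4
rng = np.random.default_rng(0)
scores = rng.standard_normal((T, T))  # scores[i, j] = q_i . k_j

# Mask the strictly upper triangle (j > i, i.e. future keys) to -inf
# before the softmax; the allowed entries form the lower triangle.
forbidden = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[forbidden] = -np.inf

# Row-wise softmax over the surviving (past and present) positions
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.allclose(np.triu(weights, k=1), 0.0))  # True: no attention to the future
```

So both statements are consistent: the mask *applied* (-inf entries) is upper triangular, while the mask of *kept* entries, the one usually drawn, is lower triangular.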
@2wlearning
2wl
2 months
I don't care if it takes me 10 hours or 10 years, I will systematically pull apart every seam of neural network behavior from every perspective until the entire system fits inside my head. I will not stop. The wall will crumble.
4
1
18
@2wlearning
2wl
1 month
I WILL NEVER SURRENDER. NEVER FORGET WHO YOU ARE. DEATH CANNOT KILL ME. WE STORM THE BEACHES AT THE CRACK OF DAWN.
0
0
18
@2wlearning
2wl
2 months
Day 20 progress report Learned more about layernorm today (understand the projection now, but not 100% tbh). Yes, a day on layernorm of all things. Be fast where others are slow and slow where others are fast. Also realizing I don't know what a linear layer matrix looks like 1/2
4
0
16
@2wlearning
2wl
2 months
That exact feeling was my breaking point in the past, but I refuse to let things end here. I'm going to regroup my knowledge to solidify my understanding, clear out overdue irl backlog, grind more pytorch/einops, and start my counterattack with 110% focus. I WILL NEVER SURRENDER
Tweet media one
2
0
16
@2wlearning
2wl
2 months
Tweet media one
1
0
16
@2wlearning
2wl
1 month
believe in the future waiting for me. It doesn't matter how many times I have to pull myself together and try again, every day, slowly, meticulously, I will inch closer to the world that calls me. I know my gradients. See you downhill :)
2
0
15
@2wlearning
2wl
2 months
Hop in loser, we're going from exponential to factorial scaling laws.
Tweet media one
1
0
16
@2wlearning
2wl
2 months
Insane alpha in pandemic-era lecture recordings with a few hundred views on YouTube.
0
1
15
@2wlearning
2wl
2 months
My entire understanding of 'average' was a lie. The concept of 'mean' is really just curve fitting in disguise. Arithmetic mean fits sorted values to a line, geometric mean fits to an exponential curve, harmonic mean fits to a hyperbolic curve. You can use any differentiable func
4
0
13
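The unifying idea behind this observation is the quasi-arithmetic (generalized f-) mean: push the data through a function f, take the ordinary average, then map back through f's inverse. A small NumPy sketch (my own illustration, not from the thread):

```python
import numpy as np

def f_mean(x, f, f_inv):
    # Quasi-arithmetic mean: f_inv(average of f(x)).
    return f_inv(np.mean(f(np.asarray(x, dtype=float))))

x = [1.0, 4.0, 16.0]

arithmetic = f_mean(x, lambda v: v,       lambda v: v)        # f(x) = x
geometric  = f_mean(x, np.log,            np.exp)             # f(x) = log x
harmonic   = f_mean(x, lambda v: 1.0 / v, lambda v: 1.0 / v)  # f(x) = 1/x

print(arithmetic)  # 7.0
print(geometric)   # 4.0  (cube root of 1*4*16 = 64)
print(harmonic)    # 16/7 ~= 2.286
```

Each classical mean is the same averaging operation viewed through a different choice of f, which is the "curve fitting in disguise" point.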
@2wlearning
2wl
2 months
Can someone with a more mathematical background pls explain this paper to me: . I started losing the plot at tropical hypersurfaces. What did they mean by this?
Tweet media one
1
0
14
@2wlearning
2wl
2 months
Beware of absolute statements in ML. Even things as innocuous as "batchnorm prevents internal covariate shift" or "resnet fixes vanishing gradients". Zoom in close enough and there's a billion asterisks, and at the end of it, it's questionable if it's even true. Trust nobody.
1
0
13
@2wlearning
2wl
2 months
Tweet media one
1
0
13
@2wlearning
2wl
1 month
Also I'll stick with just PyTorch for now. Tinygrad is cool but not the thing I need at the moment. Cool resources:
0
0
14
@2wlearning
2wl
2 months
Day 27 bonus update (epistemological edition!) I've been working towards developing an intuitive understanding of modern DL as a nearish-term goal. I wrote everything so far into notes and looked for weak areas: vector calc/loss landscapes, information theory, SVD/low rank stuff
Tweet media one
2
0
13
@2wlearning
2wl
1 month
Step 2. daily learning updates (guaranteed follow tbh)
0
0
13
@2wlearning
2wl
1 month
my poor crappy laptop is fighting for its life to train mnist
3
1
11
@2wlearning
2wl
2 months
Forgot to mention yesterday, I really like this resource for visualizing all the matrix-vector multiplies in self-attention. Seeing the transforms visually is so different from just understanding matmul conceptually
0
0
12
@2wlearning
2wl
1 month
Been struggling a lot with negative self-talk that my coding speed is always going to be slow and bottleneck me. Bug after bug kills momentum. I know what I have to do, it's just painful. When practice doesn't pay off, you have to pour blood and sweat into it. Without hesitation.
2
0
12
@2wlearning
2wl
2 months
@nachoyawn many such cases
1
0
11
@2wlearning
2wl
1 month
You can't change the world without getting your hands dirty.
Tweet media one
0
1
10
@2wlearning
2wl
2 months
Cool how you can separate out prediction uncertainty as either being caused by variance in the training data for the problem vs the problem not existing in the training data at all. Paper name: Single-model Uncertainties for Deep Learning
Tweet media one
1
0
12
@2wlearning
2wl
2 months
Day 12 Progress: lvl1 frequentist noob vs lvl99 bayesian boss Tried to do more PCA today with numpy but couldn't focus. Started reading intro to Bayesian statistics. Stupid rat brain has to stop doomscrolling social media and start grinding ML textbooks.
0
0
11
@2wlearning
2wl
1 month
uhhh i think the tinygrad mnist tutorial/clang backend is bugged
1
0
11
@2wlearning
2wl
2 months
so easy to fall into the trap of complaining abt how much the world sucks. the only thing i have to consider is the agency i have over the things i control. it's the only way to change things to be better.
0
0
11
@2wlearning
2wl
2 months
Okay, as promised yesterday, I will post my (afaik novel) theory on the connection between branch specialization, neuron superposition and the vanishing gradients problem. @Final_Industry DMed me to explain it earlier, so I'll copy what I said. This thread might be long.........
1
0
11
@2wlearning
2wl
2 months
it's genuinely concerning that so many people can use LLMs and not process the full implications of what ICL means (and I know because I used to be one of those people)
@Euclaise_
Jade
2 months
This surprises me, I didn't expect this to work
7
19
189
2
0
11
@2wlearning
2wl
2 months
2014 was 10 years ago. 10 years
0
0
10
@2wlearning
2wl
1 month
I was being silly yesterday, actually I don't have to think about it so hard. Dive into theory to find ideas and directions for research, then experiment like crazy until all the ideas are exhausted and I've built some cool stuff, then repeat.
0
0
9