2wl

@2wlearning

Followers: 1,132
Following: 403
Media: 145
Statuses: 393

Documenting my progress learning ML every day. 2 more weeks

residual stream
Joined September 2023
Pinned Tweet
@2wlearning
2wl
2 months
Current roadmap
3 Month Goal: Beat the strongest zero-shot submission in the AI Mathematical Olympiad ()
4 Month Goal: Solve my first tinygrad bounty.
6 Month Goal: Implement LLM pre-training and inference from scratch (CUDA)
13
2
180
@2wlearning
2wl
1 month
This is your counterparty when you open robinhood. They reverse engineered binary register mappings to matmul 15% faster. gg
Tweet media one
8
34
533
@2wlearning
2wl
1 month
And they knew all this in 2018! 2018!!!
Tweet media one
5
11
297
@2wlearning
2wl
1 month
i thought it was snake oil self-help motivational bs, but is being cracked really all mindset? i feel like i got a little glimpse today. it really has nothing to do with knowledge or ability? bro why doesn't everyone do this
19
3
227
@2wlearning
2wl
2 months
Another day, another layernorm paper. This time on the effect of projecting the transformer key vectors onto a hyperplane.
Tweet media one
3
8
149
@2wlearning
2wl
2 months
arXiv: Long-context LLMs Struggle with Long In-context Learning "So the context extension papers started using subprime retrieval evaluations. Gotta keep that citation ring churning. Whenever you hear 'Needle In A Haystack', think 'Shit.'"
Tweet media one
7
7
119
@2wlearning
2wl
2 months
You chose the other activation functions because they're curvier didn't you? You animal. Look at how soulful ReLU is.
Tweet media one
4
11
111
@2wlearning
2wl
1 month
Day 28 progress report Almost 1 month. I need to go faster than this. Distracted with twitter. Need focus + grit. I understand RoPE now. kipply's blog on compute/memory bound transformer math was good. Played around with tinygrad and peeked at the code. A little vim practice.
Tweet media one
7
3
113
@2wlearning
2wl
1 month
You're given one shot to stand on the stage of human history, and your grand plan is to add a 'chat with gpt' feature to the customer support page?
9
6
108
@2wlearning
2wl
1 month
i realized today that anthropic's claude is named after claude shannon ... ... was this something that everyone just knew?
16
2
95
@2wlearning
2wl
1 month
always reach for the lowest layer of the stack that you can, that way you only get filtered by your own skill issues, rather than someone else's skill issues.
2
4
81
@2wlearning
2wl
1 month
Day 29 progress report (moe edition) Made my own scuffed version of llama3 but with default nn.MultiheadAttention 🤡 and RoPE from lucidrains. I'll fix it and get model loading working (random params rn). Also SMoE learning today! Forensically analyzed hf's impl of mixtral lol
Tweet media one
2
0
69
@2wlearning
2wl
1 month
Day 30 progress report 𝕬𝖈𝖙𝖎𝖛𝖆𝖙𝖊 𝕭𝖊𝖆𝖘𝖙 𝕸𝖔𝖉𝖊. Monstrous progress today: insane reading session diving deep into math and papers. Matrix calculus & momentum SGD numerical analysis alone was an 8 hour brutal death march that pushed me way outside of my comfort zone.
Tweet media one
3
1
64
@2wlearning
2wl
1 month
NEVER SETTLE. BURN YOUR SHIPS. MY BACKUP PLAN? IT'S THE STREETS. 35 BLACK, WE'RE GOING ALL IN.
6
1
59
@2wlearning
2wl
1 month
i blinked and lost 4 hours to twitter ._. ... ... ... idk i think my goal of trying to follow every anime pfp was too ambitious
9
1
57
@2wlearning
2wl
2 months
Hey, real talk for a sec. I honestly felt in such a bad place so many times yesterday and today. I completely choked facing a *real* problem, and fell into self-doubt, trivializing & dismissing my learning. Thank you so so much for supporting me & keeping me accountable. I mean it
Tweet media one
11
0
52
@2wlearning
2wl
1 month
Tweet media one
@nfloat16
.
2 months
activation functions drama from:
Tweet media one
1
1
28
1
0
51
@2wlearning
2wl
2 months
Day 17 progress report Absolutely massive progress today. Marathoned the entire fastai course (part 1) in a single day (2x speed + skipping stuff I knew). Planning to speedrun the book tomorrow. Hopefully finish replicating Karpathy's gpt2 notebook too? 🔥A C C E L E R A T E🔥
4
1
51
@2wlearning
2wl
2 months
I went from spending too much time watching anime to learn ML, to spending too much time learning ML to watch anime. Rewarded myself with some Sentai Daishikkaku after my big push yesterday. Lowkey 🔥
Tweet media one
3
0
51
@2wlearning
2wl
2 months
Day 24 progress report Massive push today, implemented the entire forward pass of GPT-2 by studying PyTorch like crazy and consolidating my knowledge. I REFUSE TO BACK DOWN. Activating anime protagonist powers tomorrow, the rest of GPT-2 is coming no matter what. WE WIN THESE.
Tweet media one
3
0
51
@2wlearning
2wl
2 months
Day 25 progress report Today, I finished GPT-2. It's over. やった!!! 🎉 (;´༎ຶٹ༎ຶ`) 🎉
5
0
50
@2wlearning
2wl
1 month
how does one get good at vim, like so good you still want to use it even when you're not in a terminal.
21
1
47
@2wlearning
2wl
2 months
When you're paying thousands of dollars per semester to learn about Java design patterns and UML.
Tweet media one
4
1
45
@2wlearning
2wl
2 months
Be like @ChinmayKak. If you're curious about something or want to know what resources I used, you can just dm me or reply on any of my tweets. I don't really mind.
Tweet media one
3
0
47
@2wlearning
2wl
2 months
I could've saved hours of struggling if I had seen The Illustrated Transformer. bruh.
2
2
43
@2wlearning
2wl
1 month
lol
Tweet media one
6
0
43
@2wlearning
2wl
2 months
Day 23 progress report Doing too much theory really atrophies practical skills. Hitting resistance with GPT-2, mostly with this escapist/anxious mentality that's weighed me down in the past. Even tho I know everything needed, I got a headache and started doubting myself. (1/2)
Tweet media one
6
0
38
@2wlearning
2wl
1 month
Day 31 progress report More progress in the fundamentals: Bayesian inference, entropy & KL divergence, the rest of the statistics notebooks (remember it took two weeks for the first half, and now the rest in 1 day, albeit skimming). Hoping to completely fill all knowledge gaps.
Tweet media one
3
1
38
@2wlearning
2wl
1 month
Day 34 progress report A little bit of learning about GANs and some more PyTorch, rummaged the reference library for textbooks but they were outdated. Watching MIT OCW videos to try and cram DSA (skipped every class to study ML). Down in the dumps, but I live to fight another day
2
1
37
@2wlearning
2wl
1 month
Day 32 progress report Learning more about Kullback-Leibler and Jensen-Shannon divergence. Finished parts of the fastai book I skipped (100% done). Another irl deep learning reading group session on linear transformer convergence (interesting but I missed most bc train delay).
Tweet media one
1
1
37
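The two divergences mentioned in this update are small enough to write out directly. A minimal NumPy sketch (my own illustration, not code from the reading group) for discrete distributions:

```python
import numpy as np

def kl(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); asymmetric, unbounded,
    # and only terms with p_i > 0 contribute (0 * log 0 := 0).
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js(p, q):
    # Jensen-Shannon divergence: symmetrize KL through the mixture
    # m = (p + q) / 2; always finite and bounded above by log 2.
    m = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(kl(p, q))            # log 2 ~= 0.693 for this pair
print(js(p, q), js(q, p))  # equal: JS is symmetric
```

Note that kl(q, p) would be infinite here (q puts mass where p has none), which is exactly the asymmetry JS smooths over.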
@2wlearning
2wl
2 months
REJECT THE ESCAPIST MINDSET. REJECT THE COMFORTABLE MENTALITY. fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear fear. IT CANNOT DEFINE ME.
2
2
37
@2wlearning
2wl
2 months
You know what? I'm so tired of it. Of everything being fake: toy projects that sit abandoned on github because it didn't solve a real problem, ideas that never get followed through, hype vaporware that never materializes. I WANT TO BUILD SOMETHING REAL. I WANT TO WIN. NOTICE ME.
Tweet media one
2
0
35
@2wlearning
2wl
1 month
Day 35 progress report In a better spot compared to yesterday. I calmed down a little and focused on my strategy. Up to lesson 11 of fastai part 2, I'll pause it here. I cleared all my bookmarks and moved to Zotero. Levelled up my understanding. Tomorrow, I'll finish llama 3.
2
1
34
@2wlearning
2wl
28 days
Day 45 progress report Reading on audio spectrogram transformers today. Pretty surprising that this approach works as well as it does.
5
1
33
@2wlearning
2wl
1 month
Day 36 progress report Model loading seems within reach, but I want to implement the modules myself rather than reusing. It's a bit embarrassing to share this side of me; I've been kinda pathetic today. I watched lots of anime instead of working. Yet I haven't lost hope. No, I
Tweet media one
2
1
30
@2wlearning
2wl
1 month
Day 39 progress report A little more today. I can see the light at the top of the hole, I'm going to climb out. That's my promise. fastai lesson 12 to 14 seemed to be mostly revision of stuff I already learned.
Tweet media one
2
2
31
@2wlearning
2wl
1 month
Day 33 progress report Another round of einops/pytorch practice because I still struggle with broadcasting. Learned about rejection sampling, inverse transform sampling (took an embarrassingly long time), monte carlo integration. Cleared out other stuff from my reading list.
Tweet media one
2
1
28
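Inverse transform sampling from this update fits in a few lines. A minimal NumPy sketch (my own illustration, using the exponential distribution as the worked case): if U is Uniform(0, 1) and F is a target CDF, then F⁻¹(U) is distributed according to F.

```python
import numpy as np

def sample_exponential(rate, size, rng):
    # For Exponential(rate): F(x) = 1 - exp(-rate * x),
    # so the inverse CDF is F^{-1}(u) = -log(1 - u) / rate.
    # Feeding uniform draws through F^{-1} yields exponential samples.
    u = rng.random(size)
    return -np.log1p(-u) / rate  # log1p(-u) = log(1 - u), numerically stable

rng = np.random.default_rng(0)
samples = sample_exponential(rate=2.0, size=100_000, rng=rng)
print(samples.mean())  # should be close to 1/rate = 0.5
```

The same recipe works for any distribution whose CDF you can invert in closed form; when you can't, that's where rejection sampling comes in.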
@2wlearning
2wl
1 month
frens
Tweet media one
6
0
30
@2wlearning
2wl
2 months
Day 11 A bit more statistics today - reading up on collinearity and PCA. Turns out the orthonormalization stuff from school actually had a use 🤯 My messed up sleep schedule is finally catching up to me, going to start improving my lifestyle habits so I can learn and focus more
0
0
28
@2wlearning
2wl
1 month
Day 38 progress report A tiny step again with the code, day after day little after little. Despair can't keep me down, the weight of multiple conflicting tasks can't keep me down. Tomorrow is going to be a good day, I can feel it.
3
1
30
@2wlearning
2wl
1 month
OKAY SINCE IT'S ALMOST BEEN 1 MONTH... I am now committing to only use twitter during the time I post my daily progress updates, and for at most 20mins. Please hold me accountable. Thank you :)
2
0
30
@2wlearning
2wl
1 month
How to get me to follow you Step 1. anime pfp
5
1
29
@2wlearning
2wl
2 months
Layernorm bending polytope boundaries in 3D space. Same kinda vibe as those 'invert a sphere' topology problems.
2
0
29
@2wlearning
2wl
2 months
I realized something far too late. You have to choose authenticity over fitting in. Even if people cringe at you or feel disappointed by your true self. You can't hide behind being 'quiet' to avoid criticism of your real interests. You'll never find others sharing your values.
4
1
26
@2wlearning
2wl
2 months
Day 21 progress report I fully understand layernorm both algebraically and geometrically now. Managed to graph the 2d activation plane under bias transform with GeoGebra. Feels euphoric. I won. I got sidetracked theorymaxxing on residual superposition so GPT-2 is tomorrow now.
Tweet media one
1
0
28
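The geometric claim in this update checks out in a few lines. A minimal NumPy sketch (my own illustration, not the GeoGebra construction from the tweet): subtracting the mean is exactly an orthogonal projection onto the hyperplane normal to the all-ones vector.

```python
import numpy as np

def layernorm(x, eps=1e-5):
    # Mean-subtraction projects x onto the hyperplane where components
    # sum to zero; dividing by the std then rescales within that plane.
    centered = x - x.mean()
    return centered / np.sqrt(centered.var() + eps)

x = np.array([2.0, -1.0, 4.0, 7.0])
ones = np.ones_like(x)

# Explicit orthogonal projection onto the hyperplane orthogonal to `ones`
projected = x - (x @ ones / (ones @ ones)) * ones

print(np.allclose(x - x.mean(), projected))  # True: same operation
```

So the "algebraic" mean-subtraction and the "geometric" hyperplane projection are literally the same map, which is presumably the euphoria in question.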
@2wlearning
2wl
2 months
Day 22 progress report Ok, I jinxed GPT-2 yesterday and proceeded to fall into another massive theory rabbithole. Today I read abt the polytope lens for the ReLU activation function, as well as NNs folding in n-dimensional space. I started reading on DNN tropical geometry (wip)
Tweet media one
0
0
26
@2wlearning
2wl
1 month
When you flinch in the face of adversity, you decide your fate before the battle even begins. You have to crawl through the depths of hell without hesitating a second, throwing yourself through the fire over and over. It's the only way to get stronger.
2
2
27
@2wlearning
2wl
28 days
Day 44 progress report I did a bit of coding today, not as much as I would have liked but my poor sleep schedule has really caught up to me and I have to give it my all for the upcoming assignments + exams brutal combo...
Tweet media one
1
1
25
@2wlearning
2wl
2 months
Day 9 7 hours of effort on surpassing the SOTA for LLM competitive math 🔥🔥🔥. Switched llama-cpp-python for using llama.cpp directly, still kinda bad but I have ideas (C++ arc soon!) My train got cancelled (someone jumped on the tracks) so no statistics today :(
Tweet media one
2
0
25
@2wlearning
2wl
1 month
Day 41 progress report More vision transformer reading today. The win-win paper is pretty cool, idk why but i love stupid tricks like this.
Tweet media one
1
1
25
@2wlearning
2wl
2 months
Day 26 progress report Chill day, was mostly resting but did a little PyTorch practice and read about norm growth in transformer residual streams (pretty interesting) Also spent time contemplating next steps: I'm going to start implementing papers in PyTorch & grinding quantity
Tweet media one
5
0
23
@2wlearning
2wl
29 days
Day 43 progress report Spent today working on a large-scale LLM-assisted translation project. But man, cmd-r+ really is a translation powerhouse. An open source model trading blows with GPT-4o is just insane.

Tweet media one
1
1
23
@2wlearning
2wl
1 month
Day 37 progress report A little more incremental progress today. I'll keep moving.
1
1
21
@2wlearning
2wl
1 month
Day 40 progress report Some light reading on vision transformers and sub-quadratic attention today. Took a peek at some of the unsloth code (triton) and some cutlass examples, I think I'll save it for later. So much work to multiply a matrix rip
Tweet media one
0
0
22
@2wlearning
2wl
2 months
Day 19 progress report Decided to focus entirely on transformers today. Collated tons of information from different places then mentally pieced everything together and root caused all my misunderstandings for ~6hrs, imagining all the matmuls in my head. Diving even deeper tmrw
2
0
20
@2wlearning
2wl
2 months
when you realize we're in the beta world line
Tweet media one
3
0
21
@2wlearning
2wl
1 month
youtube comments coming in clutch. way better explanation than the eleuther one
Tweet media one
1
0
21
@2wlearning
2wl
2 months
It's quite disorienting feeling a momentary sense of clarity after learning how neural networks and transformers work, only to realize that your first-order understanding is merely a 2D slice of the hyperdimensional manifold of complex emergent behaviors.
1
0
20
@2wlearning
2wl
1 month
Listen up, I have a confession to make. This may come as a shock to some of you, and no doubt some of you may feel disappointed by this, but I'm not actually captain Murrue Ramius from Gundam Seed. I know, I know. I'm sorry to everyone feeling betrayed by this revelation.
4
0
21
@2wlearning
2wl
2 months
Day 18 progress report Great progress today. Skim read about 75% of the fastai book. Lots of interesting training tidbits. Never going to blunder time checking twitter in the morning again. Plan for tomorrow: Finish the book and get started on everyone's favorite TRANSFORMERS
1
0
21
@2wlearning
2wl
2 months
Day 10 Progress: Math/LLM noob grind Accidentally wasted 1 submission because I don't know how Kaggle works. Oops. Planning on trying exllama (batched inference :o) or modifying llama.cpp next. Learned regression tests (R^2 ftw) Also, seems at least 1 or 2 people noticed me haha
Tweet media one
1
0
21
@2wlearning
2wl
1 month
Resources on divergence metrics if you're interested (quite applicable to LLMs)
1
0
19
@2wlearning
2wl
1 month
Day 42 progress report Read a bit about hyperparameter optimization today. Arguably the most important aspect of building high performance models nowadays (shockingly, perhaps even more than data depending on your perspective).
Tweet media one
0
1
19
@2wlearning
2wl
2 months
Tweet media one
1
0
20
@2wlearning
2wl
1 month
It's joever
Tweet media one
0
0
19
@2wlearning
2wl
2 months
Day 27 progress report Learning about modern transformer stuff, so RoPE, KV-cache, RMSnorm (rescales the norm without any hyperdimensional shenanigans, boo). The block diagonal part of RoPE still confuses me. I want to try implementing llama or mistral in tinygrad tomorrow.
Tweet media one
2
1
18
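The RMSnorm aside in this update ("rescales the norm without any hyperdimensional shenanigans") is easy to make concrete. A minimal NumPy sketch (my own illustration; the learned gain vector follows the usual formulation):

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    # Unlike layernorm, there is no mean-subtraction: the vector is only
    # rescaled to (roughly) unit root-mean-square, then a learned
    # per-dimension gain is applied.
    rms = np.sqrt(np.mean(x * x) + eps)
    return gain * x / rms

x = np.array([3.0, -4.0, 0.0, 0.0])
y = rmsnorm(x, gain=np.ones_like(x))
print(np.sqrt(np.mean(y * y)))  # ~1.0: unit RMS, but the mean is untouched
```

Because the centering step is gone, there is no hyperplane projection here, only a radial rescale, which is exactly the "boo" in the tweet.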
@2wlearning
2wl
2 months
Am I missing something? Following the formula, in the attention pattern the queries are the rows and the keys are the columns. This means the causal mask must be upper triangular. But if you watch 3b1b or Karpathy's transformer videos, the causal mask is lower triangular? wat
Tweet media one
4
0
19
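The convention in this question can be checked directly. A minimal NumPy sketch (my own illustration, assuming scores[i, j] = q_i · k_j with queries indexing rows): causality forbids query i from attending to keys j > i, so the *forbidden* region is the strictly upper triangle, and the surviving attention pattern is lower triangular, consistent with the tril masks in the 3b1b/Karpathy videos.

```python
import numpy as np

T = 4
rng = np.random.default_rng(0)
scores = rng.standard_normal((T, T))  # scores[i, j] = q_i . k_j

# Mask the strictly upper triangle (j > i, i.e. future keys) to -inf
# before the softmax; the allowed entries form the lower triangle.
forbidden = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[forbidden] = -np.inf

# Row-wise softmax over the surviving (past and present) positions
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.allclose(np.triu(weights, k=1), 0.0))  # True: no attention to the future
```

So both statements are consistent: the mask *applied* (-inf entries) is upper triangular, while the mask of *kept* entries, the one usually drawn, is lower triangular.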
@2wlearning
2wl
2 months
I don't care if it takes me 10 hours or 10 years, I will systematically pull apart every seam of neural network behavior from every perspective until the entire system fits inside my head. I will not stop. The wall will crumble.
4
1
18
@2wlearning
2wl
1 month
I WILL NEVER SURRENDER. NEVER FORGET WHO YOU ARE. DEATH CANNOT KILL ME. WE STORM THE BEACHES AT THE CRACK OF DAWN.
0
0
18
@2wlearning
2wl
2 months
Day 20 progress report Learned more about layernorm today (understand the projection now, but not 100% tbh). Yes, a day on layernorm of all things. Be fast where others are slow and slow where others are fast. Also realizing I don't know what a linear layer matrix looks like 1/2
4
0
16
@2wlearning
2wl
2 months
That exact feeling was my breaking point in the past, but I refuse to let things end here. I'm going to regroup my knowledge to solidify my understanding, clear out overdue irl backlog, grind more pytorch/einops, and start my counterattack with 110% focus. I WILL NEVER SURRENDER
Tweet media one
2
0
16
@2wlearning
2wl
2 months
Tweet media one
1
0
16
@2wlearning
2wl
1 month
believe in the future waiting for me. It doesn't matter how many times I have to pull myself together and try again, every day, slowly, meticulously, I will inch closer to the world that calls me. I know my gradients. See you downhill :)
2
0
15
@2wlearning
2wl
2 months
Hop in loser, we're going from exponential to factorial scaling laws.
Tweet media one
1
0
16
@2wlearning
2wl
2 months
Insane alpha in pandemic-era lecture recordings with a few hundred views on YouTube.
0
1
15
@2wlearning
2wl
2 months
My entire understanding of 'average' was a lie. The concept of 'mean' is really just curve fitting in disguise. Arithmetic mean fits sorted values to a line, geometric mean fits to an exponential curve, harmonic mean fits to a hyperbolic curve. You can use any differentiable func
4
0
13
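The unifying idea behind this observation is the quasi-arithmetic (generalized f-) mean: push the data through a function f, take the ordinary average, then map back through f's inverse. A small NumPy sketch (my own illustration, not from the thread):

```python
import numpy as np

def f_mean(x, f, f_inv):
    # Quasi-arithmetic mean: f_inv(average of f(x)).
    return f_inv(np.mean(f(np.asarray(x, dtype=float))))

x = [1.0, 4.0, 16.0]

arithmetic = f_mean(x, lambda v: v,       lambda v: v)        # f(x) = x
geometric  = f_mean(x, np.log,            np.exp)             # f(x) = log x
harmonic   = f_mean(x, lambda v: 1.0 / v, lambda v: 1.0 / v)  # f(x) = 1/x

print(arithmetic)  # 7.0
print(geometric)   # 4.0  (cube root of 1*4*16 = 64)
print(harmonic)    # 16/7 ~= 2.286
```

Each classical mean is the same averaging operation viewed through a different choice of f, which is the "curve fitting in disguise" point.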
@2wlearning
2wl
2 months
Can someone with a more mathematical background pls explain this paper to me: . I started losing the plot at tropical hypersurfaces. What did they mean by this?
Tweet media one
1
0
14
@2wlearning
2wl
2 months
Beware of absolute statements in ML. Even things as innocuous as "batchnorm prevents internal covariate shift" or "resnet fixes vanishing gradients". Zoom in close enough and there's a billion asterisks, and at the end of it, it's questionable if it's even true. Trust nobody.
1
0
13
@2wlearning
2wl
2 months
Tweet media one
1
0
13
@2wlearning
2wl
1 month
Also I'll stick with just PyTorch for now. Tinygrad is cool but not the thing I need at the moment. Cool resources:
0
0
14
@2wlearning
2wl
2 months
Day 27 bonus update (epistemological edition!) I've been working towards developing an intuitive understanding of modern DL as a nearish-term goal. I wrote everything so far into notes and looked for weak areas: vector calc/loss landscapes, information theory, SVD/low rank stuff
Tweet media one
2
0
13
@2wlearning
2wl
1 month
Step 2. daily learning updates (guaranteed follow tbh)
0
0
13
@2wlearning
2wl
1 month
my poor crappy laptop is fighting for its life to train mnist
3
1
11
@2wlearning
2wl
2 months
Forgot to mention yesterday, I really like this resource for visualizing all the matrix-vector multiplies in self-attention. Seeing the transforms visually is so different from just understanding matmul conceptually
0
0
12
@2wlearning
2wl
1 month
Been struggling a lot with negative self-talk that my coding speed is always going to be slow and bottleneck me. Bug after bug kills momentum. I know what I have to do, it's just painful. When practice doesn't pay off, you have to pour blood and sweat into it. Without hesitation.
2
0
12
@2wlearning
2wl
2 months
@nachoyawn many such cases
1
0
11
@2wlearning
2wl
1 month
You can't change the world without getting your hands dirty.
Tweet media one
0
1
10
@2wlearning
2wl
2 months
Cool how you can separate out prediction uncertainty as either being caused by variance in the training data for the problem vs the problem not existing in the training data at all. Paper name: Single-model Uncertainties for Deep Learning
Tweet media one
1
0
12
@2wlearning
2wl
2 months
Day 12 Progress: lvl1 frequentist noob vs lvl99 bayesian boss Tried to do more PCA today with numpy but couldn't focus. Started reading intro to Bayesian statistics. Stupid rat brain has to stop doomscrolling social media and start grinding ML textbooks.
0
0
11
@2wlearning
2wl
1 month
uhhh i think the tinygrad mnist tutorial/clang backend is bugged
1
0
11
@2wlearning
2wl
2 months
so easy to fall into the trap of complaining abt how much the world sucks. the only thing i have to consider is the agency i have over the things i control. it's the only way to change things to be better.
0
0
11
@2wlearning
2wl
2 months
Okay, as promised yesterday, I will post my (afaik novel) theory on the connection between branch specialization, neuron superposition and the vanishing gradients problem. @Final_Industry DMed me to explain it earlier, so I'll copy what I said. This thread might be long.........
1
0
11
@2wlearning
2wl
2 months
it's genuinely concerning that so many people can use LLMs and not process the full implications of what ICL means (and I know because I used to be one of those people)
@Euclaise_
Jade
2 months
This surprises me, I didn't expect this to work
7
19
189
2
0
11
@2wlearning
2wl
2 months
2014 was 10 years ago. 10 years
0
0
10
@2wlearning
2wl
1 month
I was being silly yesterday, actually I don't have to think about it so hard. Dive into theory to find ideas and directions for research, then experiment like crazy until all the ideas are exhausted and I've built some cool stuff, then repeat.
0
0
9