michael.trbo

@michael_trbo

Followers 53 · Following 339 · Media 7 · Statuses 90

ece ai @westernu

Joined July 2020
@satvikgari
Satvik Garimella
2 days
Just built Curserve with @nathanbarrydev @Alexkranias and @PranavTadepalli. It’s a server-side coding agent framework that’s 30x faster than spawning subprocesses and eliminates network latency entirely. We also placed in the Crater sponsor prize! Here’s how we did it.
7
6
13
@michael_trbo
michael.trbo
12 days
the jump from xor to mnist wasn’t huge in concept, just in scale. once you understand how an mlp actually learns, everything else in ml starts to make sense. next up: cnn
0
0
5
@michael_trbo
michael.trbo
12 days
so that's the next step: turning this mlp into a cnn (convolutional neural network). instead of flattening images, a cnn learns spatial features — edges, corners, shapes. it’s the same idea again, just adapted to visual structure. that’s when accuracy jumps
1
0
4
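A minimal sketch of the kind of cnn described above, assuming PyTorch; the layer sizes are illustrative, not the author's actual architecture:

```python
# minimal sketch of a small CNN for 28x28 MNIST digits (illustrative sizes)
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learns local edge/corner filters
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combines edges into shapes
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)       # 10 digit classes

    def forward(self, x):                                 # x: (batch, 1, 28, 28)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# quick shape check
print(SmallCNN()(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```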
@michael_trbo
michael.trbo
12 days
even with all that, my mlp still misclassifies some digits, especially similar ones like 4 and 9. that’s expected though: mlps treat every pixel as independent. a cnn would fix this by learning spatial features like edges and shapes instead of just flat pixel patterns
1
0
4
@michael_trbo
michael.trbo
12 days
the interesting part is how similar mnist feels to xor — the structure and math are identical. the only real difference is scale: more inputs, more weights, more data. once you get the fundamentals, scaling up feels natural
1
0
4
@michael_trbo
michael.trbo
12 days
i also built a small tkinter gui where you can draw digits and watch the model classify them instantly. it’s simple, but seeing your network recognize something you just drew makes all the theory finally click
1
0
4
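A rough sketch of how a draw-and-classify tkinter gui like that could be wired up; the canvas size, widget layout, and the stand-in model are assumptions, not the author's actual code:

```python
# rough sketch of a "draw a digit" tkinter GUI; the stand-in model below should be
# swapped for the real trained MLP
import tkinter as tk
from PIL import Image, ImageDraw
import torch

# stand-in untrained model so the sketch runs end to end (assumption, not the real one)
model = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10)).eval()

CANVAS = 280  # 10x the 28x28 MNIST resolution

root = tk.Tk()
canvas = tk.Canvas(root, width=CANVAS, height=CANVAS, bg="black")
canvas.pack()
label = tk.Label(root, text="draw a digit")
label.pack()

# mirror the strokes into a PIL image so pixel values can be read back
img = Image.new("L", (CANVAS, CANVAS), 0)
draw = ImageDraw.Draw(img)

def paint(event):
    r = 8
    canvas.create_oval(event.x - r, event.y - r, event.x + r, event.y + r,
                       fill="white", outline="white")
    draw.ellipse([event.x - r, event.y - r, event.x + r, event.y + r], fill=255)

def classify():
    small = img.resize((28, 28))                                 # downsample to MNIST size
    x = torch.tensor(list(small.getdata()), dtype=torch.float32) / 255.0
    with torch.no_grad():
        pred = model(x.view(1, -1)).argmax(dim=1).item()
    label.config(text=f"prediction: {pred}")

canvas.bind("<B1-Motion>", paint)
tk.Button(root, text="classify", command=classify).pack()
root.mainloop()
```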
@michael_trbo
michael.trbo
12 days
once the model was trained, i converted it to openvino IR format so it could run on my laptop’s intel npu. this lets it perform real-time inference locally, no gpu or cloud needed. basically the same model, just optimized for hardware acceleration
1
0
4
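A sketch of what that conversion step might look like, assuming the 2023+ `openvino` Python API; the "NPU" device name, file names, and the stand-in model are assumptions rather than the author's actual script:

```python
# sketch: convert a trained PyTorch model to OpenVINO IR and run it locally
import numpy as np
import torch
import openvino as ov

# stand-in MLP so the sketch is self-contained; swap in the real trained model
model = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10)).eval()

# convert to OpenVINO IR and save it (produces .xml + .bin)
ov_model = ov.convert_model(model, example_input=torch.randn(1, 784))
ov.save_model(ov_model, "mnist_mlp.xml")

# compile for the NPU, falling back to CPU if the NPU plugin isn't available
core = ov.Core()
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled = core.compile_model("mnist_mlp.xml", device)

# run inference on one fake 28x28 image flattened to 784 floats
x = np.random.rand(1, 784).astype(np.float32)
logits = compiled(x)[0]                      # first (and only) output
print("predicted digit:", int(np.argmax(logits)))
```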
@michael_trbo
michael.trbo
12 days
i wrote two versions of the training loop:
https://t.co/Hd82lbjvrh → cpu-only, for systems without cuda
train_optimized.py → runs on my rtx 4050 using gpu acceleration
the optimized one trains way faster and also adds dropout + validation tracking to get better results
1
0
4
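A sketch of the idea behind the optimized version: pick cuda when available, add dropout, and track validation accuracy each epoch. Hyperparameters and layer sizes are illustrative, not taken from train_optimized.py:

```python
# sketch of a GPU-aware training loop with dropout and per-epoch validation tracking
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, 10),
).to(device)

tfm = transforms.ToTensor()
train_ds = datasets.MNIST("data", train=True, download=True, transform=tfm)
val_ds = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=256)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    model.train()                                  # enables dropout
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    model.eval()                                   # disables dropout for validation
    correct = 0
    with torch.no_grad():
        for x, y in val_dl:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
    print(f"epoch {epoch}: val acc {correct / len(val_ds):.3f}")
```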
@michael_trbo
michael.trbo
12 days
the xor network forced me to implement all of that by hand in numpy — forward pass, loss calc, backprop, everything. just doing that math once made pytorch much more understandable. this time, i let pytorch handle the autograd and focused on architecture and efficiency
1
0
4
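A tiny illustration of the hand-derived-vs-autograd point, assuming PyTorch; the function and numbers are made up:

```python
# for loss = (w*x - t)^2, the hand-derived gradient dL/dw = 2*(w*x - t)*x
# matches what loss.backward() produces
import torch

x, t = torch.tensor(3.0), torch.tensor(1.0)
w = torch.tensor(0.5, requires_grad=True)

loss = (w * x - t) ** 2
loss.backward()                          # pytorch applies the chain rule for us

manual = 2 * (w.detach() * x - t) * x    # same chain rule, written out by hand
print(w.grad.item(), manual.item())      # both print 3.0
```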
@michael_trbo
michael.trbo
12 days
underneath it all, ml is still just math. each neuron does something close to y = mx + b, and the “learning” is just tweaking those weights using gradients. backpropagation is just the chain rule in action — applied over thousands of these equations in parallel
1
0
4
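A worked example of that idea on a single neuron: y = w*x + b, squared-error loss, and gradients written out via the chain rule (numbers are made up for illustration):

```python
# "learning = tweaking weights with gradients" on one neuron
# dL/dw = 2*(y - t)*x and dL/db = 2*(y - t) come straight from the chain rule
w, b, lr = 0.0, 0.0, 0.05
x, t = 2.0, 5.0                       # want the neuron to map 2 -> 5

for step in range(20):
    y = w * x + b                     # forward pass
    loss = (y - t) ** 2
    dw = 2 * (y - t) * x              # chain rule: dL/dy * dy/dw
    db = 2 * (y - t)                  # chain rule: dL/dy * dy/db
    w -= lr * dw                      # gradient descent step
    b -= lr * db

print(round(w, 3), round(b, 3), round(loss, 6))  # w ≈ 2, b ≈ 1, loss ≈ 0, so 2*w + b ≈ 5
```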
@michael_trbo
michael.trbo
12 days
mnist is a dataset of 28x28 grayscale images of handwritten digits (0–9). each image is flattened into 784 inputs, passed through a few hidden layers, and outputs 10 neurons — one for each digit. the model learns which patterns of pixels correspond to which number
1
0
4
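A quick illustration of those shapes, assuming PyTorch/torchvision; the hidden sizes are illustrative:

```python
# 28x28 image -> 784 flattened inputs -> 10 output scores, one per digit
import torch
from torchvision import datasets, transforms

ds = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
img, label = ds[0]
print(img.shape)              # torch.Size([1, 28, 28]) - one grayscale channel
flat = img.view(-1)
print(flat.shape)             # torch.Size([784]) - the flattened input vector

# untrained stand-in for the "few hidden layers -> 10 neurons" part
mlp = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                          torch.nn.Linear(128, 10))
print(mlp(flat.unsqueeze(0)).shape)   # torch.Size([1, 10]) - one score per digit
```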
@michael_trbo
michael.trbo
12 days
After building a small MLP from scratch that solved the XOR problem, I wanted to see if I could scale that same idea to something more practical — recognizing handwritten digits with the MNIST dataset. Here's how I did it.👇 https://t.co/NtiKfAjzuq
5
5
18
@sakshambatraa
saksham
17 days
finally finished multihead attention as part of the transformer in jax, comp graph was longer this time but had so much fun drawing, onto encoder next!!
3
3
17
@sakshambatraa
saksham
21 days
currently building a transformer from scratch in jax to understand the architecture, and how ML compilers work. finished the file that processes the embeddings, and implemented RoPE. also took a look at the JAXPR and StableHLO IRs and drew a computational graph :)
10
4
48
@michael_trbo
michael.trbo
28 days
I started with random outputs and broken matmuls. Ended with a network that actually learned XOR. Took way longer than expected, but I actually get it now. Moral of the story: build stuff from scratch, the pain pays off 🫡
0
0
4
@michael_trbo
michael.trbo
28 days
And even though XOR is a toy problem, the same setup scales. Add more layers, neurons, and compute. Suddenly you’re solving real problems like image recognition, speech, robotics. The leap from XOR to modern AI is just size + training
1
0
4
@michael_trbo
michael.trbo
28 days
A few things I learned the hard way:
track matrix shapes or you’ll suffer
cross entropy is brutal if you’re confidently wrong
backprop is literally just the chain rule, nothing mystical
building this from scratch makes you appreciate PyTorch/TensorFlow so much more
1
0
4
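A tiny numeric illustration of the "cross entropy is brutal if you're confidently wrong" point (probabilities are made up):

```python
# cross entropy is -log(probability assigned to the true class), so a confident
# miss costs orders of magnitude more than a confident hit
import math

for p_true in (0.99, 0.5, 0.01):
    print(f"p(true class) = {p_true:.2f}  ->  cross entropy = {-math.log(p_true):.2f}")
# p(true class) = 0.99  ->  cross entropy = 0.01
# p(true class) = 0.50  ->  cross entropy = 0.69
# p(true class) = 0.01  ->  cross entropy = 4.61
```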
@michael_trbo
michael.trbo
28 days
Once trained, the network gave the right answers:
[0,0] = 0
[0,1] = 1
[1,0] = 1
[1,1] = 0
Loss dropped close to zero. After all the false starts, seeing it work felt unreal
1
0
4
@michael_trbo
michael.trbo
28 days
Eventually I slowed down and cleaned up the math:
made sure weights had the right shapes (2x2 for hidden, 1x2 for output)
used leaky ReLU for all activations
proper MSE loss
backprop done step by step with the chain rule
That’s when things finally started to click
1
0
5
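A numpy sketch of the setup described above (2x2 hidden weights, 1x2 output weights, leaky ReLU, MSE, hand-written backprop); the hyperparameters are illustrative and a 2-neuron hidden layer can occasionally get stuck, so this is a sketch rather than the author's actual code:

```python
# XOR with a hand-written forward pass and chain-rule backprop in numpy
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)     # inputs as columns, shape (2, 4)
Y = np.array([[0, 1, 1, 0]], dtype=float)     # XOR targets, shape (1, 4)

W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))   # hidden layer: 2x2 weights
W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # output layer: 1x2 weights

leaky = lambda z: np.where(z > 0, z, 0.01 * z)       # leaky ReLU
dleaky = lambda z: np.where(z > 0, 1.0, 0.01)        # its derivative
lr = 0.1

for step in range(5000):
    # forward pass
    z1 = W1 @ X + b1
    a1 = leaky(z1)
    z2 = W2 @ a1 + b2
    a2 = leaky(z2)
    loss = np.mean((a2 - Y) ** 2)                    # MSE

    # backward pass: chain rule, layer by layer
    dz2 = (2 / Y.size) * (a2 - Y) * dleaky(z2)
    dW2, db2 = dz2 @ a1.T, dz2.sum(axis=1, keepdims=True)
    dz1 = (W2.T @ dz2) * dleaky(z1)
    dW1, db1 = dz1 @ X.T, dz1.sum(axis=1, keepdims=True)

    # gradient descent updates
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("loss:", round(loss, 6))
print("outputs:", np.round(a2, 2))   # should end up close to [0, 1, 1, 0]
```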
@michael_trbo
michael.trbo
28 days
Of course, actually coding it wasn’t smooth. I spent days fighting:
shape mismatch errors
misplaced biases
bad activation choices
outputs like [1,1] → 3.27
Most of ML debugging is just making sure your math dimensions line up
1
0
5