michael.trbo

@michael_trbo

Followers 53 · Following 339 · Media 7 · Statuses 90

ece ai @westernu

Joined July 2020
@satvikgari
Satvik Garimella
2 days
Just built Curserve with @nathanbarrydev @Alexkranias and @PranavTadepalli. It’s a server-side coding agent framework that’s 30x faster than spawning subprocesses and eliminates network latency entirely. We also placed in the Crater sponsor prize! Here’s how we did it.
7
6
13
@michael_trbo
michael.trbo
12 days
the jump from xor to mnist wasn’t huge in concept, just in scale. once you understand how an mlp actually learns, everything else in ml starts to make sense. next up: cnn
0
0
5
@michael_trbo
michael.trbo
12 days
so that's the next step: turning this mlp into a cnn (convolutional neural network). instead of flattening images, a cnn learns spatial features — edges, corners, shapes. it’s the same idea again, just adapted to visual structure. that’s when accuracy jumps
1
0
4
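A minimal sketch of the kind of cnn described above, assuming PyTorch; the layer sizes are illustrative, not the author's actual architecture:

```python
# minimal sketch of a small CNN for 28x28 MNIST digits (illustrative sizes)
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # learns local edge/corner filters
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # combines edges into shapes
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 10)       # 10 digit classes

    def forward(self, x):                                 # x: (batch, 1, 28, 28)
        x = self.features(x)
        return self.classifier(x.flatten(1))

# quick shape check
print(SmallCNN()(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```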
@michael_trbo
michael.trbo
12 days
even with all that, my mlp still misclassifies some digits, especially similar ones like 4 and 9. that’s expected though: mlps treat every pixel as independent. a cnn would fix this by learning spatial features like edges and shapes instead of just flat pixel patterns
1
0
4
@michael_trbo
michael.trbo
12 days
the interesting part is how similar mnist feels to xor — the structure and math are identical. the only real difference is scale: more inputs, more weights, more data. once you get the fundamentals, scaling up feels natural
1
0
4
@michael_trbo
michael.trbo
12 days
i also built a small tkinter gui where you can draw digits and watch the model classify them instantly. it’s simple, but seeing your network recognize something you just drew makes all the theory finally click
1
0
4
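A rough sketch of how a draw-and-classify tkinter gui like that could be wired up; the canvas size, widget layout, and the stand-in model are assumptions, not the author's actual code:

```python
# rough sketch of a "draw a digit" tkinter GUI; the stand-in model below should be
# swapped for the real trained MLP
import tkinter as tk
from PIL import Image, ImageDraw
import torch

# stand-in untrained model so the sketch runs end to end (assumption, not the real one)
model = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10)).eval()

CANVAS = 280  # 10x the 28x28 MNIST resolution

root = tk.Tk()
canvas = tk.Canvas(root, width=CANVAS, height=CANVAS, bg="black")
canvas.pack()
label = tk.Label(root, text="draw a digit")
label.pack()

# mirror the strokes into a PIL image so pixel values can be read back
img = Image.new("L", (CANVAS, CANVAS), 0)
draw = ImageDraw.Draw(img)

def paint(event):
    r = 8
    canvas.create_oval(event.x - r, event.y - r, event.x + r, event.y + r,
                       fill="white", outline="white")
    draw.ellipse([event.x - r, event.y - r, event.x + r, event.y + r], fill=255)

def classify():
    small = img.resize((28, 28))                                 # downsample to MNIST size
    x = torch.tensor(list(small.getdata()), dtype=torch.float32) / 255.0
    with torch.no_grad():
        pred = model(x.view(1, -1)).argmax(dim=1).item()
    label.config(text=f"prediction: {pred}")

canvas.bind("<B1-Motion>", paint)
tk.Button(root, text="classify", command=classify).pack()
root.mainloop()
```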
@michael_trbo
michael.trbo
12 days
once the model was trained, i converted it to openvino IR format so it could run on my laptop’s intel npu. this lets it perform real-time inference locally, no gpu or cloud needed. basically the same model, just optimized for hardware acceleration
1
0
4
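A sketch of what that conversion step might look like, assuming the 2023+ `openvino` Python API; the "NPU" device name, file names, and the stand-in model are assumptions rather than the author's actual script:

```python
# sketch: convert a trained PyTorch model to OpenVINO IR and run it locally
import numpy as np
import torch
import openvino as ov

# stand-in MLP so the sketch is self-contained; swap in the real trained model
model = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10)).eval()

# convert to OpenVINO IR and save it (produces .xml + .bin)
ov_model = ov.convert_model(model, example_input=torch.randn(1, 784))
ov.save_model(ov_model, "mnist_mlp.xml")

# compile for the NPU, falling back to CPU if the NPU plugin isn't available
core = ov.Core()
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled = core.compile_model("mnist_mlp.xml", device)

# run inference on one fake 28x28 image flattened to 784 floats
x = np.random.rand(1, 784).astype(np.float32)
logits = compiled(x)[0]                      # first (and only) output
print("predicted digit:", int(np.argmax(logits)))
```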
@michael_trbo
michael.trbo
12 days
i wrote two versions of the training loop:
https://t.co/Hd82lbjvrh → cpu-only, for systems without cuda
train_optimized.py → runs on my rtx 4050 using gpu acceleration
the optimized one trains way faster and also adds dropout + validation tracking to get better results
1
0
4
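A sketch of the idea behind the optimized version: pick cuda when available, add dropout, and track validation accuracy each epoch. Hyperparameters and layer sizes are illustrative, not taken from train_optimized.py:

```python
# sketch of a GPU-aware training loop with dropout and per-epoch validation tracking
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(256, 10),
).to(device)

tfm = transforms.ToTensor()
train_ds = datasets.MNIST("data", train=True, download=True, transform=tfm)
val_ds = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_dl = DataLoader(train_ds, batch_size=128, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=256)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    model.train()                                  # enables dropout
    for x, y in train_dl:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    model.eval()                                   # disables dropout for validation
    correct = 0
    with torch.no_grad():
        for x, y in val_dl:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
    print(f"epoch {epoch}: val acc {correct / len(val_ds):.3f}")
```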
@michael_trbo
michael.trbo
12 days
the xor network forced me to implement all of that by hand in numpy — forward pass, loss calc, backprop, everything. just doing that math once made pytorch much more understandable. this time, i let pytorch handle the autograd and focused on architecture and efficiency
1
0
4
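A tiny illustration of the hand-derived-vs-autograd point, assuming PyTorch; the function and numbers are made up:

```python
# for loss = (w*x - t)^2, the hand-derived gradient dL/dw = 2*(w*x - t)*x
# matches what loss.backward() produces
import torch

x, t = torch.tensor(3.0), torch.tensor(1.0)
w = torch.tensor(0.5, requires_grad=True)

loss = (w * x - t) ** 2
loss.backward()                          # pytorch applies the chain rule for us

manual = 2 * (w.detach() * x - t) * x    # same chain rule, written out by hand
print(w.grad.item(), manual.item())      # both print 3.0
```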
@michael_trbo
michael.trbo
12 days
underneath it all, ml is still just math. each neuron does something close to y = mx + b, and the “learning” is just tweaking those weights using gradients. backpropagation is just the chain rule in action — applied over thousands of these equations in parallel
1
0
4
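A worked example of that idea on a single neuron: y = w*x + b, squared-error loss, and gradients written out via the chain rule (numbers are made up for illustration):

```python
# "learning = tweaking weights with gradients" on one neuron
# dL/dw = 2*(y - t)*x and dL/db = 2*(y - t) come straight from the chain rule
w, b, lr = 0.0, 0.0, 0.05
x, t = 2.0, 5.0                       # want the neuron to map 2 -> 5

for step in range(20):
    y = w * x + b                     # forward pass
    loss = (y - t) ** 2
    dw = 2 * (y - t) * x              # chain rule: dL/dy * dy/dw
    db = 2 * (y - t)                  # chain rule: dL/dy * dy/db
    w -= lr * dw                      # gradient descent step
    b -= lr * db

print(round(w, 3), round(b, 3), round(loss, 6))  # w ≈ 2, b ≈ 1, loss ≈ 0, so 2*w + b ≈ 5
```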
@michael_trbo
michael.trbo
12 days
mnist is a dataset of 28x28 grayscale images of handwritten digits (0–9). each image is flattened into 784 inputs, passed through a few hidden layers, and outputs 10 neurons — one for each digit. the model learns which patterns of pixels correspond to which number
1
0
4
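A quick illustration of those shapes, assuming PyTorch/torchvision; the hidden sizes are illustrative:

```python
# 28x28 image -> 784 flattened inputs -> 10 output scores, one per digit
import torch
from torchvision import datasets, transforms

ds = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
img, label = ds[0]
print(img.shape)              # torch.Size([1, 28, 28]) - one grayscale channel
flat = img.view(-1)
print(flat.shape)             # torch.Size([784]) - the flattened input vector

# untrained stand-in for the "few hidden layers -> 10 neurons" part
mlp = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU(),
                          torch.nn.Linear(128, 10))
print(mlp(flat.unsqueeze(0)).shape)   # torch.Size([1, 10]) - one score per digit
```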
@michael_trbo
michael.trbo
12 days
After building a small MLP from scratch that solved the XOR problem, I wanted to see if I could scale that same idea to something more practical — recognizing handwritten digits with the MNIST dataset. Here's how I did it.👇 https://t.co/NtiKfAjzuq
5
5
18
@sakshambatraa
saksham
17 days
finally finished multihead attention as part of the transformer in jax, comp graph was longer this time but had so much fun drawing, onto encoder next!!
3
3
17
@sakshambatraa
saksham
21 days
currently building a transformer from scratch in jax to understand the architecture, and how ML compilers work. finished the file that processes the embeddings, and implemented RoPE. also took a look at the JAXPR and StableHLO IRs and drew a computational graph :)
10
4
48
@michael_trbo
michael.trbo
28 days
I started with random outputs and broken matmuls. Ended with a network that actually learned XOR. Took way longer than expected, but I actually get it now. Moral of the story: build stuff from scratch, the pain pays off 🫡
0
0
4
@michael_trbo
michael.trbo
28 days
And even though XOR is a toy problem, the same setup scales. Add more layers, neurons, and compute. Suddenly you’re solving real problems like image recognition, speech, robotics. The leap from XOR to modern AI is just size + training
1
0
4
@michael_trbo
michael.trbo
28 days
A few things I learned the hard way:
track matrix shapes or you’ll suffer
cross entropy is brutal if you’re confidently wrong
backprop is literally just the chain rule, nothing mystical
building this from scratch makes you appreciate PyTorch/TensorFlow so much more
1
0
4
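A tiny numeric illustration of the "cross entropy is brutal if you're confidently wrong" point (probabilities are made up):

```python
# cross entropy is -log(probability assigned to the true class), so a confident
# miss costs orders of magnitude more than a confident hit
import math

for p_true in (0.99, 0.5, 0.01):
    print(f"p(true class) = {p_true:.2f}  ->  cross entropy = {-math.log(p_true):.2f}")
# p(true class) = 0.99  ->  cross entropy = 0.01
# p(true class) = 0.50  ->  cross entropy = 0.69
# p(true class) = 0.01  ->  cross entropy = 4.61
```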
@michael_trbo
michael.trbo
28 days
Once trained, the network gave the right answers:
[0,0] = 0
[0,1] = 1
[1,0] = 1
[1,1] = 0
Loss dropped close to zero. After all the false starts, seeing it work felt unreal
1
0
4
@michael_trbo
michael.trbo
28 days
Eventually I slowed down and cleaned up the math:
made sure weights had the right shapes (2x2 for hidden, 1x2 for output)
used leaky ReLU for all activations
proper MSE loss
backprop done step by step with the chain rule
That’s when things finally started to click
1
0
5
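A numpy sketch of the setup described above (2x2 hidden weights, 1x2 output weights, leaky ReLU, MSE, hand-written backprop); the hyperparameters are illustrative and a 2-neuron hidden layer can occasionally get stuck, so this is a sketch rather than the author's actual code:

```python
# XOR with a hand-written forward pass and chain-rule backprop in numpy
import numpy as np

rng = np.random.default_rng(0)

X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]], dtype=float)     # inputs as columns, shape (2, 4)
Y = np.array([[0, 1, 1, 0]], dtype=float)     # XOR targets, shape (1, 4)

W1, b1 = rng.normal(size=(2, 2)), np.zeros((2, 1))   # hidden layer: 2x2 weights
W2, b2 = rng.normal(size=(1, 2)), np.zeros((1, 1))   # output layer: 1x2 weights

leaky = lambda z: np.where(z > 0, z, 0.01 * z)       # leaky ReLU
dleaky = lambda z: np.where(z > 0, 1.0, 0.01)        # its derivative
lr = 0.1

for step in range(5000):
    # forward pass
    z1 = W1 @ X + b1
    a1 = leaky(z1)
    z2 = W2 @ a1 + b2
    a2 = leaky(z2)
    loss = np.mean((a2 - Y) ** 2)                    # MSE

    # backward pass: chain rule, layer by layer
    dz2 = (2 / Y.size) * (a2 - Y) * dleaky(z2)
    dW2, db2 = dz2 @ a1.T, dz2.sum(axis=1, keepdims=True)
    dz1 = (W2.T @ dz2) * dleaky(z1)
    dW1, db1 = dz1 @ X.T, dz1.sum(axis=1, keepdims=True)

    # gradient descent updates
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("loss:", round(loss, 6))
print("outputs:", np.round(a2, 2))   # should end up close to [0, 1, 1, 0]
```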
@michael_trbo
michael.trbo
28 days
Of course, actually coding it wasn’t smooth. I spent days fighting:
shape mismatch errors
misplaced biases
bad activation choices
outputs like [1,1] → 3.27
Most of ML debugging is just making sure your math dimensions line up
1
0
5