mohit
@mohitwt_
Followers: 868
Following: 5K
Media: 468
Statuses: 3K
19, building a custom DL framework from scratch
Joined June 2025
I'm building my own PyTorch from scratch, implementing core features like tensors, autograd, NN layers, optimizers, and more. The core engine will be written in C++/CUDA for performance, with a Pythonic, PyTorch-like API. Starting C++ today.
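The autograd piece comes down to the chain rule applied backwards through a recorded computation graph. Below is a minimal sketch of that idea. It is written in Python for brevity rather than the planned C++/CUDA, the `Value` class and its methods are hypothetical names (not the project's actual API), and it only supports scalar `+` and `*`:

```python
class Value:
    """A scalar that records how it was computed, for reverse-mode autodiff."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # set by the op that produced this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topological order guarantees a node's grad is complete before it propagates.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


x = Value(3.0)
y = Value(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(z.data, x.grad, y.grad)  # 15.0 5.0 3.0
```

Gradients accumulate with `+=` because a value used in several places (like `x` here) receives one contribution per use, which is the chain rule summed over paths. A tensor engine follows the same pattern, with each op's backward function working on whole arrays instead of scalars.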
> completed chapter 4: very good read on how GPUs actually schedule and execute work, highlighting the architectural pieces that determine kernel performance. it breaks down SM structure, warp schedulers, register and shared memory constraints, and more. also covers more critical
> completed chapter 3: mapping threads to real 2D/3D data like images, grids, and matrices. starts with indexing, then moves into blur kernels, where each pixel averages its neighbors to create that smooth blur effect, and matrix multiplication. reading this + working with pytorch next
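Chapter 3's indexing scheme can be emulated on the CPU to show how each thread maps to one output pixel. This is a minimal Python sketch that assumes a grayscale image stored as a flat row-major list and a hypothetical `BLUR_SIZE` radius. The nested loops stand in for the grid of CUDA threads, using the usual `row = blockIdx.y * blockDim.y + threadIdx.y` pattern:

```python
BLUR_SIZE = 1  # hypothetical radius: each output pixel averages a 3x3 patch


def blur_pixel(image, width, height, row, col):
    """What one GPU thread computes: the mean of the in-bounds neighbors of (row, col)."""
    total, count = 0, 0
    for dr in range(-BLUR_SIZE, BLUR_SIZE + 1):
        for dc in range(-BLUR_SIZE, BLUR_SIZE + 1):
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:  # skip neighbors outside the image
                total += image[r * width + c]       # row-major flat indexing
                count += 1
    return total / count


def blur(image, width, height, block_dim=(16, 16)):
    """Emulate a 2D launch: a grid of blocks, each a grid of threads, one thread per pixel."""
    bx_dim, by_dim = block_dim
    grid_x = (width + bx_dim - 1) // bx_dim   # ceil(width / blockDim.x)
    grid_y = (height + by_dim - 1) // by_dim  # ceil(height / blockDim.y)
    out = [0.0] * (width * height)
    for block_y in range(grid_y):
        for block_x in range(grid_x):
            for thread_y in range(by_dim):
                for thread_x in range(bx_dim):
                    row = block_y * by_dim + thread_y
                    col = block_x * bx_dim + thread_x
                    # Threads past the image edge do nothing, like the bounds check in a kernel.
                    if row < height and col < width:
                        out[row * width + col] = blur_pixel(image, width, height, row, col)
    return out


image = [0, 0, 0,
         0, 9, 0,
         0, 0, 0]
print(blur(image, 3, 3))  # [2.25, 1.5, 2.25, 1.5, 1.0, 1.5, 2.25, 1.5, 2.25]
```

Edge and corner pixels divide by fewer neighbors (6 and 4) because out-of-bounds positions are skipped, not treated as zeros.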
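The register and thread constraints from chapter 4 come down to simple arithmetic. This is a rough sketch with hypothetical per-SM limits (65,536 registers, 2,048 resident threads, 32 resident blocks), and those limits vary by GPU. It ignores shared memory and register allocation granularity, which real occupancy calculators also account for:

```python
WARP_SIZE = 32

# Hypothetical per-SM limits; check your GPU's actual values (they differ across architectures).
REGS_PER_SM = 65536
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32


def blocks_per_sm(threads_per_block, regs_per_thread):
    """Resident blocks per SM: the tightest of the thread, block, and register limits."""
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    by_registers = REGS_PER_SM // (regs_per_thread * threads_per_block)
    return min(by_threads, MAX_BLOCKS_PER_SM, by_registers)


def occupancy(threads_per_block, regs_per_thread):
    """Fraction of the SM's thread slots that are filled (simplified: ignores shared memory)."""
    resident_threads = blocks_per_sm(threads_per_block, regs_per_thread) * threads_per_block
    return resident_threads / MAX_THREADS_PER_SM


for regs in (32, 64, 128):
    warps = 256 // WARP_SIZE
    print(f"256 threads/block ({warps} warps), {regs} regs/thread -> occupancy {occupancy(256, regs):.2f}")
```

Under these assumed limits, doubling registers per thread halves the resident blocks (1.00, then 0.50, then 0.25). That leaves the warp schedulers fewer warps to switch to when one stalls on memory.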
> started reading Programming Massively Parallel Processors: really solid read so far, understanding how GPUs actually work under the hood instead of just using them blindly. 2 chapters done
from chapter 2: a quick look at how color images get converted to grayscale, what the input pixels look like, what the output becomes, and why it's all perfectly parallel:
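The per-pixel independence shows clearly in code: every output value depends only on the one input pixel at the same index, so each could be handled by its own GPU thread. This is a minimal Python sketch that assumes RGB pixels as 0-255 tuples and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The book may use slightly different constants:

```python
def to_grayscale(pixels):
    """Map each (r, g, b) pixel to one luminance value; no pixel reads any other pixel."""
    gray = []
    for r, g, b in pixels:  # on a GPU, each iteration would be one thread
        gray.append(round(0.299 * r + 0.587 * g + 0.114 * b))
    return gray


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
print(to_grayscale(pixels))  # [76, 150, 29, 255]
```

Because no iteration reads another iteration's input or writes another's output, the loop order doesn't matter. That is exactly what makes it safe to run all pixels at once.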
continue:
> revised topics from calculus:
- partial derivatives
- gradient vectors
- chain rule
- autodiff and more backprop math
> probability topics:
- probability distributions
- sampling
- MC, MLE
- Bayes' theorem and more
> GPU topics next: fundamentals and optimization
dev log:
> revised a bunch of math topics:
- eigenvalues and eigenvectors
- SVD and low-rank approximation
- L1/L2 norms
- einsum, contraction ops
> more topics from calculus/prob next
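Two of the probability topics fit in a few lines each. This small Python sketch reads "MC" as Monte Carlo. It covers a seeded Monte Carlo estimate of π (sample points in the unit square and count the fraction inside the quarter circle), plus the closed-form maximum-likelihood estimates for a Gaussian, which are the sample mean and the variance divided by n (not n - 1):

```python
import math
import random


def monte_carlo_pi(n_samples, seed=0):
    """Estimate pi: the quarter circle covers pi/4 of the unit square."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    inside = sum(1 for _ in range(n_samples) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n_samples


def gaussian_mle(samples):
    """MLE for a normal distribution: maximizing the log-likelihood gives these closed forms."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n  # divides by n, so it is biased low
    return mu, var


print(monte_carlo_pi(100_000), math.pi)
print(gaussian_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # (5.0, 4.0)
```

The Monte Carlo error shrinks roughly like 1/sqrt(n), so each extra digit of accuracy costs about 100x more samples.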
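Several of these linear algebra topics can be checked by hand in plain Python. This sketch uses a hypothetical 2x2 matrix and shows three things: power iteration for the dominant eigenvalue and eigenvector, L1 and L2 norms, and matrix multiplication written as an explicit contraction over the shared index (what `einsum("ij,jk->ik", A, B)` expresses in NumPy):

```python
import math


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def l1_norm(v):
    return sum(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def power_iteration(A, steps=100):
    """Repeatedly apply A and renormalize; converges to the dominant eigenvector
    when one eigenvalue is strictly largest in magnitude."""
    v = [1.0] + [0.0] * (len(A) - 1)
    for _ in range(steps):
        w = matvec(A, v)
        norm = l2_norm(w)
        v = [x / norm for x in w]
    # Rayleigh quotient v.Av (v has unit length) estimates the eigenvalue
    eigenvalue = sum(x * y for x, y in zip(v, matvec(A, v)))
    return eigenvalue, v


def contract_ij_jk(A, B):
    """C[i][k] = sum over j of A[i][j] * B[j][k], i.e. einsum 'ij,jk->ik'."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


A = [[2.0, 1.0],
     [1.0, 2.0]]  # eigenvalues 3 and 1; eigenvector for 3 is [1, 1] / sqrt(2)
print(power_iteration(A))
print(l1_norm([3.0, -4.0]), l2_norm([3.0, -4.0]))  # 7.0 5.0
print(contract_ij_jk(A, [[1.0, 0.0], [0.0, 1.0]]))  # identity leaves A unchanged
```

For a symmetric positive-definite matrix like this one, the singular values equal the eigenvalues, so the same loop also yields the top singular value used in a rank-1 approximation.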
dev log:
> more Taichi internals: loop config, SIMT, static, root pointers, and how kernels actually get parallelized
> worked with Numba fundamentals: njit, typed lists, contiguous arrays, warm-up/cached runs, static signatures, and more
> working with a bit of maths next
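The warm-up vs cached-run difference can be measured directly. This sketch assumes Numba is installed: the first call to an `@njit` function includes JIT compilation, and later calls reuse the compiled machine code. If Numba is missing, the hypothetical fallback decorator below just returns the plain Python function, so the script still runs (without the speedup):

```python
import time

try:
    from numba import njit
except ImportError:  # hypothetical fallback so the sketch runs without Numba installed
    def njit(fn):
        return fn


@njit
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


for label in ("first call (includes compile)", "second call (cached)"):
    start = time.perf_counter()
    result = sum_of_squares(1_000_000)
    print(f"{label}: {time.perf_counter() - start:.4f}s, result={result}")
```

Under Numba, `total` is a 64-bit integer, so a much larger `n` would overflow where Python ints would not. Passing an explicit signature such as `@njit("int64(int64)")` compiles eagerly at decoration time instead of on the first call.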
dev log:
> profiled Python programs, jumped into Taichi
> how kernels compile into their own LLVM/CUDA programs, ti.field, SNodes, types, init, and a lot more
> ran a 100-million-iteration loop w/ Taichi to see the speed difference
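Profiling before reaching for Taichi or Numba shows where the time actually goes. This minimal sketch uses the standard library's `cProfile` and `pstats`. Here `slow_loop` is a hypothetical hot spot, and 1,000,000 iterations stand in for the 100 million run, which would take much longer in pure Python:

```python
import cProfile
import io
import pstats


def slow_loop(n):
    """Hypothetical hot spot: a plain Python loop, the kind Taichi/Numba kernels replace."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_loop(1_000_000)
profiler.disable()

# Report the 5 entries with the highest cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
print(result)
print(buffer.getvalue())
```

Sorting by cumulative time surfaces the functions whose call trees dominate the run. Those are the candidates worth moving into a compiled kernel.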
exams are finally over. back to posting daily. need to work a lot on GenAI, Python performance, GPUs, papers, models, inference, and a lot more. starting devlogs and multiple updates from tomorrow.
Yo. Just shipped TradeForge, a technical analysis platform I built with FastAPI. It pulls real-time stock data, calculates moving averages (from 5-day all the way to 200-day), and gives you interactive charts. Check it out at: https://t.co/sHDkXTCNRl More info below
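The moving-average part is simple to sketch. This is a minimal Python version of a simple moving average (SMA) over closing prices. It assumes the platform uses plain SMAs (the post doesn't say whether they are simple or exponential), and `closes` is made-up data, not real quotes. A running sum means each new window costs O(1) instead of re-summing:

```python
def simple_moving_average(closes, window):
    """Return one average per full window (len(closes) - window + 1 values)."""
    if window <= 0 or window > len(closes):
        return []
    running = sum(closes[:window])
    averages = [running / window]
    for i in range(window, len(closes)):
        # Slide the window: add the newest price, drop the oldest one.
        running += closes[i] - closes[i - window]
        averages.append(running / window)
    return averages


closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # hypothetical prices
print(simple_moving_average(closes, 5))  # [12.0, 13.0]
```

The same function covers both the 5-day and 200-day cases. The 200-day series simply starts 199 data points later.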
built a lightweight vector database + retrieval engine from scratch. it handles embedding, similarity search, and relevance filtering without any external tools. also built a custom indexer from scratch: vector clustering + search, no FAISS, pure Python. https://t.co/8RL2IxzE0F
Mog9/VecEngine (github.com): A lightweight vector database, retrieval engine, and custom indexer, all built completely from scratch.
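The same building blocks can be sketched in a few dozen lines. This is not the VecEngine code. It is a hypothetical illustration of the ideas named in the post: cosine similarity for search, a minimum-score cutoff for relevance filtering, and a clustering-based index (a tiny k-means in the style of an inverted-file index) that only scans the cluster nearest to the query. Embeddings are assumed to be precomputed lists of floats:

```python
import math
import random


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_search(vectors, query, k=3, min_score=0.0):
    """Score every vector, drop weak matches, return the top-k (index, score) pairs."""
    scored = [(i, cosine(v, query)) for i, v in enumerate(vectors)]
    relevant = [pair for pair in scored if pair[1] >= min_score]  # relevance filter
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)[:k]


class ClusterIndex:
    """Hypothetical IVF-style index: k-means groups the vectors, and a query
    only scans the members of its nearest cluster (approximate search)."""

    def __init__(self, vectors, n_clusters=2, iters=10, seed=0):
        self.vectors = vectors
        rng = random.Random(seed)
        self.centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
        for _ in range(iters):
            self._assign()
            for c, ids in enumerate(self.members):
                if ids:  # move each centroid to the mean of its members
                    self.centroids[c] = [sum(col) / len(ids)
                                         for col in zip(*(vectors[i] for i in ids))]
        self._assign()  # final assignment against the final centroids

    def _nearest(self, v):
        return max(range(len(self.centroids)), key=lambda c: cosine(v, self.centroids[c]))

    def _assign(self):
        self.members = [[] for _ in self.centroids]
        for i, v in enumerate(self.vectors):
            self.members[self._nearest(v)].append(i)

    def search(self, query, k=3, min_score=0.0):
        ids = self.members[self._nearest(query)]
        hits = brute_force_search([self.vectors[i] for i in ids], query, k, min_score)
        return [(ids[j], score) for j, score in hits]  # map back to global indices


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
query = [1.0, 0.05]
print(brute_force_search(vectors, query, k=2))
print(ClusterIndex(vectors).search(query, k=2))
```

The cluster index trades some recall for speed. If the true nearest neighbor sits in a different cluster, it is missed, which is why IVF-style indexes usually probe several clusters per query.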
lol
@bit_sparkle might be a little biased since I've used both, but React Native a lot more. Flutter is actually amazing to pick up, has a great widget system, and genuinely feels great to build with. If I had to choose one, I'd say React Native just offers way more in real-world product work.
internal exams start tomorrow, with practicals and externals soon, so I'll be less active for a while
GPU Bottlenecks Explained: every GPU has one mission, to process massive amounts of data as fast as possible. but no matter how powerful the hardware, something always ends up limiting that speed. that limiting factor is the bottleneck, the point where performance gets stuck.
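One common way to identify the bottleneck is the roofline model. A kernel's attainable throughput is capped either by peak compute or by memory bandwidth times its arithmetic intensity (FLOPs per byte moved). This is a back-of-the-envelope sketch with hypothetical hardware numbers (10 TFLOP/s peak, 500 GB/s bandwidth), not any specific GPU:

```python
PEAK_FLOPS = 10e12       # hypothetical peak compute, FLOP/s
PEAK_BANDWIDTH = 500e9   # hypothetical memory bandwidth, bytes/s


def attainable_flops(flops, bytes_moved):
    """Roofline: performance is capped by compute or by how fast data arrives."""
    intensity = flops / bytes_moved  # FLOPs per byte
    return min(PEAK_FLOPS, PEAK_BANDWIDTH * intensity)


def bottleneck(flops, bytes_moved):
    ridge = PEAK_FLOPS / PEAK_BANDWIDTH  # intensity where the two limits meet (20 FLOP/byte here)
    return "compute-bound" if flops / bytes_moved >= ridge else "memory-bound"


# Vector add c = a + b on n float32s: 1 FLOP per element, 12 bytes moved (2 reads + 1 write).
n = 1_000_000
print(bottleneck(n, 12 * n), attainable_flops(n, 12 * n) / 1e9, "GFLOP/s")

# Best-case N x N float32 matmul, assuming each matrix is touched only once: 2N^3 FLOPs, 12N^2 bytes.
N = 4096
print(bottleneck(2 * N**3, 12 * N**2), attainable_flops(2 * N**3, 12 * N**2) / 1e12, "TFLOP/s")
```

The vector add lands far below the ridge, so no amount of extra compute helps it. The matmul only reaches the compute roof if data reuse approaches the assumed best case, which real kernels pursue through tiling in shared memory.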
tried @getalchemyst and it's pretty good at context and retrieval. it remembers what I say in chat plus whatever I upload. had to try it since I'm working on stuff related to this. it just picks up the conversation using all that context, good shit.