Jino Rohit

@jino_rohit

Followers
3K
Following
5K
Media
357
Statuses
2K

23 | ml and math |

Joined June 2023
@jino_rohit
Jino Rohit
28 days
I wanted to re-introduce myself here on this platform - im a 23 year old MLE working at a startup. ive done a bit of everything - 1. kaggle competitions - i have 2 bronze medals and my peak global rank was 1,499. 2. ive had 5+ hackathon wins, at both national and global events. 3.
85
45
1K
@jino_rohit
Jino Rohit
9 hours
unity docs have surprisingly well-detailed documentation on ARM NEON SIMD.
0
0
16
@jino_rohit
Jino Rohit
13 hours
for today - 1. ill be working on a general Tensor class in C++ that can handle most of the common LLM weight dtypes. 2. wanna take a closer look at templates in C++. 3. also need to work on a general model bind class to load different LLM architectures without hardcoding layer
3
0
62
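A general dtype-aware Tensor class like the one described above could start out as something like this minimal sketch - a tagged dtype over raw byte storage, with typed accessors. This is a hypothetical illustration (the names `Tensor`, `DType` etc. are assumptions, not from the actual engine), covering just fp32 and int8:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: a tensor that owns raw bytes and reinterprets them
// on access according to a dtype tag. A real engine would also track
// strides and memory-mapped offsets.
enum class DType { F32, I8 };

struct Tensor {
    DType dtype;
    std::vector<size_t> shape;
    std::vector<uint8_t> data;  // raw storage, sized by dtype

    static size_t dtype_size(DType d) { return d == DType::F32 ? 4 : 1; }

    Tensor(DType d, std::vector<size_t> s) : dtype(d), shape(std::move(s)) {
        size_t n = 1;
        for (size_t dim : shape) n *= dim;
        data.resize(n * dtype_size(dtype));
    }

    size_t numel() const { return data.size() / dtype_size(dtype); }

    // Typed views; asserting the tag catches dtype mix-ups early.
    float* f32() { assert(dtype == DType::F32); return reinterpret_cast<float*>(data.data()); }
    int8_t* i8() { assert(dtype == DType::I8); return reinterpret_cast<int8_t*>(data.data()); }
};
```

Templates (item 2 in the list) are the natural next step here: a `template <typename T>` accessor could replace the hand-written `f32()`/`i8()` pair.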
@jino_rohit
Jino Rohit
1 day
for today - 1. need to look at other cpp based frameworks to draw inspiration for a new inference engine. 2. got 2 PRs merged on camel-ai. 3. wrote a 50 line mnist neural net with handwritten backprop in deep-ml platform.
3
0
42
@jino_rohit
Jino Rohit
2 days
yaayy @real_deep_ml, i solved this in 50 LOC
@real_deep_ml
Deep-ML
6 days
The people have spoken, solve this question and get a chance to win a free Deep-ML T-shirt (or sneakers if you want) details in 🧵
1
1
42
@jino_rohit
Jino Rohit
3 days
I implemented absmax quantization for my C++ inference engine and it now generates 200 tokens/sec on CPU! I spent the last few days implementing symmetric quantization for my gpt architecture to perform operations in int8 and then dequantize to fp32. the perplexity is quite
5
4
61
@jino_rohit
Jino Rohit
3 days
learngitbranching - i often come back to this site to refresh concepts on git branching, and theres also git-scm.
4
0
28
@jino_rohit
Jino Rohit
4 days
todays worklog - 1. raised 2 PRs on camel-ai, reviewed my previous PR for changes. 2. continued work on quant layers for the inference engine. 3. i need to start looking at reference implementations to make the inference engine more modular and clean.
1
0
36
@jino_rohit
Jino Rohit
4 days
i have this repo - advanced ml that hit 100 stars. but more or less its a personal collection of most of the things i pick up - loss functions, optimizers, regularizers, llm architectures, cuda, ml systems, python design systems etc. feel free to make use of it.
5
1
55
@jino_rohit
Jino Rohit
5 days
for today - 1. going to complete quant ops for the inference engine. 2. going to write layers to handle int8_t inputs and activations. 3. start dedicating a slice of time to understanding memory hierarchy and OS. 4. work on 1 PR to camel-ai.
1
1
54
@jino_rohit
Jino Rohit
5 days
the more i build stuff, the more i realise i need to fill in my computer architecture and OS knowledge gaps
7
1
46
@jino_rohit
Jino Rohit
6 days
working on the quantization layer for the inference engine - 1. wrote a TensorQuant class to handle int8_t types and the scaling factor for absmax. 2. quantized the fp32 model to int8 weights and memory mapped the offsets. 3. working on the layers to handle int8_t inputs,
2
1
69
@jino_rohit
Jino Rohit
6 days
i built this super fun AI PR reviewer using @Gradient_HQ parallax hosted Qwen3 0.6B that is blazing fast on my macbook! this is a walkthrough video where i go through a little bit on parallax setup and usage of the PR reviewer. currently it can - 1. given a PR link on a public
3
2
33
@jino_rohit
Jino Rohit
7 days
working on quantization for my C++ inference engine. 1. implemented absmax quantization and converted weights from fp32 to int8. 2. now ill need to extend my tensor class to create, modify and view int8 tensors. 3. also implemented perplexity scores for the model prompt. gpt2
3
8
131
@jino_rohit
Jino Rohit
7 days
learn flash attention from umar jamil, rewatch it as many times as it takes and write it down
4
0
56
@jino_rohit
Jino Rohit
8 days
for today - 1. mostly few bug fixes on the inference engine. 2. learning flash attention. ive been thinking if flash attention "like" optimization would work for CPUs? i mean if i find an optimal size for the blocked matrix multiplication, will the operations happen on the
5
0
28
@jino_rohit
Jino Rohit
9 days
Im building my own C++ inference engine for LLMs that runs on CPU. it currently supports - byte pair encoding for tokenization. - gpt2 architecture implemented with strided memory. - kv cache for speedup. - greedy sampling and temperature based sampling for tokens. - NEON
14
36
353
@jino_rohit
Jino Rohit
10 days
for today - 1. added NEON intrinsics for the layernorm layer for the inference engine. 2. figuring out conditional compilation depending on the hardware for the inference engine. 3. i need to rewrite my causal attention in a better way. 4. implemented and pushed a PR on camel-ai.
2
0
40
@jino_rohit
Jino Rohit
11 days
I replaced my dot product operations with ARM NEON SIMD intrinsics and now my C++ inference engine can generate 60 tokens/sec! this is ~3x speedup compared to my previous dot product loops. im looking at other layers to apply it to, also looking to implement flash attention when i
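A NEON dot product of the kind described above processes four floats per instruction via fused multiply-accumulate, with a scalar loop for the tail, and a portable fallback guarded by conditional compilation so the same code builds on non-ARM hardware. A sketch under those assumptions (not the engine's actual implementation):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

float dot(const float* a, const float* b, size_t n) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    // SIMD path: 4-wide fused multiply-accumulate into a vector register.
    float32x4_t acc = vdupq_n_f32(0.f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    float sum = vaddvq_f32(acc);            // horizontal add of the 4 lanes
    for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail for n % 4 leftovers
    return sum;
#else
    // Portable scalar fallback for non-NEON targets.
    float sum = 0.f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
#endif
}
```

The `__aarch64__` guard matters because `vaddvq_f32` (the horizontal add) only exists on 64-bit ARM; 32-bit NEON would need a pairwise-add sequence instead.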
@jino_rohit
Jino Rohit
14 days
I built my first C++ inference engine for LLM - InferGPT. This is a video walkthrough of the engine (with audio) and currently it has - - BPE encoder and decoder. - gpt2 architecture implemented from scratch. - greedy sampling and temperature based sampling for tokens. -
2
2
93