Jino Rohit

@jino_rohit

Followers
3K
Following
5K
Media
357
Statuses
2K

23 | ml and math |

Joined June 2023
@jino_rohit
Jino Rohit
28 days
I wanted to re-introduce myself here on this platform - im a 23 year old MLE working at a startup. ive done a bit of everything - 1. kaggle competitions - i have 2 bronze medals and my peak global rank was 1,499. 2. ive had 5+ hackathon wins, at both national and global events. 3.
85
45
1K
@jino_rohit
Jino Rohit
9 hours
unity docs have surprisingly well-detailed documentation on ARM NEON SIMD.
0
0
16
@jino_rohit
Jino Rohit
13 hours
for today - 1. ill be working on a general Tensor class in C++ that can handle most of the common LLM weight dtypes. 2. wanna take a closer look at templates in C++. 3. also need to work on a general model bind class to load different LLM architectures without hardcoding layer
3
0
62
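A general dtype-aware Tensor class like the one described above could start out as something like this minimal sketch - a tagged dtype over raw byte storage, with typed accessors. This is a hypothetical illustration (the names `Tensor`, `DType` etc. are assumptions, not from the actual engine), covering just fp32 and int8:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: a tensor that owns raw bytes and reinterprets them
// on access according to a dtype tag. A real engine would also track
// strides and memory-mapped offsets.
enum class DType { F32, I8 };

struct Tensor {
    DType dtype;
    std::vector<size_t> shape;
    std::vector<uint8_t> data;  // raw storage, sized by dtype

    static size_t dtype_size(DType d) { return d == DType::F32 ? 4 : 1; }

    Tensor(DType d, std::vector<size_t> s) : dtype(d), shape(std::move(s)) {
        size_t n = 1;
        for (size_t dim : shape) n *= dim;
        data.resize(n * dtype_size(dtype));
    }

    size_t numel() const { return data.size() / dtype_size(dtype); }

    // Typed views; asserting the tag catches dtype mix-ups early.
    float* f32() { assert(dtype == DType::F32); return reinterpret_cast<float*>(data.data()); }
    int8_t* i8() { assert(dtype == DType::I8); return reinterpret_cast<int8_t*>(data.data()); }
};
```

Templates (item 2 in the list) are the natural next step here: a `template <typename T>` accessor could replace the hand-written `f32()`/`i8()` pair.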
@jino_rohit
Jino Rohit
1 day
for today - 1. need to look at other cpp based frameworks to draw inspiration for a new inference engine. 2. got 2 PRs merged on camel-ai. 3. wrote a 50 line mnist neural net with handwritten backprop in deep-ml platform.
3
0
42
@jino_rohit
Jino Rohit
2 days
yaayy @real_deep_ml, i solved this in 50 LOC
@real_deep_ml
Deep-ML
6 days
The people have spoken, solve this question and get a chance to win a free Deep-ML T-shirt (or sneakers if you want) details in 🧵
1
1
42
@jino_rohit
Jino Rohit
3 days
I implemented absmax quantization for my C++ inference engine and it now generates 200 tokens/sec on CPU! I spent the last few days implementing symmetric quantization for my gpt architecture to perform operations in int8 and then dequantize to fp32. the perplexity is quite
5
4
61
@jino_rohit
Jino Rohit
3 days
learngitbranching - i often come back to this site to refresh concepts on git branching, and theres also git-scm.
4
0
28
@jino_rohit
Jino Rohit
4 days
todays worklog - 1. raised 2 PRs on camel-ai, reviewed my previous PR for changes. 2. continued work on quant layers for the inference engine. 3. i need to start looking at reference implementations to make the inference engine more modular and clean.
1
0
36
@jino_rohit
Jino Rohit
4 days
i have this repo - advanced ml that hit 100 stars. but more or less its a personal collection of most of the things i pick up - loss functions, optimizers, regularizers, llm architectures, cuda, ml systems, python design systems etc. feel free to make use of it.
5
1
55
@jino_rohit
Jino Rohit
5 days
for today - 1. going to complete quant ops for the inference engine. 2. going to write layers to handle int8_t inputs and activations. 3. start dedicating a slice of time to understanding memory hierarchy and OS. 4. work on 1 PR to camel-ai.
1
1
54
@jino_rohit
Jino Rohit
5 days
the more i build stuff, the more i realise i need to fill in my computer architecture and OS knowledge gaps
7
1
46
@jino_rohit
Jino Rohit
6 days
working on the quantization layer for the inference engine - 1. wrote a TensorQuant class to handle int8_t types and the scaling factor for absmax. 2. quantized the fp32 model to int8 weights and memory mapped the offsets. 3. working on the layers to handle int8_t inputs,
2
1
69
@jino_rohit
Jino Rohit
6 days
i built this super fun AI PR reviewer using @Gradient_HQ parallax hosted Qwen3 0.6B that is blazing fast on my macbook! this is a walkthrough video where i go through a little bit on parallax setup and usage of the PR reviewer. currently it can - 1. given a PR link on a public
3
2
33
@jino_rohit
Jino Rohit
7 days
working on quantization for my C++ inference engine. 1. implemented absmax quantization and converted weights from fp32 to int8. 2. now ill need to extend my tensor class to create, modify and view int8 tensors. 3. also implemented perplexity scores for the model prompt. gpt2
3
8
131
@jino_rohit
Jino Rohit
7 days
learn flash attention from umar jamil, rewatch it as many times as it takes and write it down
4
0
56
@jino_rohit
Jino Rohit
8 days
for today - 1. mostly few bug fixes on the inference engine. 2. learning flash attention. ive been thinking if flash attention "like" optimization would work for CPUs? i mean if i find an optimal size for the blocked matrix multiplication, will the operations happen on the
5
0
28
@jino_rohit
Jino Rohit
9 days
Im building my own C++ inference engine for LLMs that runs on CPU. it currently supports - byte pair encoding for tokenization. - gpt2 architecture implemented with strided memory. - kv cache for speedup. - greedy sampling and temperature based sampling for tokens. - NEON
14
36
353
@jino_rohit
Jino Rohit
10 days
for today - 1. added NEON intrinsics for the layernorm layer for the inference engine. 2. figuring out conditional compilation depending on the hardware for the inference engine. 3. i need to rewrite my causal attention in a better way. 4. implemented and pushed a PR on camel-ai.
2
0
40
@jino_rohit
Jino Rohit
11 days
I replaced my dot product operations with ARM NEON SIMD intrinsics and now my C++ inference engine can generate 60 tokens/sec! this is ~3x speedup compared to my previous dot product loops. im looking at other layers to apply it to, also looking to implement flash attention when i
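A NEON dot product of the kind described above processes four floats per instruction via fused multiply-accumulate, with a scalar loop for the tail, and a portable fallback guarded by conditional compilation so the same code builds on non-ARM hardware. A sketch under those assumptions (not the engine's actual implementation):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

float dot(const float* a, const float* b, size_t n) {
#if defined(__ARM_NEON) && defined(__aarch64__)
    // SIMD path: 4-wide fused multiply-accumulate into a vector register.
    float32x4_t acc = vdupq_n_f32(0.f);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    float sum = vaddvq_f32(acc);            // horizontal add of the 4 lanes
    for (; i < n; ++i) sum += a[i] * b[i];  // scalar tail for n % 4 leftovers
    return sum;
#else
    // Portable scalar fallback for non-NEON targets.
    float sum = 0.f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
#endif
}
```

The `__aarch64__` guard matters because `vaddvq_f32` (the horizontal add) only exists on 64-bit ARM; 32-bit NEON would need a pairwise-add sequence instead.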
@jino_rohit
Jino Rohit
14 days
I built my first C++ inference engine for LLM - InferGPT. This is a video walkthrough of the engine (with audio) and currently it has - - BPE encoder and decoder. - gpt2 architecture implemented from scratch. - greedy sampling and temperature based sampling for tokens. -
2
2
93