Jino Rohit @jino_rohit tweet - I'm building my own OpenCV from scratch - fastcv. fastcv is a C++ CUDA rewrite with PyTorch bindings of the image filters in the OpenCV library. I have already written two optimized kernels and will keep studying and implementing more. I have also added current benchmarks. https://t.co/UpRfCQTs5w

Jino Rohit

@jino_rohit

7 days

I'm building my own OpenCV from scratch - fastcv. fastcv is a C++ CUDA rewrite with PyTorch bindings of the image filters in the OpenCV library. I have already written two optimized kernels and will keep studying and implementing more. I have also added current benchmarks.

991

Replies

Hollow Byte

@stacktrackguy

7 days

@jino_rohit Curious if you’re planning to explore fusing ops (e.g. blur + grayscale in one kernel). It could push performance even further by minimizing memory transfers.

Jino Rohit

@jino_rohit

6 days

@stacktrackguy yeap!

Dhruv Malik

@lesDecroissant

6 days

@jino_rohit Kudos indeed. You're going to be on a great learning curve. Just curious whether you'll dig deeper into NVIDIA PTX/SASS-level optimisation and other frameworks to optimise ML kernel design?

Jino Rohit

@jino_rohit

6 days

@lesDecroissant this was mainly a playground to test what ive learned so far, i want to move into llm inference optimization engines after finishing pmpp book!

Victor Asuquo | AI Engineer

@Victor_ML_AI

7 days

@jino_rohit The kind of projects I love to see

Jino Rohit

@jino_rohit

7 days

@Victor_ML_AI 🫶

💫ℹ️🐚▪️🌐💺🗨️🚨®️

@1__________l1l_

7 days

@jino_rohit This attempted project is great.

Jino Rohit

@jino_rohit

6 days

@1__________l1l_ thanks!

ƬⲘ ⚔️

@tm23twt

7 days

@jino_rohit demn, crazy stuff bro :)

Jino Rohit

@jino_rohit

7 days

@tm23twt thanks my man🫶

Aarno

@TheGlobalMinima

5 days

@jino_rohit Much much needed 🙏🏻 opencv has become a slop. About time. More power to you !!

Jino Rohit

@jino_rohit

5 days

@TheGlobalMinima haha no way, opencv is awesome! this is more like a way to consolidate my learning

a.desi.penguin

@_notapenguin

7 days

@jino_rohit cool idea, mind if I try doing something similar?

Jino Rohit

@jino_rohit

6 days

@_notapenguin of course, go for it!

sankit

@sankitdev

4 days

@jino_rohit Noice.. your project has piqued my interest. Waiting for more updates

Jino Rohit

@jino_rohit

4 days

@sankitdev thanks man!

Debopam Chowdhury

@DebopamChowdhu1

5 days

@jino_rohit nice

Jino Rohit

@jino_rohit

5 days

@DebopamChowdhu1 thanks!

Siddharth

@Pseudo_Sid26

5 days

@jino_rohit Dayum, really nice

Jino Rohit

@jino_rohit

5 days

@Pseudo_Sid26 thanks man!

akkiisfrommars

@akkiisfrommars

6 days

@jino_rohit That's super cool! all the best :)

Jino Rohit

@jino_rohit

6 days

@akkiisfrommars thanks man!

Lazy_neuron

@lazy_Neuron

5 days

@jino_rohit waiting for it

Jino Rohit

@jino_rohit

5 days

@lazy_Neuron yes chief 🫡

Kartik

@code_kartik

4 days

@jino_rohit great work

Jino Rohit

@jino_rohit

4 days

@code_kartik thanks kartik!

Gabriel L. Kannenberg

@gabriellkann

4 days

@jino_rohit That's super cool! If you are optimizing it with shared memory tricks, make sure you account for bank conflicts; they make a significant difference. A two-pass blur can be greatly optimized by doing two horizontal blur passes with a transpose in between.
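
The bank-conflict point can be illustrated with a little arithmetic. The sketch below assumes the common layout of 32 shared-memory banks with successive 4-byte words mapped to successive banks, and models 32 threads of a warp each reading one element of the same tile column. `column_conflict_degree` is a hypothetical helper written for this example; it only simulates the address-to-bank mapping on the host and does not touch a GPU.

```cpp
#include <algorithm>
#include <array>

// Assumption for this sketch: 32 banks, one 4-byte word per bank per row of banks.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Each lane reads word (lane * row_stride_words + column) of a shared tile.
// Returns the largest number of lanes that land in the same bank
// (1 means conflict-free, 32 means the access is fully serialized).
int column_conflict_degree(int row_stride_words, int column) {
    std::array<int, kBanks> hits{};  // zero-initialized hit counters per bank
    for (int lane = 0; lane < kWarpSize; ++lane) {
        const int word = lane * row_stride_words + column;
        ++hits[word % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under these assumptions a tile with a row stride of 32 words gives `column_conflict_degree(32, 0) == 32`, while padding each row by one word gives `column_conflict_degree(33, 0) == 1`. That is the reasoning behind the familiar trick of declaring a tile as `[32][33]` instead of `[32][32]`.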

Jino Rohit

@jino_rohit

4 days

@gabriellkann i see, thanks!

Kosi.py

@Kosiengine

5 days

@jino_rohit super cool jino. mind sharing what you use to study?

Jino Rohit

@jino_rohit

5 days

@Kosiengine thanks man, of course! for cuda it's been entirely the pmpp book (4th edition), i occasionally watch this for visual understanding -

gagan

@gagan_builds

6 days

@jino_rohit cool shii

Jino Rohit

@jino_rohit

6 days

@gagan_builds thanks my man

Recon

@Preethi_747

6 days

@jino_rohit Is this open source ? Would love to contribute.

Jino Rohit

@jino_rohit

6 days

@Preethi_747 yeap, right here -