@jino_rohit
Jino Rohit
7 days
I'm building my own OpenCV from scratch - fastcv. fastcv is a C++ CUDA rewrite, with PyTorch bindings, of the image filters in the OpenCV library. I have already written two optimized kernels and will keep studying and implementing more. I have also added current benchmarks.
38
58
991

Replies

@stacktrackguy
Hollow Byte
7 days
@jino_rohit Curious if you’re planning to explore fusing ops (e.g. blur + grayscale in one kernel). It could push performance even further by minimizing memory transfers
1
0
22
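To make the fusion idea concrete, here is a minimal CPU sketch in plain C++, not fastcv's actual code. The function names (`gray_then_blur`, `fused_gray_blur`), the 3x3 box filter, the clamped borders and the BT.601 luma weights are all assumptions picked for illustration. The unfused version writes a whole grayscale image and reads it back; the fused version converts each neighbour on the fly, so the intermediate buffer never exists. In a CUDA kernel each output pixel would typically map to one thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// BT.601 luma weights, applied to one interleaved RGB pixel (assumed layout).
inline float luma(const std::uint8_t* px) {
    return 0.299f * px[0] + 0.587f * px[1] + 0.114f * px[2];
}

// Clamp a coordinate into [0, hi], which replicates the border pixel.
inline int clamp_to(int v, int hi) { return std::max(0, std::min(v, hi)); }

// Unfused: pass 1 writes a full grayscale image, pass 2 reads it back.
std::vector<float> gray_then_blur(const std::vector<std::uint8_t>& rgb, int w, int h) {
    assert(rgb.size() == static_cast<std::size_t>(w) * h * 3);
    std::vector<float> gray(static_cast<std::size_t>(w) * h), out(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) gray[i] = luma(&rgb[3 * i]);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += gray[clamp_to(y + dy, h - 1) * w + clamp_to(x + dx, w - 1)];
            out[y * w + x] = sum / 9.0f;
        }
    return out;
}

// Fused: grayscale is computed per tap, so no intermediate image is stored.
std::vector<float> fused_gray_blur(const std::vector<std::uint8_t>& rgb, int w, int h) {
    assert(rgb.size() == static_cast<std::size_t>(w) * h * 3);
    std::vector<float> out(static_cast<std::size_t>(w) * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int idx = clamp_to(y + dy, h - 1) * w + clamp_to(x + dx, w - 1);
                    sum += luma(&rgb[3 * idx]);
                }
            out[y * w + x] = sum / 9.0f;
        }
    return out;
}
```

The fused version does more arithmetic per output pixel (nine luma evaluations instead of one), trading compute for one fewer round trip through memory. Whether that wins on a given GPU is something to benchmark rather than assume.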
@jino_rohit
Jino Rohit
6 days
1
0
5
@lesDecroissant
Dhruv Malik
6 days
@jino_rohit kudos indeed. you're gonna be on a great learning curve. just curious whether you'll dig deeper into NVIDIA PTX/SASS-level optimisation and other frameworks to optimise ML kernel design?
2
0
7
@jino_rohit
Jino Rohit
6 days
@lesDecroissant this was mainly a playground to test what ive learned so far, i want to move into llm inference optimization engines after finishing pmpp book!
0
0
1
@Victor_ML_AI
Victor Asuquo | AI Engineer
7 days
@jino_rohit The kind of projects I love to see
1
0
6
@jino_rohit
Jino Rohit
7 days
0
0
1
@1__________l1l_
💫ℹ️🐚▪️🌐💺🗨️🚨®️
7 days
@jino_rohit This attempted project is great.
1
0
5
@jino_rohit
Jino Rohit
6 days
0
0
1
@tm23twt
ƬⲘ ⚔️
7 days
@jino_rohit demn, crazy stuff bro :)
1
0
4
@jino_rohit
Jino Rohit
7 days
@tm23twt thanks my man🫶
0
0
3
@TheGlobalMinima
Aarno
5 days
@jino_rohit Much much needed 🙏🏻 opencv has become slop. About time. More power to you !!
1
0
3
@jino_rohit
Jino Rohit
5 days
@TheGlobalMinima haha no way, opencv is awesome! this is more like a way to consolidate my learning
1
0
0
@_notapenguin
a.desi.penguin
7 days
@jino_rohit cool idea, mind if I try doing something similar?
1
0
3
@jino_rohit
Jino Rohit
6 days
@_notapenguin of course, go for it!
0
0
1
@sankitdev
sankit
4 days
@jino_rohit Noice.. your project has piqued my interest. Waiting for more updates
1
0
2
@jino_rohit
Jino Rohit
4 days
@sankitdev thanks man!
0
0
1
@DebopamChowdhu1
Debopam Chowdhury
5 days
1
0
2
@jino_rohit
Jino Rohit
5 days
0
0
1
@Pseudo_Sid26
Siddharth
5 days
@jino_rohit Dayum, really nice
1
0
1
@jino_rohit
Jino Rohit
5 days
@Pseudo_Sid26 thanks man!
0
0
1
@akkiisfrommars
akkiisfrommars
6 days
@jino_rohit That's super cool! all the best :)
1
0
1
@jino_rohit
Jino Rohit
6 days
@akkiisfrommars thanks man!
0
0
1
@lazy_Neuron
Lazy_neuron
5 days
@jino_rohit waiting for it
1
0
1
@jino_rohit
Jino Rohit
5 days
@lazy_Neuron yes chief 🫡
0
0
1
@code_kartik
Kartik
4 days
@jino_rohit great work
1
0
1
@jino_rohit
Jino Rohit
4 days
@code_kartik thanks kartik!
0
0
0
@gabriellkann
Gabriel L. Kannenberg
4 days
@jino_rohit That's super cool! If you are optimizing it with shared memory tricks, make sure you account for bank conflicts. It makes a significant difference. A two-pass blur can be greatly optimized by doing two horizontal blur passes with a transpose in between.
1
0
1
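The transpose trick can be sketched on the CPU in plain C++. This is an illustration only, not fastcv's implementation: `blur_rows`, `transpose` and `box_blur_2d` are hypothetical names, and the box filter with clamped borders is an assumed choice. The point is that the vertical pass never walks down columns; both blur passes read along contiguous rows, and the transposes do the reorientation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Horizontal box blur of radius r with clamped borders; reads along rows only.
std::vector<float> blur_rows(const std::vector<float>& img, int w, int h, int r) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dx = -r; dx <= r; ++dx) {
                int xx = std::max(0, std::min(x + dx, w - 1));
                sum += img[y * w + xx];
            }
            out[y * w + x] = sum / static_cast<float>(2 * r + 1);
        }
    return out;
}

// Transpose a row-major w x h image into a row-major h x w image.
std::vector<float> transpose(const std::vector<float>& img, int w, int h) {
    assert(img.size() == static_cast<std::size_t>(w) * h);
    std::vector<float> out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[x * h + y] = img[y * w + x];
    return out;
}

// 2D box blur built from two horizontal passes with transposes in between.
std::vector<float> box_blur_2d(const std::vector<float>& img, int w, int h, int r) {
    std::vector<float> a = blur_rows(img, w, h, r);  // blur along x
    std::vector<float> t = transpose(a, w, h);       // image is now h wide, w tall
    std::vector<float> b = blur_rows(t, h, w, r);    // blur along the original y
    return transpose(b, h, w);                       // back to w x h
}
```

On a GPU the transpose is usually staged through a shared-memory tile, and a common way to avoid the bank conflicts mentioned above is to pad that tile by one column (for example 32 x 33 floats instead of 32 x 32). That padding is not shown in this CPU sketch.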
@jino_rohit
Jino Rohit
4 days
@gabriellkann i see, thanks!
0
0
0
@Kosiengine
Kosi.py
5 days
@jino_rohit super cool jino. mind sharing what you use to study?
1
0
1
@jino_rohit
Jino Rohit
5 days
@Kosiengine thanks man, of course! for cuda it's been entirely the pmpp book (4th edition), i occasionally watch this for visual understanding -
1
0
2
@gagan_builds
gagan
6 days
@jino_rohit cool shii
1
0
1
@jino_rohit
Jino Rohit
6 days
@gagan_builds thanks my man
0
0
1
@Preethi_747
Recon
6 days
@jino_rohit Is this open source ? Would love to contribute.
1
0
1
@jino_rohit
Jino Rohit
6 days
@Preethi_747 yeap, right here -
0
0
2