I'm building my own OpenCV from scratch - fastcv. fastcv is a C++/CUDA rewrite, with PyTorch bindings, of the image filters in the OpenCV library. I have already written two optimized kernels and will keep studying and implementing more. I have also added current benchmarks.
Replies
@jino_rohit Curious if you're planning to explore fusing ops (e.g. blur + grayscale in one kernel). That could push performance even further by minimizing memory transfers.
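To make the fusion idea concrete, here is a minimal CPU reference sketch in plain C++ (not a CUDA kernel, and not fastcv's actual code). It converts to grayscale and applies a 3x3 box blur in a single pass, so no intermediate grayscale image is ever written to memory. The names `luma` and `fused_gray_blur3x3`, the interleaved RGB layout, the clamped borders, and the integer luma weights are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: integer approximation of the common
// 0.299 / 0.587 / 0.114 luma weights (77 + 150 + 29 = 256).
inline int luma(const std::uint8_t* px) {
    return (77 * px[0] + 150 * px[1] + 29 * px[2]) >> 8;
}

// Fused grayscale + 3x3 box blur over an interleaved RGB image.
// Each output pixel reads its 3x3 RGB neighbourhood directly, so the
// grayscale image is never stored. Borders are clamped (an assumption).
std::vector<std::uint8_t> fused_gray_blur3x3(const std::vector<std::uint8_t>& rgb,
                                             int width, int height) {
    std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int yy = std::min(std::max(y + dy, 0), height - 1);
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    sum += luma(&rgb[(static_cast<std::size_t>(yy) * width + xx) * 3]);
                }
            }
            out[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>(sum / 9);
        }
    }
    return out;
}
```

The trade-off is that each luma value is recomputed up to nine times instead of being read back from an intermediate buffer. On a GPU that usually means spending cheap arithmetic to save global memory traffic, but whether it wins in practice would need benchmarking.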
@jino_rohit Kudos indeed. You're going to be on a great learning curve. Just curious whether you'll dig deeper into NVIDIA PTX/SASS-level optimisation and other frameworks for optimising ML kernel design?
@lesDecroissant this was mainly a playground to test what I've learned so far. I want to move into LLM inference optimization engines after finishing the PMPP book!
@TheGlobalMinima haha no way, opencv is awesome! this is more like a way to consolidate my learning
@jino_rohit That's super cool! If you are optimizing it with shared memory tricks, make sure you account for bank conflicts. It makes a significant difference. A two-pass blur can be greatly optimized by doing two horizontal blur passes with a transpose in between, so both passes read along rows.
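A rough CPU reference sketch of that transpose trick, in plain C++ rather than CUDA: blur along rows, transpose, blur along rows again, then transpose back. The result is a 2D box blur in which both passes walk memory contiguously. The names `Image`, `blur_rows`, `transpose`, and `box_blur_2d` are hypothetical, and the box filter with clamped borders is an assumption. This sketch does not model shared memory. In a CUDA version the transpose would typically go through a shared memory tile padded by one column (for example `TILE x (TILE + 1)`) to avoid the bank conflicts mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // single channel, row-major (assumption)

// Horizontal box blur with the given radius; borders are clamped.
Image blur_rows(const Image& src, int width, int height, int radius) {
    Image dst(src.size());
    for (int y = 0; y < height; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * width;
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int k = -radius; k <= radius; ++k) {
                const int xx = std::min(std::max(x + k, 0), width - 1);
                sum += src[row + xx];
            }
            dst[row + x] = sum / static_cast<float>(2 * radius + 1);
        }
    }
    return dst;
}

// Transpose a width x height image into a height x width image.
Image transpose(const Image& src, int width, int height) {
    Image dst(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            dst[static_cast<std::size_t>(x) * height + y] =
                src[static_cast<std::size_t>(y) * width + x];
    return dst;
}

// 2D box blur as: horizontal pass, transpose, horizontal pass, transpose back.
Image box_blur_2d(const Image& src, int width, int height, int radius) {
    Image h = blur_rows(src, width, height, radius);
    Image t = transpose(h, width, height);          // now height wide, width tall
    Image v = blur_rows(t, height, width, radius);  // vertical blur done as a row blur
    return transpose(v, height, width);
}
```

Calling `box_blur_2d(img, width, height, 1)` on a row-major float image gives a 3x3 box blur. The two extra transposes are only worth it when the strided column access they replace is the more expensive option, which is commonly the case for uncoalesced global memory reads on a GPU.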
@Kosiengine thanks man, of course! For CUDA it's been entirely the PMPP book (4th edition). I occasionally watch this for visual understanding -