here is the final version of my vehicle speed estimation demo
read the thread below to learn how I built it.
I will cover:
- detection
- tracking
- perspective transformation
- speed calculation
- some bonus ideas
↓
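before the thread: the speed math itself fits in a few lines. This is a minimal sketch with a made-up homography matrix and made-up pixel coordinates; in the real demo the transform is derived from road reference points and the positions come from the tracker.

```python
# Sketch of the speed-from-homography step (hypothetical calibration).
# A 3x3 homography H maps image pixels onto a top-down plane in meters.

def apply_homography(H, point):
    """Project an (x, y) pixel through a 3x3 homography matrix."""
    x, y = point
    denom = H[2][0] * x + H[2][1] * y + H[2][2]
    tx = (H[0][0] * x + H[0][1] * y + H[0][2]) / denom
    ty = (H[1][0] * x + H[1][1] * y + H[1][2]) / denom
    return tx, ty

def speed_kmh(p_old, p_new, dt_seconds, H):
    """Speed from two pixel positions of the same tracker ID, dt apart."""
    x0, y0 = apply_homography(H, p_old)
    x1, y1 = apply_homography(H, p_new)
    meters = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return meters / dt_seconds * 3.6  # m/s -> km/h

# made-up calibration: 10 px = 1 m on a flat plane
H = [[0.1, 0, 0], [0, 0.1, 0], [0, 0, 1]]
print(speed_kmh((0, 0), (100, 0), 1.0, H))  # 100 px in 1 s -> 36.0 km/h
```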
REAL-TIME object detection WITHOUT TRAINING
YOLO-World is a new SOTA open-vocabulary object detector that outperforms previous models in terms of both accuracy and speed. 35.4 AP with 52.0 FPS on V100.
↓ read more
supervision, the open-source library I created a year ago, has crossed 10,000 stars on GitHub this weekend!
thank you to everyone who helped me build this project!
it took us 2,000+ commits, 500+ PRs and 50+ contributors to do it.
repository:
almost fully functional version of my football AI project
today, I added player tracking using ByteTrack and projection of players onto the map
code coming soon:
I'm starting to get more and more serious with YOLO-World; trying to solve real-life problems.
I wanted to see if YOLO-World could recognize that the holes had been filled in.
It was pretty tricky, but I learned a little about prompting.
↓ read more
The traffic analysis project is growing! The YouTube tutorial will be out this week.
Progress: I can now identify that the car is in a specified zone.
Next: Match entrance and exit zones for every tracker ID to analyze the traffic flow.
GitHub repo:
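the matching idea sketched in code — zone names and events below are made up; in the project the events come from polygon-zone triggers keyed by tracker ID:

```python
from collections import Counter

# Each event is (tracker_id, zone) in the order detections were seen.
events = [
    (1, "north_in"), (2, "east_in"), (1, "south_out"), (2, "west_out"),
    (3, "north_in"), (3, "east_out"),
]

first_zone, last_zone = {}, {}
for tracker_id, zone in events:
    first_zone.setdefault(tracker_id, zone)  # entrance = first zone seen
    last_zone[tracker_id] = zone             # exit = most recent zone

# traffic flow = how many trackers took each (entrance, exit) pair
flows = Counter((first_zone[t], last_zone[t]) for t in first_zone)
print(flows)
```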
I'm taking my football/soccer project to the next level
today, I worked on detecting players, referees, and the ball and mapping their positions from video frames to positions on the field.
↓ read more
I fine-tuned my first vision-language model
PaliGemma is an open-source VLM released by @GoogleAI last week. I fine-tuned it to detect bone fractures in X-ray images.
thanks to @mervenoyann and @__kolesnikov__ for all the help!
↓ read more
manual data labeling is (almost) dead
1,500,000 images auto-annotated within 2 weeks of release.
now, we also support automatic segmentation labeling.
↓ read more about open-source models that power this feature
YOLOv9
Learning What You Want to Learn Using Programmable Gradient Information
Today's deep learning methods focus on how to design the most appropriate objective functions so that the prediction results of the model can be closest to the ground truth. Meanwhile, an appropriate
I need to take a break from football AI for a while.
I plan to experiment with PaliGemma, Google's new open-source VLM, over the next few days.
but don't worry, I'll be back. In the meantime, the football AI code is slowly making its way to this repo.
train YOLOv9 on your dataset tutorial
- run inference with a pre-trained COCO model
- fine-tune model on custom dataset
- evaluate the trained model
- run inference with a fine-tuned model
blogpost:
↓ read more
taking my football/soccer AI to the next level
- image embeddings
- dimension reduction
- player clustering
- awesome visualizations
code: (code migration in progress...)
↓ read more
looking for GPT-4V alternatives?
- LLaVA
- BakLLaVA
- CogVLM
- Fuyu-8B
- Qwen-VL
I am working on a short blog post discussing some GPT-4V alternatives. It will probably come out today.
links to all resources:
Automated @NBA match commentary using @OpenAI vision and TTS (with code!)
Everyone is bragging about projects that generate automatic video commentary, but no one is showing the code. I did it while waiting for the plane.
code:
manual data labeling is almost dead
define prompts, tweak the confidence threshold, and make manual adjustments if necessary.
this feature is now available to all users, even on free accounts.
read more:
how to calculate the TIME objects spend IN THE ZONE? - that's the topic of my next tutorial.
here's a short (and a bit creepy) demo I built a few months ago.
do you have ideas for a less creepy use case for this tech?
github repository:
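the core of time-in-zone is just frame accounting — here's a toy sketch (hypothetical IDs and FPS), where the in-zone membership would come from a polygon-zone test each frame:

```python
FPS = 30
frames_in_zone = {}  # tracker_id -> number of frames spent inside the zone

def update(tracker_ids_in_zone):
    """Call once per frame with the IDs currently inside the polygon zone."""
    for tracker_id in tracker_ids_in_zone:
        frames_in_zone[tracker_id] = frames_in_zone.get(tracker_id, 0) + 1

for _ in range(90):  # tracker 7 stays in the zone for 90 frames
    update([7])

seconds = frames_in_zone[7] / FPS
print(seconds)  # 3.0
```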
analyzing store traffic to find the most frequently visited areas
super demo created by @Hine__Po, a member of the Supervision community
link to repo if you want to build something over the weekend:
The YOLO-World YouTube tutorial is out!
please, let us know what you think!
- model architecture
- processing images and video in Colab
- prompt engineering and detection refinement
- pros and cons of the model
watch here:
↓ more resources
YOLOv9 tutorial: train model on custom dataset
- running inference with pre-trained COCO weights
- fine-tuning the model on a custom dataset
- model evaluation
- model deployment
sorry it took me so long; hope you like it
it took us a while, but the supervision-0.20.0 release will finally add support for key points.
what are your thoughts on annotators? so far, we only have EdgeAnnotator and VertexAnnotator.
supervision repo:
supervision-0.15.0 is out! This time, we bring highly customizable annotators.
We added eight annotators - box, mask, ellipse, label, circle, corner, trace, and blur. But the best part is... you can freely mix them!
GitHub repository:
improving object counting logic
today I solved an interesting bug that has existed in my library for a loooooong time
repository:
↓ WARNING: lots of math in the thread below
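the math in question is the classic side-of-line test. Here's a minimal sketch (not the library's actual implementation): an object is counted when the sign of the cross product flips as its tracked point moves from one side of the counting line to the other.

```python
def side(a, b, p):
    """Which side of the line through a->b point p lies on (cross product sign)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def count_crossings(a, b, path):
    """Count sign flips of the side test along a tracked point's path."""
    crossings = 0
    prev = side(a, b, path[0])
    for p in path[1:]:
        cur = side(a, b, p)
        if prev != 0 and cur != 0 and (prev > 0) != (cur > 0):
            crossings += 1
        prev = cur
    return crossings

# vertical counting line at x = 5; object moves left -> right -> left
line_a, line_b = (5, 0), (5, 10)
path = [(0, 5), (10, 5), (0, 5)]
print(count_crossings(line_a, line_b, path))  # 2
```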
Easily one of the most exciting projects built with Supervision!
Our community member Vriza Wahyu Saputra built this fantastic ball juggling counting demo using the moving LineZone available in our API.
parking occupancy analysis
calculation of percentage occupancy in individual parking zones
all this was done with supervision:
btw, @UenoLeo is cooking a blog post covering this project, so stay tuned!
↓ read more
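the per-zone occupancy math can be sketched like this — spot rectangles and car boxes below are made up; in the demo a spot counts as occupied when a car detection's center falls inside it:

```python
def center_in_box(detection_xyxy, spot_xyxy):
    """Is the center of a detection box inside a parking-spot box?"""
    x1, y1, x2, y2 = detection_xyxy
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    sx1, sy1, sx2, sy2 = spot_xyxy
    return sx1 <= cx <= sx2 and sy1 <= cy <= sy2

spots = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10), (30, 0, 40, 10)]
cars = [(1, 1, 9, 9), (22, 2, 28, 8)]  # hypothetical detections

occupied = sum(any(center_in_box(c, s) for c in cars) for s in spots)
print(f"{100 * occupied / len(spots):.0f}% occupied")  # 50% occupied
```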
support for pose estimation and key point detection is coming soon to supervision
you can expect connectors for the most popular models and the first annotators in the next supervision release
can't wait to build demos like this with supervision
smart self-service checkout powered by YOLOv9
the value of the basket is updated live based on its changing content; what else should I add?
demo built with supervision:
I love watching other people build cool demos with the supervision library; traffic analysis examples built by Anant Jaiswal
- object tracking
- zone counting
- heat-map analysis
link:
What papers should I read to expand my knowledge of Transformers?
Please send links in the comments and write why this paper is worth reading. Thanks for your help!
Qwen-VL-Plus is SCARY good! (better than GPT-4V)
here it is casually solving reCAPTCHA!
- You don't have to give any additional instructions other than 'Solve it.'
- It can even mark the exact position of the objects it is looking for.
↓ it can do so much more
speed estimation tutorial is finally out!
- object detection
- multi-object tracking
- filtering detections with polygon zone
- perspective transformation and speed estimation
link:
below are some interesting visualizations I created for this video
↓
Sports Analytics with GPT-4 Vision
I wondered whether GPT-4V had the capability to automatically separate players into teams based on the color of their uniforms.
It took me a ridiculously long time to create this image, but in the meantime, I learned a lot about GPT-4V.
new YouTube tutorial: compute dwell time using computer vision in live streams
(seems easy, yet tricky)
- static file vs stream processing
- preventing growing latency and frame buffer overflow
- efficient stream processing
full tutorial:
↓ read more
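the trick for not letting latency grow: never queue frames, keep only the newest one. A minimal sketch of that idea (the class name and usage are illustrative, not the tutorial's actual code):

```python
import threading

class LatestFrame:
    """Single-slot buffer: a camera thread overwrites, the consumer always
    gets the newest frame, so a slow model can never build up latency."""

    def __init__(self):
        self._frame = None
        self._lock = threading.Lock()

    def put(self, frame):
        with self._lock:
            self._frame = frame  # silently drop whatever was here before

    def get(self):
        with self._lock:
            return self._frame

buffer = LatestFrame()
for i in range(5):  # camera pushes frames faster than we consume them
    buffer.put(f"frame-{i}")
print(buffer.get())  # frame-4 (frames 0-3 were dropped, no backlog)
```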
- Object detection over HTTP?
- Easy!
We just open-sourced our inference server under Apache 2.0
Left terminal: @roboflow inference
Right terminal: video client
It took me ONE HOUR to craft this demo using supervision-0.18.0
- Three new annotators: PercentageBar, RoundedBox, and OrientedBox
- Enhanced LineZone feature for improved counting
- OBB (oriented bounding boxes) integration
↓ read more
repo:
YOLO-World + EfficientSAM + StableDiffusion for language-guided inpainting
I was inspired yesterday by the work of @MrDravcan (see attached), and I decided to try to replicate it.
SPOILER ALERT: it didn't quite work out for me.
↓ read more
time-in-zone (dwell time) tutorial is coming
this is the third time I'm trying to make this video; hopefully, the last one
I finally have a good use case - waiting time for service.
here is the first iteration. what do you think?
link:
awesome example of using Supervision for the detection, annotation, and counting of coffee seedlings
kudos to community member Eric Kimwatan
supervision repo:
↓ youtube tutorial and colab
always triple-check the correctness of your datasets and data augmentations.
today, I found two separate errors that ruined my model training.
but finally, we are on the right track
↓ here's where I messed up
supervision-0.18.0 is almost here!
we had planned to release it tomorrow, but we're still putting the finishing touches on the OBB (oriented bounding box) support
repository:
Manually annotate ONE image and let GPT-4V annotate ALL of them.
1. generate boxes for all images with GroundingDINO
2. provide categories for the reference image
3. prompt GPT-4V to map generated boxes to reference categories
detecting small objects is hard
I spent some time today writing a short how-to guide on using supervision (in combination with the most popular CV libraries) to detect small objects.
btw is that a good idea for a video tutorial?
link:
↓ read more
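the core of the small-object trick is tiling: run the detector on overlapping slices, then merge. Here's a sketch of just the tiling geometry (sizes and overlap are illustrative, not the guide's exact parameters):

```python
def slice_offsets(image_wh, tile_wh, overlap=0.2):
    """Top-left corners of overlapping tiles that cover the whole image."""
    iw, ih = image_wh
    tw, th = tile_wh
    step_x = max(1, int(tw * (1 - overlap)))
    step_y = max(1, int(th * (1 - overlap)))
    xs = list(range(0, max(iw - tw, 0) + 1, step_x))
    ys = list(range(0, max(ih - th, 0) + 1, step_y))
    if xs[-1] + tw < iw:
        xs.append(iw - tw)  # make sure the right edge is covered
    if ys[-1] + th < ih:
        ys.append(ih - th)  # make sure the bottom edge is covered
    return [(x, y) for y in ys for x in xs]

tiles = slice_offsets((1920, 1080), (640, 640))
print(len(tiles))  # 8 overlapping 640x640 tiles for a 1080p frame
```

each tile is run through the detector at full resolution, and the per-tile boxes are shifted back by their offsets before NMS merges duplicates on the overlaps.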
I'm experimenting with PaliGemma tonight
a single open-source model allowing you to:
- detect car (detection)
- answer questions about its color and brand (VQA)
- read license plate number (OCR)
all that on a single consumer-grade GPU
is there any other model that can do it?
I'm experimenting with a new annotator that zooms in on small detections
do you think it is something useful? or am I just wasting my time here?
more cool annotators:
processing documents with Claude 3
- Good OCR capabilities
- Process up to 20 images with a single API call
- API seems slow and a bit unstable; expect a lot of variance in call execution time
- ~2x cheaper than GPT-4V (please check my math)
↓ read more
time-in-zone analysis with computer vision
- blurring faces
- detection and tracking
- smoothing detections
- filtering detections by zone
- calculating time
let me know if you want me to explain anything else. ;)
code:
↓ read more
finally had a little bit of time to work on my upcoming vehicle speed estimation tutorial
any improvement ideas?
the demo was built with supervision
code will soon land on GitHub:
Is that demo too creepy?
Ignore that one lady who has been sitting in the zone since the beginning but goes undetected. I am still trying to figure out why...
But zone timers work!
GitHub repository:
🔴 stream: YOLO-World Q&A + coding
in less than 15 minutes, I start my first YT stream; I'll be talking about YOLO-World and answering the questions you left under my last YT video
stop by to say hello
link:
↓ some of the topics we will cover
Two months ago, I created a @github repository where I gathered links to the best free AI courses. 🔥
I started with five links, and now there are almost 20. 🚀 The entire repository already has 1200+ ⭐
⮑ 🔗 GitHub repository:
↓🧵some of the courses
using GPT-4V to split players into teams
blending detections with the same tracker ID allows you to significantly reduce the number of GPT-4V API calls when you process video
1 call / 25 frames
kudos to @ikuma_uchida18 for coming up with this strategy
read more, it's cool ↓
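the "1 call / 25 frames" strategy in sketch form — `classify()` is a stand-in for the real GPT-4V call, and the numbers are illustrative:

```python
N = 25     # re-query GPT-4V at most once per N frames per tracker
calls = 0

def classify(crop):
    """Stand-in for the GPT-4V API call that returns a team label."""
    global calls
    calls += 1
    return "team_a"  # pretend GPT-4V answered

team_cache = {}  # tracker_id -> (frame_of_last_call, cached_label)

def team_for(tracker_id, crop, frame_idx):
    last = team_cache.get(tracker_id)
    if last is None or frame_idx - last[0] >= N:
        team_cache[tracker_id] = (frame_idx, classify(crop))
    return team_cache[tracker_id][1]

for frame_idx in range(100):  # 100 frames with one tracked player
    team_for(tracker_id=1, crop=None, frame_idx=frame_idx)
print(calls)  # 4 API calls instead of 100
```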
The second day of work on my SAM + MetaCLIP + ProPainter HF Space
- Automated object masking [done]
- Automated inpainting using ProPainter [in progress]
I just added the polygon annotator to the supervision package
you can now use masks or polygons to visualize the result of the instance segmentation model
polygon annotator will be available in supervision-0.17.0
code:
processing this one-second video exhausted my entire daily quota of 500 GPT-4V requests
but if you were wondering, @OpenAI GPT-4V can automatically divide players into teams based on the color of their uniforms
estimating traffic density based on the live feed from NYC street cameras.
you can find out in real-time which streets are congested.
shoutout to @UenoLeo for creating this cool project!
CLIP by @OpenAI was revolutionary, but its data curation pipeline was never detailed nor open-sourced.
@Meta has now released MetaCLIP, a fully open-source replication.
Models are on the hub:
YOLO (unofficial and incomplete) history
who made what?
while I wait for my first YOLOv9 custom-dataset fine-tuning to finish, I decided to share with you an incomplete YOLO history
with links to papers and code
YOLO (2016) Joseph Redmon et al.
- paper:
supervision-0.15.0 will be out tomorrow! This time we bring highly customizable annotators. Just plug in your model and we'll take care of the rest.
GitHub repository:
zone analysis is awesome;
you can use it to calculate an object's precise position in space, determine its movement path, or measure its distance traveled.
air traffic monitoring demo by @carlos_melo_py
supervision repo:
↓ youtube tutorial and code
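the distance-traveled piece is just a cumulative path length over the tracked positions — a minimal sketch with made-up coordinates (already in meters after perspective transformation):

```python
import math

def distance_traveled(path):
    """Sum of segment lengths along a tracked object's path."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

path = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]  # hypothetical positions
print(distance_traveled(path))  # 5.0 + 6.0 = 11.0 meters
```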
whenever I show zone analysis in my tutorials, people ask me how I designed the polygons
I decided to spend a few hours and build a small tool you can fire up locally to draw zones
code:
CS25: Transformers United V3 by
@Stanford
Stanford has recently updated its free course on Transformers, adding a fresh set of lectures.
Among the new content is a lecture by
@DrJimFan
demonstrating how agents based on GPT-4 can be used to play Minecraft.
counting people in zone (with code)
some time ago, I showed you how to use polygon zones to count people. I just added a refreshed version of this project to supervision/examples.
the new version offers a fast change of zone configuration.
code:
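under the hood, counting people in a zone boils down to a point-in-polygon test on each detection's anchor point. A minimal sketch using the standard ray-casting algorithm (not the example's actual implementation; the zone and points are made up):

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does a ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

zone = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical zone polygon
people = [(5, 5), (15, 5), (2, 9)]           # detection anchor points
print(sum(point_in_polygon(p, zone) for p in people))  # 2
```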
this new DiffMOT tracker looks pretty good.
I'd love to test it in one of my demos.
have any of you managed to get it working on your own video? (if so, let me know)
supervision-0.17.0 is out!
- added PixelateAnnotator, TriangleAnnotator, and PolygonAnnotator
- made MaskAnnotator 5x faster
- added integration with @OpenAI CLIP and @huggingface Timm
github:
time in zone tutorial is coming!
btw, would you like to watch me build my computer vision demos on Twitch?
zone timer will be released with supervision-0.18.0 this week: