I am trying to use PyTorch and YOLOv5 to detect objects in multiple images and count them. My problem is that at a frame rate of, say, 15 fps, the same objects are detected again in consecutive frames, either slightly further along in the image (different coordinates) or at the same coordinates as before. Currently I only count how many objects are detected in each image. How can I exclude these repeated objects, or check whether an object has already been detected?

My code so far:

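from collections import Counter

from PIL import Image

# `model` is assumed to be a YOLOv5 model loaded elsewhere.
counts = {"cars": 0, "trucks": 0}
class_mappings = {2.0: "cars", 7.0: "trucks"}  # COCO class ids

def predict():
    img = Image.open("test.jpeg")
    # Assumed to be an (N, 6) tensor of detections; the last column is the class id.
    # With the torch.hub wrapper, use result.xyxy[0] instead.
    result = model(img)
    labels = Counter(result[:, -1].tolist())
    for k, v in class_mappings.items():
        counts[v] += labels.get(k, 0)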

The code above extracts the labels of the detected objects from the tensor and adds them to a counter variable.

Test
Arthi

1 Answer

Most tracking algorithms/models (for example, those in the SORT family) should work well for you.

That is, you need to track each box from frame to frame: after running inference on each frame, assign an ID to every box using some distance function, so that an object keeps its ID after it has moved.
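To make the idea concrete, here is a minimal sketch of ID assignment by box overlap. It is not what any particular library does: the class name `GreedyIouTracker`, the `iou_threshold` value, and the use of IoU (intersection over union) as the distance function are all assumptions chosen for illustration. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, and a track that is not matched in a frame is simply dropped, which a real tracker would handle more carefully.

```python
from itertools import count


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Hypothetical toy tracker: reuses an ID when a box overlaps a previous one."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}              # track id -> box from the previous frame
        self._next_id = count(1)

    def update(self, boxes):
        """Return one track id per input box, in the same order as `boxes`."""
        # Score every (track, detection) pair and visit the best overlaps first.
        pairs = sorted(
            ((iou(tbox, box), tid, i)
             for tid, tbox in self.tracks.items()
             for i, box in enumerate(boxes)),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, tid, i in pairs:
            if score < self.iou_threshold:
                break                 # remaining pairs overlap too little
            if tid in used or i in assigned:
                continue              # track or detection already matched
            assigned[i] = tid
            used.add(tid)
        # Detections that matched nothing are treated as new objects.
        for i in range(len(boxes)):
            if i not in assigned:
                assigned[i] = next(self._next_id)
        # Unmatched old tracks are forgotten (no "max age" in this sketch).
        self.tracks = {assigned[i]: boxes[i] for i in range(len(boxes))}
        return [assigned[i] for i in range(len(boxes))]
```

Calling `update` once per frame with that frame's boxes returns stable IDs as long as an object overlaps its previous position enough. At low frame rates or with fast objects the overlap can drop below the threshold, which is one reason proper trackers add a motion model.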

This is commonly referred to as MOT (Multiple Object Tracking). You will come across two flavors of MOT: a purely statistical approach, and deep learning (DL) with a statistical approach on top.

The DL version is more useful if you're working in environments with a lot of noise, of course. But it comes with a downside: you have to run a feature extractor in real time.

That said, I've worked with statistical approaches (based on the Kalman filter) in a very demanding production setting, and after some tweaking they worked unbelievably well on a very dense MOT task.

You can try https://github.com/wmuron/motpy. It should be easy to integrate with YOLOv5.
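Once a tracker hands back an ID per detection, counting each object once comes down to collecting the distinct IDs per class. The sketch below assumes the tracker output has already been reduced to `(track_id, class_id)` pairs per frame; that input shape, the helper name `count_unique`, and the `CLASS_NAMES` mapping (taken from the class ids in the question) are illustrative assumptions, not part of any tracker's API.

```python
from collections import defaultdict

# Class ids from the question (COCO: 2 = car, 7 = truck).
CLASS_NAMES = {2: "cars", 7: "trucks"}


def count_unique(frames, class_names=CLASS_NAMES):
    """Count distinct track ids per class across all frames.

    `frames` is an iterable of frames; each frame is an iterable of
    (track_id, class_id) pairs. Track ids may be any hashable value.
    """
    seen = defaultdict(set)
    for frame in frames:
        for track_id, class_id in frame:
            name = class_names.get(int(class_id))
            if name is not None:      # ignore classes we are not counting
                seen[name].add(track_id)
    return {name: len(seen[name]) for name in class_names.values()}
```

For example, `count_unique([[("a", 2), ("b", 7)], [("a", 2), ("c", 2)]])` counts car `"a"` only once even though it appears in both frames.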

deepconsc
  • So if I have a pretrained YOLOv5 model and the tensor with all the detected objects, I should just give them IDs and use some distance function to track them after they have moved? – Arthi Jan 31 '22 at 11:49
  • Well, pretty much. If you want to count individual objects, you have to identify each of them by its ID; otherwise you'll just be counting everything in every frame. The GitHub repository I linked above will help you do that. It returns an ID for each object, and then you can count each one by its ID. You just have to pass the detection scores and class IDs, and it will return the IDs (as I remember). If you have trouble integrating it, let me know and I can share an old gist with you. – deepconsc Jan 31 '22 at 12:03