
I have some videos that are to be considered as ground truths for people detection: this is an example.

I also have the plain video (without any detections drawn on it), and I have to run my people detection algorithm on it and compare my results with the ground truth video.

The problem is that I would like a quantitative comparison, not just a qualitative one. Since I can count the number of detections produced by my own algorithm, I need a reliable way to count the number of bounding boxes that appear in the ground truth video for each frame.

I have looked at this link and this one as well, but they are meant to find the contours of a shape, not bounding boxes. I know it may sound nonsensical to detect the number of detections, but this is the only way I have to get a numerical ground truth.

Lorenzo
  • Are you sure there is no data attached to the video with the frame number and list of bounding boxes (I would expect something like this has to exist)? Have you looked at https://bitbucket.org/amilan/motchallenge-devkit/ ? – wdudzik Mar 15 '19 at 11:22
  • The validation videos from data sets have their ground-truths with them. Please look for one such file. It generally is `.xml` or `.csv`. – mibrahimy Mar 15 '19 at 13:46
  • @wdudzik yes, you are right: ground truth files are attached [here](https://motchallenge.net/data/MOT16/#download) and I also found [how to use them](https://motchallenge.net/instructions/). I apologize for asking because I could have searched before, but I had been googling for good datasets for days and then I found this, it looked perfect for my code, but I blindly got lost in this ground truth trouble. As an excuse, I will provide my GitHub repo containing the detection as soon as I have something concrete. – Lorenzo Mar 16 '19 at 18:17

1 Answer


Use a pedestrian dataset that has both source video and ground truth. The source video will be a video file (like `.avi`) and the ground truth a spreadsheet (like `.csv`). The x,y coordinates, width, and height of the bounding box around each pedestrian are saved in the spreadsheet.
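A minimal sketch of loading such a spreadsheet and counting boxes per frame. The column layout (frame, x, y, width, height) is an assumption here; adjust the indices to match your dataset:

```python
import csv
from collections import defaultdict

def load_ground_truth(csv_path):
    """Group bounding boxes by frame number.

    Assumes each row is: frame, x, y, width, height
    (change the indices if your dataset uses a different layout).
    """
    boxes_by_frame = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            frame = int(row[0])
            x, y, w, h = (int(float(v)) for v in row[1:5])
            boxes_by_frame[frame].append((x, y, w, h))
    return boxes_by_frame
```

With this, `len(boxes_by_frame[n])` gives the number of ground-truth detections in frame `n`, which is exactly the per-frame count the question asks for.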

To visually check your results, draw the ground truth and your results on the same video. my data vs ground truth

Use an algorithm to quantitatively check your results. The accuracy function I used was:

`overlap / ((ground_truth_area + my_results_area) / 2)`

The overlap is shown in gray in the gif. How I calculated overlap.
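For two axis-aligned boxes given as `(x, y, w, h)`, the overlap is the area of their intersection rectangle, and the accuracy formula above follows directly. A minimal sketch:

```python
def box_overlap(a, b):
    """Intersection area of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)  # intersection width
    ih = min(ay + ah, by + bh) - max(ay, by)  # intersection height
    return max(iw, 0) * max(ih, 0)            # 0 if the boxes don't overlap

def accuracy(gt_box, result_box):
    """overlap / ((ground_truth_area + my_results_area) / 2)"""
    gt_area = gt_box[2] * gt_box[3]
    res_area = result_box[2] * result_box[3]
    return box_overlap(gt_box, result_box) / ((gt_area + res_area) / 2)
```

Identical boxes score 1.0, and disjoint boxes score 0.0, so the measure behaves much like the more common intersection-over-union.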

gif showing how overlap is calculated

Stephen Meschke
  • I have been googling datasets for a couple of days; the example you linked does not fit my needs (i.e. I am not looking for a single pedestrian on crosswalk). However, I upvoted your answer as you pointed out a way to compute the accuracy, which was something I still had to figure out. – Lorenzo Mar 16 '19 at 17:08
  • Thanks. I originally wanted to create a dataset with several people, but I was unable to convince my friends to take part. Hope you find what you're looking for. If not, making a pedestrian dataset is quite easy. I used some OpenCV tracking scripts I wrote to create the ground truth for the dataset I linked to in my answer. – Stephen Meschke Mar 16 '19 at 22:13
  • Would you mind explaining (or providing a link to) how you got to that formula? In particular, how did you compute `overlap`? – Lorenzo Mar 20 '19 at 14:18
  • 1
    @Lorenzo I changed the .gif so that it's easier to see what the overlap is. I also edited the question and linked to another S.O. answer that will help you calculate the overlap of two rectangles. – Stephen Meschke Mar 20 '19 at 16:30