I am new to computer vision and I have a simple question that I could not find an answer to on the web. I am using the Mask R-CNN implementation by Matterport to perform binary classification on some images, and I have some extra lines of code that compute the mAP for each image. Now I would like to know whether I can add up the mAPs calculated for each image and then divide by the number of images to get the mAP for the whole dataset, and if not, how I can compute the overall mAP (preferably using the utilities of the Mask R-CNN model).
-
Two steps: 1. For each image, calculate the average precision across the different recall thresholds. Mathematically, this is the area under the precision-recall curve for that image. 2. Average the result over all images, i.e. (sum of per-image average precision) / (number of images). It would be clearer if you could share a sample of the output format. – Prachi Jul 27 '20 at 20:38
-
@Prachi Thanks for the response. That is much more convenient than what I had in mind, but I don't understand why it works. Could you please elaborate on that, or point me to a textbook? I thought that precision and recall should be calculated for all instances across all images, and that the area under that curve would then yield the final mAP, right? – Hessam Jul 29 '20 at 09:14
-
What do you mean by "instances" in "instances across all images"? – Prachi Jul 29 '20 at 21:04
-
@Prachi By instances I mean any object that is detected. For example, take the case in this article (https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173#:~:text=mAP%20(mean%20average%20precision)%20is,difference%20between%20AP%20and%20mAP.) and imagine that we have the same number of objects, but divided between two images. Would the results be the same if I calculate mAP for each image and then average across all images, or if I pool all detection results, rank them and then calculate mAP? – Hessam Jul 31 '20 at 08:38
-
Okay. In object detection, results are reported at the image level along with the corresponding detected bounding boxes. So if an image has 5 bboxes, it will have 5 rows in the prediction dataset, and precision and recall are calculated individually for each bbox. The explanation here could help: https://stats.stackexchange.com/questions/260430/average-precision-in-object-detection – Prachi Jul 31 '20 at 13:57
1 Answer
Yes, you can do something like
np.sum(recall)/num_test
np.sum(precision)/num_test
where num_test is the number of test images.
Just keep the training and test data separate.
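To make the question from the comments concrete (averaging per-image AP versus pooling every detection and ranking them together), here is a minimal NumPy sketch. It is not the Matterport code: `average_precision` is a hypothetical helper that computes a VOC-style area under the precision-recall curve, it assumes each detection has already been marked as a true or false positive at your chosen IoU threshold, and the scores and labels are made-up toy values. Note that it averages the per-image AP itself rather than raw precision and recall.

```python
import numpy as np


def average_precision(scores, is_true_positive, num_gt):
    """VOC-style AP for one set of detections (hypothetical helper).

    scores: confidence of each detection
    is_true_positive: whether each detection matched a ground-truth object
    num_gt: number of ground-truth objects (assumed > 0)
    """
    # Rank detections from most to least confident.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_true_positive, dtype=bool)[order]

    # Precision and recall after each ranked detection.
    cum_tp = np.cumsum(tp)
    precisions = cum_tp / (np.arange(len(tp)) + 1)
    recalls = cum_tp / num_gt

    # Pad with sentinel values so the curve spans recall 0 to 1.
    precisions = np.concatenate([[0.0], precisions, [0.0]])
    recalls = np.concatenate([[0.0], recalls, [1.0]])

    # Make precision monotonically non-increasing (right to left).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangle areas wherever recall changes.
    idx = np.where(recalls[:-1] != recalls[1:])[0] + 1
    return float(np.sum((recalls[idx] - recalls[idx - 1]) * precisions[idx]))


# Toy data, one tuple per image: (scores, is_true_positive, num_gt).
per_image = [
    ([0.9, 0.8], [True, False], 1),
    ([0.95, 0.6], [False, True], 1),
]

# Option 1: AP per image, then the mean over images.
aps = [average_precision(s, tp, n) for s, tp, n in per_image]
mean_ap = float(np.mean(aps))

# Option 2: pool all detections, rank them together, compute one AP.
pooled_scores = np.concatenate([s for s, _, _ in per_image])
pooled_tp = np.concatenate([tp for _, tp, _ in per_image])
pooled_gt = sum(n for _, _, n in per_image)
pooled_ap = average_precision(pooled_scores, pooled_tp, pooled_gt)

print("mean of per-image AP:", mean_ap)
print("pooled AP:", pooled_ap)
```

On this toy data the two options give 0.75 and 0.5, so they are not interchangeable in general: pooling lets a confident false positive in one image push down the ranking of correct detections in another. As far as I recall, the Matterport sample notebooks take the first route, collecting the AP returned by `utils.compute_ap` for each image and reporting `np.mean` of that list, but check your copy of the repository to confirm.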

Abhi25t