I am new to computer vision and have a simple question that I could not find an answer to on the web. I am using the Mask R-CNN implementation by Matterport to perform binary classification on some images, and I have some extra lines of code that compute the mAP for each image. Can I add up the mAPs calculated for the individual images and divide by the number of images to get the mAP for the whole dataset? If not, how can I compute the overall mAP (preferably using the utilities of the Mask R-CNN model)?

Hessam
  • 2 steps: 1. For each image, calculate the average precision across different recall thresholds; mathematically, this is the area under the precision-recall curve for that image. 2. Average that value over all images, i.e. (sum of per-image average precisions) / (number of images). It would be clearer if you could share a sample of your output format. – Prachi Jul 27 '20 at 20:38
  • @Prachi Thanks for the response. That is much more convenient than what I had in mind, but I don't understand why it works. Could you please elaborate, or point me to a textbook? I thought that precision and recall should be calculated over all instances across all images, and that the area under that curve would then give the final mAP. Is that right? – Hessam Jul 29 '20 at 09:14
  • What do you mean by "instances" in "instances across all images"? – Prachi Jul 29 '20 at 21:04
  • @Prachi By instances I mean any detected object. For example, take the setup in this article (https://medium.com/@jonathan_hui/map-mean-average-precision-for-object-detection-45c121a31173#:~:text=mAP%20(mean%20average%20precision)%20is,difference%20between%20AP%20and%20mAP.) and imagine that we have the same number of objects, but split across two images. Would I get the same result by calculating the mAP for each image and averaging across all images as by pooling all detection results, ranking them, and then calculating the mAP? – Hessam Jul 31 '20 at 08:38
  • Okay. In object detection, results are reported at the image level along with the corresponding detected bounding boxes. So if an image has 5 bboxes, it will have 5 rows in the prediction dataset, and precision and recall are calculated individually for each bbox. The explanation here could help: https://stats.stackexchange.com/questions/260430/average-precision-in-object-detection – Prachi Jul 31 '20 at 13:57
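The question in the comments (per-image mean versus one pooled ranking) can be checked on a toy example. The sketch below is plain Python with hypothetical helper names; it assumes detections are already flagged as true or false positives and uses all-point interpolated AP. It computes AP both ways for the same two images. For reference, benchmarks such as PASCAL VOC and COCO accumulate detections over the whole dataset before computing AP, i.e. the pooled variant.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP over detections ranked by confidence."""
    ranked = [hit for _, hit in sorted(zip(scores, is_tp), key=lambda d: -d[0])]
    tp = fp = 0
    recalls, precisions = [], []
    for hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: max precision at this or any later (higher-recall) rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Per image: (confidence scores, true-positive flags, number of ground-truth objects).
images = [
    ([0.9, 0.8, 0.7], [True, False, True], 2),
    ([0.6], [True], 1),
]

# Option 1: AP per image, then a plain mean over images.
per_image = sum(average_precision(*img) for img in images) / len(images)

# Option 2: pool every detection into one ranked list, then compute AP once.
pooled = average_precision(
    [s for scores, _, _ in images for s in scores],
    [t for _, flags, _ in images for t in flags],
    sum(n for _, _, n in images),
)

print(f"mean of per-image AP: {per_image:.4f}")  # 0.9167
print(f"pooled AP:            {pooled:.4f}")     # 0.8333
```

The two numbers differ for two reasons: in the pooled ranking, the confident false positive from the first image also sits above the second image's true positive and lowers its precision, and the per-image mean gives the one-object image the same weight as the two-object image.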

1 Answer

Yes, you can do something like

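import numpy as np
# recall, precision: arrays holding one value per test image
mean_recall = np.sum(recall) / num_test
mean_precision = np.sum(precision) / num_test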

where num_test is the number of test images.
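To make this averaging concrete, here is a minimal, self-contained sketch in plain Python (no `mrcnn` dependency) that goes from boxes to a per-image AP and then averages over the `num_test` images. All helper names and the toy data are hypothetical. It assumes boxes in (y1, x1, y2, x2) corner format, greedy matching at IoU >= 0.5 and all-point interpolated AP, so its numbers are not guaranteed to match Matterport's `utils.compute_ap`.

```python
def iou(a, b):
    """Intersection over union of two (y1, x1, y2, x2) boxes."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def image_ap(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    """AP for one image (assumes at least one ground-truth box)."""
    order = sorted(range(len(pred_boxes)), key=lambda i: -pred_scores[i])
    matched = set()  # ground-truth boxes already claimed by a prediction
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        # Best still-unmatched ground-truth box at or above the threshold.
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(gt_boxes):
            overlap = iou(pred_boxes[i], gt)
            if j not in matched and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1
        else:
            matched.add(best_j)
            tp += 1
        recalls.append(tp / len(gt_boxes))
        precisions.append(tp / (tp + fp))
    # Precision envelope: max precision at this or any later (higher-recall) rank.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical test set: (pred_boxes, pred_scores, gt_boxes) per image.
test_set = [
    ([(0, 0, 10, 10), (50, 50, 60, 60), (20, 20, 30, 30)],
     [0.9, 0.8, 0.7],
     [(0, 0, 10, 10), (20, 20, 30, 30)]),
    ([(5, 5, 15, 15)], [0.6], [(5, 5, 15, 15)]),
]
num_test = len(test_set)
aps = [image_ap(p, s, g) for p, s, g in test_set]
print(sum(aps) / num_test)  # about 0.9167, i.e. (5/6 + 1) / 2
```

Because this is a plain mean over images, an image containing one object counts as much as an image containing twenty.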

Just keep the training and test data separate.

Abhi25t