Recently I have been doing research on descriptors for RGB-D images, but I am now stuck on how to compute the recall-precision.
My initial idea is the following (a rough sketch of the pipeline is given after the list):

1. After detecting and describing the keypoints, convert the two depth images into point clouds.
2. Use the ground-truth data (rotation matrices and translation vectors) to transform both point clouds into a common world coordinate system.
3. Compute recall-precision just as one would on a single point cloud, i.e. count a descriptor match as correct if the two transformed keypoints lie close enough together in the world frame.
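To make the question concrete, here is a rough sketch of what I have in mind in Python/NumPy. The names `K`, `depth`, the keypoint pixel coordinates, the ground-truth poses `(R, t)` and the list of descriptor `matches` are placeholders for my own data, and the 2 cm threshold is only an example:

```python
import numpy as np

def backproject(keypoints, depth, K):
    """Lift 2D keypoints (u, v) in pixels to 3D points in the camera frame,
    using the depth image and pinhole intrinsics K. Returns an (N, 3) array;
    keypoints with zero depth end up at the origin and are not usable."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    pts = np.zeros((len(keypoints), 3))
    for i, (u, v) in enumerate(keypoints):
        z = depth[int(round(v)), int(round(u))]  # depth assumed in metres
        pts[i] = [(u - cx) * z / fx, (v - cy) * z / fy, z]
    return pts

def to_world(pts_cam, R, t):
    """Transform camera-frame points into the world frame using the ground-truth pose."""
    return pts_cam @ R.T + t

def precision_recall(matches, pts1_w, pts2_w, n_correspondences, dist_thresh=0.02):
    """A descriptor match (i, j) is counted as correct if the two keypoints,
    expressed in the world frame, are closer than dist_thresh (metres).
    n_correspondences is the number of ground-truth correspondences used for recall."""
    correct = sum(1 for i, j in matches
                  if np.linalg.norm(pts1_w[i] - pts2_w[j]) < dist_thresh)
    precision = correct / len(matches) if matches else 0.0
    recall = correct / n_correspondences if n_correspondences else 0.0
    return precision, recall
```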
If this approach is right, then I run into a problem in my experiment: a keypoint in the RGB image may have a zero value at the corresponding pixel of the depth image, so a keypoint that is actually matched correctly in the RGB image may have no valid 3D point and therefore cannot be verified (matched) in the transformed point cloud.
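For concreteness, this is how I currently flag the affected keypoints (again just a sketch with placeholder names, following on from the code above):

```python
import numpy as np

def valid_depth_mask(keypoints, depth):
    """True for keypoints whose pixel in the depth image has a non-zero reading,
    i.e. keypoints that can actually be lifted to a 3D point."""
    return np.array([depth[int(round(v)), int(round(u))] > 0
                     for (u, v) in keypoints])
```

What I do not know is how these keypoints should be counted when computing precision and recall.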
How could I compute the recall-precision correctly?