
I have a set of more than 1,000 images. For every image I extract SURF descriptors. Now I add a query image and want to find the most similar image in the set. For performance and memory reasons I extract only 200 keypoints with descriptors per image, and this is more or less my problem. At the moment I filter the matches like this:

Symmetry matching: simple brute-force matching in both directions, i.e. from Image1 to Image2 and from Image2 to Image1. I only keep the matches that exist in both directions.

    List<Matches> match1 = BruteForceMatching.BFMatch(act.interestPoints, query.interestPoints);
    List<Matches> match2 = BruteForceMatching.BFMatch(query.interestPoints, act.interestPoints);

    // keep only the matches found in both directions
    List<Matches> finalMatch = FeatureMatchFilter.DoSymmetryTest(match1, match2);

    // sum up the descriptor distances of the surviving matches
    float distance = 0;
    for(int i = 0; i < finalMatch.size(); i++)
        distance += finalMatch.get(i).distance;

    // weight by the ratio of extracted interest points to surviving matches
    act.pic.distance = distance * (float) query.interestPoints.size() / (float) finalMatch.size();
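
For reference, the symmetry test does roughly the following. This is only a simplified sketch, not my actual FeatureMatchFilter code; the Match class and its queryIdx/trainIdx fields are illustrative index fields into the two keypoint lists:

    import java.util.ArrayList;
    import java.util.List;

    class Match {
        int queryIdx;   // index of the keypoint in the first image
        int trainIdx;   // index of the keypoint in the second image
        float distance; // Euclidean distance between the two descriptors

        Match(int queryIdx, int trainIdx, float distance) {
            this.queryIdx = queryIdx;
            this.trainIdx = trainIdx;
            this.distance = distance;
        }
    }

    class SymmetryTestSketch {
        // Keep a match (i -> j) from matches12 only if matches21 contains (j -> i).
        static List<Match> doSymmetryTest(List<Match> matches12, List<Match> matches21) {
            List<Match> symmetric = new ArrayList<>();
            for (Match m12 : matches12) {
                for (Match m21 : matches21) {
                    if (m12.queryIdx == m21.trainIdx && m12.trainIdx == m21.queryIdx) {
                        symmetric.add(m12);
                        break;
                    }
                }
            }
            return symmetric;
        }
    }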

I know there are more filtering methods. As you can see, I try to weight the distances by the number of final matches, but I don't have the feeling I'm doing this correctly. When I look at other approaches, it seems they all compute with all of the extracted interest points in the image. Does anyone have a good approach for this, or a good idea how to weight the distances?

I know there is no golden solution, but some experiences, ideas and other approaches would be really helpful.

PeterNL

1 Answer


So "match1" represents the directed matches of one of the database images and "match2" a query image, "finalMatch" are all the matches between those images and "finalMatch.get(i).distance" is some kind of mean value between the two directed distances.

So what you do is: you calculate the mean of the distances and scale it by the number of interest points you have. The goal, I assume, is to have a good measure of how well the images match overall.

I am pretty sure the distance you calculate doesn't reflect that similarity very well. Dividing the sum of the distances by the number of matches makes some sense, and it might give you an idea of similarity when compared to other query images, but scaling this value by the number of interest points just doesn't do anything meaningful.

First of all, I would suggest that you get rid of this scaling. I'm not sure what your brute-force matching does exactly, but in addition to your symmetry test you should discard matches where the ratio of the distances of the first and the second candidate is too high (if I remember right, Lowe suggests a threshold of 0.8). Then, if it is a rigid scene, I would suggest applying some kind of fundamental matrix estimation (8-point algorithm + RANSAC) and filtering the result using epipolar geometry. I'm pretty sure the mean descriptor distance of the "real" matches will give you a good idea about the "similarity" of the database image and the query.
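
To illustrate the ratio test (just a sketch under my own assumptions: plain float-array descriptors and Euclidean distance, not your actual classes): for each query descriptor, find its two nearest database descriptors and keep the match only if the best distance is clearly smaller than the second best.

    class RatioTestSketch {
        // Squared Euclidean distance between two descriptors.
        static float sqDist(float[] a, float[] b) {
            float sum = 0f;
            for (int k = 0; k < a.length; k++) {
                float d = a[k] - b[k];
                sum += d * d;
            }
            return sum;
        }

        // For each query descriptor, return the index of its best match in 'train',
        // or -1 if it fails Lowe's ratio test (best < ratio * second best).
        static int[] matchWithRatioTest(float[][] query, float[][] train, float ratio) {
            int[] result = new int[query.length];
            for (int i = 0; i < query.length; i++) {
                float best = Float.MAX_VALUE, second = Float.MAX_VALUE;
                int bestIdx = -1;
                for (int j = 0; j < train.length; j++) {
                    float d = sqDist(query[i], train[j]);
                    if (d < best) {
                        second = best;
                        best = d;
                        bestIdx = j;
                    } else if (d < second) {
                        second = d;
                    }
                }
                // With squared distances, the ratio threshold has to be squared as well.
                result[i] = (best < ratio * ratio * second) ? bestIdx : -1;
            }
            return result;
        }
    }

For the epipolar filtering, if you already use OpenCV, its findFundamentalMat function with the RANSAC method flag can also give you an inlier mask that you can use to discard the remaining outliers.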

DrPepperJo
  • There is no mean; it is the actual Euclidean distance of the two descriptors. The symmetry test is there to filter keypoints that were matched twice or more. I know the ratio test and the RANSAC test. I don't want to do the ratio test at the moment, because building a kd-tree and searching in it takes longer than BF matching. – PeterNL Jun 26 '14 at 12:35
  • A kd-tree search most definitely won't take longer than BF matching. Also, forget what I said about the mean; of course it is the same distance in both directions. But why do you scale with the number of interest points? – DrPepperJo Jun 26 '14 at 14:34
  • A kd-tree can be slower than BF matching; it depends on the length of the descriptor vectors. With a length of 32 values, brute-force matching is usually faster. I tried it and it is true! I scale for the case that only one match is left after filtering: if that match has a really small distance, it looks like a really good match, but in reality it isn't, because having only one match is really bad. So in this case the distance gets multiplied by 200. – PeterNL Jun 27 '14 at 10:30
  • Ok, but as far as I understand, you multiply every mean distance that you calculate for the query by the same 200! Maybe you have to elaborate on what you are trying to do and extend the code a little bit. – DrPepperJo Jun 29 '14 at 11:43
  • There are all kinds of approaches for nearest neighbor search, and brute force is (as the name suggests) the most straightforward and therefore least efficient one. That doesn't mean that a BF implementation can't be faster than a search over prearranged data. For descriptor matching in high dimensions, most established approaches rely on approximate nearest neighbor search, where you apply a heuristic like limiting the search depth in the tree to be more efficient. – DrPepperJo Jun 29 '14 at 11:49