
I read a paper whose retrieval system is based on SIFT descriptors and fast approximate k-means clustering, so I installed pyflann. If I am not mistaken, the following commands only find the indices of the datapoints closest to each sample (here, the indices of the 5 nearest points in dataset for each row of testset):

import numpy as np
from pyflann import FLANN

dataset = np.random.rand(10000, 128)
testset = np.random.rand(1000, 128)
flann = FLANN()
# indices and distances of the 5 nearest dataset points for each testset row
result, dists = flann.nn(dataset, testset, 5, algorithm="kmeans",
                         branching=32, iterations=7, checks=16)
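
As a point of reference, the exact lookup that this approximate search targets can be sketched by brute force in NumPy. The function name `exact_knn` is only an illustrative helper, not part of pyflann, and the smaller array sizes are chosen just to keep the example quick.

```python
import numpy as np

def exact_knn(dataset, queries, k):
    """Brute-force k nearest neighbours: returns (indices, squared distances)."""
    # Squared Euclidean distance between every query row and every dataset row
    d2 = (
        (queries ** 2).sum(axis=1)[:, None]
        - 2.0 * queries @ dataset.T
        + (dataset ** 2).sum(axis=1)[None, :]
    )
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    # Sort each row by distance and keep the k closest dataset indices
    idx = np.argsort(d2, axis=1)[:, :k]
    return idx, np.take_along_axis(d2, idx, axis=1)

rng = np.random.default_rng(0)
dataset = rng.random((1000, 128))
testset = rng.random((100, 128))
result, dists = exact_knn(dataset, testset, 5)
print(result.shape, dists.shape)  # (100, 5) (100, 5)
```

Each row of `result` holds dataset indices ordered from closest to farthest; nothing here computes cluster centers.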

I went through the user manual but could not find how to do k-means clustering with FLANN, or how to fit the test set based on the cluster centers. In scikit-learn, for example, we can use k-means++ clustering and fit the model on the dataset:

from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=100, init='k-means++', random_state=0, verbose=0)
kmeans.fit(dataset)

Later we can assign labels to the test set, for example by using a KDTree:

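from sklearn.neighbors import KDTree
kdt = KDTree(kmeans.cluster_centers_)
Q = testset  # query points
kdt_dist, kdt_idx = kdt.query(Q, k=1)  # nearest cluster center for each query
test_labels = kdt_idx  # index of the nearest center, shape (n_queries, 1)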

Could someone please explain how to follow the same procedure with FLANN? I mean clustering the dataset (finding the cluster centers and quantizing the features) and then quantizing testset against the cluster centers found in the previous step.

S.EB

1 Answer


You won't be able to do the best k-means variations with FLANN, because these use two indexes at the same time and are ugly to implement.

You can, however, build a new index on the centers in every iteration. Unless you have k > 1000, though, it probably will not help much.
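
A minimal sketch of that loop, assuming plain Lloyd iterations in NumPy. The names `nearest_center` and `lloyd_kmeans` are hypothetical helpers, and the brute-force search stands in for the index: it marks the step where an index on the current centers would be rebuilt and queried.

```python
import numpy as np

def nearest_center(centers, points):
    """Index of the closest center for each point (brute force).

    This is the step that an approximate index built on `centers` would replace.
    """
    d2 = (
        (points ** 2).sum(axis=1)[:, None]
        - 2.0 * points @ centers.T
        + (centers ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)

def lloyd_kmeans(data, k, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iterations):
        # The centers moved in the last pass, so the lookup is redone from scratch
        labels = nearest_center(centers, data)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers

rng = np.random.default_rng(1)
dataset = rng.random((2000, 16))
testset = rng.random((200, 16))
centers = lloyd_kmeans(dataset, k=20)
# Quantize the test set against the centers found above
test_labels = nearest_center(centers, testset)
print(centers.shape, test_labels.shape)  # (20, 16) (200,)
```

The assignment step costs O(n·k) per iteration when done by brute force, which is why an index only starts to pay off once k is large.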

Has QUIT--Anony-Mousse
  • Thanks for your comment. Could I ask what FLANN is used for, and what exactly the `flann.nn()` function does? I used it on the (x, y) coordinates of the points in the dataset and test set, with `flann.nn(dataset, testset, 5, algorithm="kdtree")`, to obtain the 5 nearest points. Is that correct, or is my understanding wrong? – S.EB Mar 12 '18 at 15:04
  • Is it possible to use FLANN with the `kmeans` algorithm to quantize SIFT features, with the number of clusters set to 2000? Thanks – S.EB Mar 12 '18 at 15:05
  • Thank you very much for sharing your knowledge. – S.EB Mar 12 '18 at 23:41