I read a paper in which the retrieval system is based on SIFT descriptors and fast approximate k-means clustering, so I installed pyflann. If I am not mistaken, the following commands only find the indices of the nearest data points to each query sample (for example, here, the indices of the 5 nearest points from dataset to each point in testset):
from pyflann import FLANN
from numpy.random import rand

dataset = rand(10000, 128)   # 10,000 random 128-dimensional descriptors
testset = rand(1000, 128)    # 1,000 query descriptors
flann = FLANN()
# For every row of testset, return the indices of (and distances to) its
# 5 nearest neighbours in dataset, using the kmeans-tree index
result, dists = flann.nn(dataset, testset, 5, algorithm="kmeans",
                         branching=32, iterations=7, checks=16)
I went through the user manual, but I could not find how to do k-means clustering with FLANN, or how to assign the test set to the resulting cluster centers. For comparison, in scikit-learn we can use k-means++ clustering and fit the model to the dataset:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=100, init='k-means++', random_state=0, verbose=0)
kmeans.fit(dataset)
and later we can assign labels to the test set by using, for example, a KDTree built on the cluster centers:
from sklearn.neighbors import KDTree

kdt = KDTree(kmeans.cluster_centers_)
Q = testset                            # query points
kdt_dist, kdt_idx = kdt.query(Q, k=1)  # 1-nearest-neighbour search
test_labels = kdt_idx                  # index of the nearest center = cluster label
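(For what it's worth, I believe the fitted scikit-learn model can also do this assignment directly, which should give the same labels as the KDTree approach above:

kmeans_test_labels = kmeans.predict(testset)  # nearest-center label per test point

so what I am really missing is only the FLANN side of this.)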
Could someone please explain how I can do the same procedure with FLANN? That is, cluster the dataset (find the cluster centers and quantize the features) and then quantize the test set against the cluster centers found in the previous step.
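For reference, this is roughly what I imagine the pyflann version would look like. I am assuming here that FLANN.kmeans(pts, num_clusters) exists and returns a num_clusters x dim array of centers (the exact signature is my guess from skimming the bindings), and I am reusing flann.nn for the assignment step:

from pyflann import FLANN
from numpy.random import rand

dataset = rand(10000, 128)
testset = rand(1000, 128)

flann = FLANN()
# Assumption: kmeans() runs (hierarchical) k-means on the dataset and
# returns the cluster centers as an array of shape (num_clusters, 128).
centers = flann.kmeans(dataset, 100)

# Quantize the test set: the index of the nearest center (k=1) is the label.
test_labels, _ = flann.nn(centers, testset, 1, algorithm="kdtree")

Is something like this the intended way, or is there a dedicated API for it?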