6

I'm working on BOW (bag-of-words) object detection, specifically the encoding stage. I have seen some implementations that use a kd-tree in the encoding stage, but most writings suggest that k-means clustering is the way to go.

What is the difference between the two?

mugetsu

3 Answers

6

In object detection, k-means is used to quantize descriptors. A kd-tree can be used to search for descriptors with or without quantization. Each approach has its pros and cons. Specifically, kd-trees are not much better than brute-force search when the number of descriptor dimensions exceeds 20.
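
A minimal sketch of the quantization step this answer describes, assuming 128-dimensional SIFT-like descriptors have already been extracted; the random array below is just a stand-in for real data, and scipy is one possible library choice (it is the one linked in another answer below):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
descriptors = rng.random((5000, 128))  # stand-in for real SIFT descriptors

# Build the visual vocabulary: k centroids in descriptor space.
k = 100
centroids, _ = kmeans2(descriptors, k, minit='++')

# Quantize: map each descriptor to its nearest centroid (a "visual word").
words, _ = vq(descriptors, centroids)

# The bag-of-words encoding of an image is a histogram over the k words.
bow_histogram, _ = np.histogram(words, bins=np.arange(k + 1))
```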

Don Reba
  • I'm using SIFT descriptors, 128 dimensions, so I guess in my encoding phase I should only be quantizing with k-means? – mugetsu Jun 18 '12 at 22:03
  • I have achieved great performance using just hierarchical k-means clustering with vocabulary trees and brute-force search at each level (a rough sketch follows after these comments). If I needed to further improve performance, I would have looked into using either locality-sensitive hashing or kd-trees combined with dimensionality reduction via PCA. – Don Reba Jun 18 '12 at 23:11
  • I recommend FLANN. It will do the analysis for you and give you the best algorithm that fits your specific data set and memory/performance needs. See http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN – rkellerm Jun 02 '13 at 11:21
  • @DonReba, I would like to do hierarchical k-means. What software did you use to do this? – TyanTowers May 19 '16 at 08:15
  • @TyanTowers, just OpenCV for `k-means`. Did the rest myself in C++. – Don Reba May 19 '16 at 08:25
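
As referenced in the comment above, here is a rough sketch of the hierarchical k-means (vocabulary tree) idea; the `build_vocab_tree` and `lookup` helpers, the branching factor, and the depth are all illustrative assumptions, not the actual implementation described:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_vocab_tree(data, branch=10, depth=3):
    """Recursively cluster the data into a tree of centroids (hypothetical helper)."""
    if depth == 0 or len(data) < branch:
        return None
    centroids, labels = kmeans2(data, branch, minit='++')
    children = [build_vocab_tree(data[labels == i], branch, depth - 1)
                for i in range(branch)]
    return {'centroids': centroids, 'children': children}

def lookup(tree, x, path=()):
    """Descend the tree with brute-force search at each level."""
    if tree is None:
        return path  # the sequence of branch indices identifies the leaf word
    i = int(np.argmin(np.linalg.norm(tree['centroids'] - x, axis=1)))
    return lookup(tree['children'][i], x, path + (i,))
```
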
5

The kd-tree, AFAIK, is used for the labeling phase; when clustering over a large number of groups, hundreds if not thousands, it is much faster than the naive approach of simply taking the argmin of the distances to every group. K-means (http://en.wikipedia.org/wiki/K-means_clustering) is the actual clustering algorithm; it is fast, though not always very precise. Some implementations return the groups, while others return the groups and the labels of the training data set. I usually use http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html in conjunction with http://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.vq.kmeans2.html.
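
For illustration, a minimal sketch combining the two scipy tools linked above: `kmeans2` does the clustering, and a `cKDTree` built over the centroids handles the labeling without the naive argmin over every group. The sizes are arbitrary stand-ins:

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
train = rng.random((10000, 128))  # stand-in for real descriptors

# Cluster into many groups; kmeans2 returns both centroids and labels.
centroids, train_labels = kmeans2(train, 500, minit='++')

# Index the centroids once, then label new points by nearest centroid
# via the tree instead of computing all 500 distances per point.
tree = cKDTree(centroids)
new_points = rng.random((200, 128))
_, labels = tree.query(new_points)  # index of nearest centroid per point
```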

Samy Vilar
  • So just to clarify: you would use k-means to quantize your image descriptors. Then you would make a kdtree out of those descriptors so that you can search for the closest neighbor in object recognition? – mugetsu Jun 18 '12 at 22:07
  • @mugetsu `Then you would make a kdtree out of those descriptors` pretty much. I've done some benchmarks, and the kd-tree blows all my optimizations out of the water when working with a really large number of groups ... I recommend you simply run some tests :) – Samy Vilar Jun 18 '12 at 22:11
  • So by using a kd-tree, do you skip having histograms and SVMs? I'm confused about how this works. http://stackoverflow.com/questions/11091972/implementing-bags-of-words-object-recognition-using-vlfeat – mugetsu Jun 18 '12 at 22:27
  • @mugetsu check out http://www.cs.brown.edu/courses/cs143/results/proj3/sungmin/ I couldn't find an easier tutorial ... – Samy Vilar Jun 18 '12 at 23:05
  • Updating the above link: http://cs.brown.edu/courses/cs143/2011/results/proj3/sungmin/ – saurabheights Dec 19 '15 at 06:08
2

kd-Tree and K-means are two different types of clustering method.

Here are several types of clustering method:

  • kd-Tree is a median-based clustering method.
  • K-means is a means-based clustering method.
  • GMM (Gaussian mixture model) is a probability-based clustering method (soft-clustering).
  • etc.

[UPDATE]:

Generally, there are two types of clustering method: soft clustering and hard clustering. Probabilistic methods like the GMM are soft clustering, assigning objects to clusters with probabilities; the others are hard clustering, assigning each object to exactly one cluster.
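
To make the distinction concrete, a small illustrative sketch (scikit-learn is assumed here purely for convenience; it is not used elsewhere in the thread):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # two synthetic blobs
               rng.normal(5, 1, (100, 2))])

# Hard clustering: each point gets exactly one cluster label.
hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Soft clustering: each point gets a probability for every cluster.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
soft_probs = gmm.predict_proba(X)  # shape (200, 2); each row sums to 1
```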

Benyamin Jafari
  • There are many more clustering methods than these three. GMM is not really a clustering method, though I guess you could use it as such. K-means does not use standard deviation at all, it's based on means and Voronoi tessellation. – Cris Luengo Aug 16 '19 at 15:05
  • Yes, of course, there are many more clustering methods. In k-means, objects are selected by minimum standard deviation in each cluster given its computed mean, so I mentioned standard deviation too. And GMM could serve as a clustering method, for example with three Gaussian distributions, where objects are assigned by comparing their probabilities under each, just like k-means with three means. – Benyamin Jafari Aug 16 '19 at 15:39
  • I just questioned the usefulness of mentioning one other clustering algorithm when the question is "what is the difference between methods A and B", considering there exist so many clustering algorithms, and [there already exist lists that try to collect them all](https://en.wikipedia.org/wiki/Category:Cluster_analysis_algorithms). – Cris Luengo Aug 16 '19 at 15:45
  • Regarding k-means: objects are selected by minimum distance to each mean (as in Voronoi tessellation), not by standard deviation. The standard deviation is never computed or implied. – Cris Luengo Aug 16 '19 at 15:45
  • I didn't mean to say that the std-dev is calculated (I wrote that incorrectly). My point was that the means are calculated, and in the obtained cluster each object has minimum variance/std-dev with respect to the other objects. [And this is what the O'Reilly book on unsupervised learning says about the GMM.](https://i.stack.imgur.com/NCoLd.png) – Benyamin Jafari Aug 16 '19 at 17:38