
In hierarchical k-means, a vocabulary tree of depth D and branch factor K should have the following total number of nodes (excluding the root node):

nodes = K + K^2 + ... + K^D
nodes = (K^(D+1)-K)/(K-1)  
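As a quick sanity check of this closed form, here is a small Python sketch (not part of VLFeat; the function names are made up for illustration). It compares the level-by-level sum K + K^2 + ... + K^D with (K^(D+1)-K)/(K-1):

```python
def nodes_by_sum(K: int, D: int) -> int:
    """Count non-root nodes level by level: K + K^2 + ... + K^D."""
    return sum(K ** d for d in range(1, D + 1))


def nodes_closed_form(K: int, D: int) -> int:
    """Closed form of the same geometric series (requires K >= 2).

    K^(D+1) - K = K * (K^D - 1) is always divisible by K - 1, so // is exact.
    """
    return (K ** (D + 1) - K) // (K - 1)


for K, D in [(3, 2), (5, 2), (10, 6)]:
    assert nodes_by_sum(K, D) == nodes_closed_form(K, D)
```

For K=3, D=2 both give 12, and for K=5, D=2 both give 30.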

However, vl_hikmeanshist returns a histogram with one extra bin. On their website, the number of nodes is given as:

nodes = (K^(D+1)-1)/(K-1)  
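Algebraically, this is the same geometric series started at K^0 = 1, i.e. 1 + K + K^2 + ... + K^D, so it counts exactly one extra node at depth 0 (the root). A small Python check, with hypothetical helper names:

```python
def nodes_with_root_by_sum(K: int, D: int) -> int:
    """Count all nodes, including the single root at depth 0: 1 + K + ... + K^D."""
    return sum(K ** d for d in range(0, D + 1))


def vlfeat_formula(K: int, D: int) -> int:
    """(K^(D+1) - 1) / (K - 1), the count quoted on the VLFeat website."""
    return (K ** (D + 1) - 1) // (K - 1)


def nodes_without_root(K: int, D: int) -> int:
    """(K^(D+1) - K) / (K - 1), the count derived in the question."""
    return (K ** (D + 1) - K) // (K - 1)


for K, D in [(3, 2), (5, 2)]:
    assert vlfeat_formula(K, D) == nodes_with_root_by_sum(K, D)
    # The two closed forms always differ by exactly one node: the root.
    assert vlfeat_formula(K, D) - nodes_without_root(K, D) == 1
```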

They also state that they are "not counting the root which carries no information". So why is their formula different? They do not list any contact details on the website, so I am unable to ask them. Can someone shed some light on this?

Tu Bui
  • Hi, do you know how we can retrieve images after creating the vocabulary tree with `vl_hikmeans`? I am wondering how we can get the inverted index file. Thanks – S.EB Apr 04 '18 at 17:40

1 Answer


The root node is not included because it contains no additional information. The root node will always be the mean of the data set. See here

For a simple example, say you have a branch factor of 5 and a tree of depth 2. With your formula you would have (5^3-5)/4 = 120/4 = 30 nodes (excluding the root node).

Their formula just adds in the root node: (5^3-1)/4 = 124/4 = 31 nodes. This is the same as the 30 above plus the one root node.

Basically they both mean the same thing. Just know that the extra bin is the root and isn't really useful.

Raab70
  • Thank you, but according to the link in my question (and the link you provided), VLFeat says they already exclude the root node. Yet their function vl_hikmeanshist still outputs 31 nodes for K=5, D=2. That is why I am asking. Or did they forget to remove the extra bin (i.e. the root node) from their output histogram? – Tu Bui Apr 29 '14 at 21:46
  • I see what you're saying. I guess that's a bit of a typo, since it is [open source](https://github.com/vlfeat/vlfeat). – Raab70 Apr 29 '14 at 21:52
  • +1. Hmm, it is a bit weird, as both their function and their equation say 31 nodes, yet they claim twice (in both links) that they do not count the root node. Anyway, I'm going to remove one bin from the output of vl_hikmeanshist, but I don't know which bin corresponds to the root node. – Tu Bui Apr 29 '14 at 22:12
  • You could use: `rootbin = bins(bins==mean(data));` If this does not produce an exact match, you may need something like `[~, rootbin] = min(abs(bins - mean(data)));` – Raab70 Apr 29 '14 at 22:15
  • No, I want to remove the bin from the histogram, i.e. the output of vl_hikmeanshist, not the centre of the root node in the vocabulary tree (they already remove that). I just tested on a small tree with K=3, D=2 (12 nodes in total). The tree does have 12 nodes, but the histogram is 13-D. After several tests, it turns out that the first bin of the histogram corresponds to the root node, as it always equals the number of features (data points) passed into the vl_hikmeanshist function. Anyway, thanks for your help. I will wait until VLFeat fixes the bug (i.e. your guess that it is a typo is correct) and accept your answer. – Tu Bui Apr 30 '14 at 00:01
  • Thanks, and sorry about the confusion. Finding the histogram bin that is the only one containing all the data is the correct method. Best of luck. – Raab70 Apr 30 '14 at 12:27
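The observation in the comments (the first bin always equals the number of descriptors) is consistent with a breadth-first, K-ary-heap layout in which the root comes first. The Python sketch below assumes that layout; it is not taken from the VLFeat source, it uses 0-based indices (so the root is bin 0, whereas MATLAB would call it bin 1), and the helper names and the `paths` input (one list of D child indices per descriptor) are hypothetical. It also illustrates a caveat to the "only bin with all the data" test: if every descriptor falls into the same child, that child's bin holds all of them too, so the root is the *first* such bin rather than necessarily the only one.

```python
from typing import List, Sequence


def hierarchical_histogram(paths: Sequence[Sequence[int]], K: int, D: int) -> List[int]:
    """Count descriptors per tree node, root included, in breadth-first order.

    Assumed (hypothetical) layout: bin 0 is the root and the children of
    bin k are bins k*K + 1 ... k*K + K. Each path lists D child indices in [0, K).
    """
    n_bins = (K ** (D + 1) - 1) // (K - 1)  # 1 + K + ... + K^D
    hist = [0] * n_bins
    for path in paths:
        if len(path) != D:
            raise ValueError("each path must have exactly D entries")
        node = 0
        hist[node] += 1  # every descriptor passes through the root
        for child in path:
            node = node * K + child + 1  # step down to the chosen child
            hist[node] += 1
    return hist


def bins_containing_all(hist: Sequence[int], n_descriptors: int) -> List[int]:
    """Indices of bins whose count equals the number of descriptors."""
    return [i for i, count in enumerate(hist) if count == n_descriptors]


K, D = 3, 2
paths = [[0, 2], [1, 1], [0, 0], [2, 1]]  # four made-up descriptors
hist = hierarchical_histogram(paths, K, D)
without_root = hist[1:]  # drop the root bin: 12 bins for the 12 non-root nodes

# Four made-up descriptors that all go to child 0 of the root:
# bins 0 (root) and 1 (that child) both hold all of them.
same_child = hierarchical_histogram([[0, 0], [0, 1], [0, 2], [0, 0]], K, D)
candidates = bins_containing_all(same_child, 4)
```

Here `hist` has 13 bins with `hist[0] == 4`, `without_root` has 12 entries, and `candidates` is `[0, 1]`, so taking the first match recovers the root bin.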