I have some questions regarding the Baker's Gamma and FM indices in the dendextend package.

  1. What is the interpretation of the Baker's Gamma distribution under H0? i.e. when do you reject the null hypothesis?
  2. What is the difference between cor_FM_index and FM_index? The expectation and variance seem to stay the same, but the index value does not.
  3. The Bk plot shows the FM index over different values of k. What can be concluded from such a plot?
Tal Galili
Ali

1 Answer

  1. H0 is that there is no correlation between how "high" two items merge in one dend and how high they merge in the other. If two dends are equal, then for every pair of leaves the height of the branch at which they merge is identical, so their Baker's gamma (the correlation over all such pairs) is 1. If the two trees are completely dissimilar, the correlation will be close to 0. A significant value in between means there is some similarity: in general, the "closer" two leaves are in one dend, the closer they are in the other. As with any correlation, the exact meaning in borderline cases cannot be inferred from the cor value alone.

  2. cor_FM_index uses FM_index, but does so in the "correct" way. Look at the code of cor_FM_index to see how.

  3. It can show at which levels of cutting the two trees resemble each other. For example, if you had two trees (t1 and t2), each with two sub-families that include exactly the same items, then their Bk (k=2) would be 1. But it could be that when you cut these trees with k=3, the subtrees no longer include exactly the same items in t1 and t2. Hence, it is a measure of tree similarity at different levels of cutting the trees. If the trees are identical, Bk should be 1 for every k. If they are similar only at some heights, the Bk values at those k would be significant.
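
To make point 3 concrete, here is a minimal sketch of the Fowlkes-Mallows index computed by pair counting. It is written in plain Python rather than R so that it does not lean on any dendextend internals, and it is not dendextend's implementation. The function name `fm_index` and the label vectors are hypothetical: the labels stand in for what you might get by cutting two six-leaf trees t1 and t2 at k=2 and at k=3.

```python
from itertools import combinations
from math import sqrt


def fm_index(labels_a, labels_b):
    """Fowlkes-Mallows index between two flat clusterings of the same items.

    Counts pairs of items that share a cluster in both clusterings and
    divides by the geometric mean of the pair counts in each clustering.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")

    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1

    denom = sqrt((both + only_a) * (both + only_b))
    return both / denom if denom else 0.0


# Hypothetical cluster labels for six leaves (assumed data, not from the question).
t1_k2 = [1, 1, 1, 2, 2, 2]
t2_k2 = [1, 1, 1, 2, 2, 2]   # same two sub-families as t1
t1_k3 = [1, 1, 2, 3, 3, 3]
t2_k3 = [1, 2, 2, 3, 3, 3]   # the third leaf split differs from t1

print(fm_index(t1_k2, t2_k2))  # 1.0
print(fm_index(t1_k3, t2_k3))  # 0.75
```

Here the two trees agree perfectly at k=2 and only partly at k=3, which is the kind of pattern a Bk plot traces out as k varies.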

I hope this helps, thanks for the good questions.

Tal Galili
  • Thank you for your answer. As a small extension to my first question: how would you decide whether to reject the null hypothesis using the permutation test, i.e. from the "Baker's gamma distribution under H0" plot? In your example you conclude there is not enough evidence that the two dendrograms are significantly similar. My two dendrograms have a correlation of 0.9 (let's suppose), but the cor and cor2 values on the x-axis show a value of 0. – Ali Aug 27 '18 at 13:30
  • I didn't understand your question. But if Baker's gamma is 0.9 and significant, it means that the two trees are similar beyond what is likely to be expected through a random shuffling of their leaves. – Tal Galili Sep 01 '18 at 21:52
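
The "random shuffling of their leaves" idea can be sketched as a permutation test. This is an illustrative sketch in plain Python, not dendextend's code, and it rests on assumptions: each tree is represented by a hypothetical matrix of pairwise merge heights, the statistic is a Spearman (rank) correlation over all leaf pairs, and all function names are made up for the example. The null distribution it builds is centred near 0, and H0 is rejected when the observed value lies far out in its upper tail.

```python
import random
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0


def merge_height_correlation(h1, h2, relabel=None):
    """Rank correlation of merge heights over all leaf pairs.

    `relabel` optionally shuffles which leaf of the second tree is matched
    to which leaf of the first.
    """
    n = len(h1)
    idx = relabel if relabel is not None else list(range(n))
    pairs = list(combinations(range(n), 2))
    x = [h1[i][j] for i, j in pairs]
    y = [h2[idx[i]][idx[j]] for i, j in pairs]
    return pearson(average_ranks(x), average_ranks(y))


def permutation_p_value(h1, h2, n_perm=999, seed=0):
    """Observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = merge_height_correlation(h1, h2)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(len(h1)))
        rng.shuffle(perm)
        if merge_height_correlation(h1, h2, perm) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def height_matrix(n, heights, default):
    """Symmetric matrix of merge heights; unlisted pairs get `default`."""
    m = [[default] * n for _ in range(n)]
    for (i, j), v in heights.items():
        m[i][j] = m[j][i] = v
    return m


# Hypothetical six-leaf tree: ((0,1),2) and ((3,4),5) join at height 3.
h = height_matrix(
    6,
    {(0, 1): 1, (0, 2): 2, (1, 2): 2, (3, 4): 1, (3, 5): 2, (4, 5): 2},
    default=3,
)
observed, p = permutation_p_value(h, h)
print(observed, p)
```

Comparing the tree with itself gives an observed correlation of 1, while almost every shuffled labelling gives something much lower, so the p-value is small. That is also why the x-axis of the H0 plot is centred on 0 even when the observed correlation is 0.9: the plot shows the shuffled values, and the observed value is judged by how far it sits from them.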