I am using the R package dendextend to plot hclust tree objects generated by each method of hclust {stats}: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

I noticed that the branch color coding from color_branches fails when I use method = "median" or "centroid".

I tested it with a randomly generated matrix and the error is reproduced for the "median" and "centroid" methods. Is there a specific reason for this?

Please see the link for the output plots, Fig. 1: hclust methods (a) ward.D2, (b) median, (c) centroid.

library(dendextend)
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.names <- rep(c("black", "red", "blue", "green", "cyan"), 2)
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2)
colnames(df) <- df.names
df.dist <- dist(t(df), method = "euclidean")

# plotting works for "ward.D", "ward.D2", "single", "complete", "average", "mcquitty"
dend <- as.dendrogram(hclust(df.dist, method = "ward.D2"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

# color_branches fails for "median" or "centroid"
dend <- as.dendrogram(hclust(df.dist, method = "median"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

dend <- as.dendrogram(hclust(df.dist, method = "centroid"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)
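A quick way to check whether the two failing methods coincide with non-monotone trees is to test, for each method, whether hclust's height vector ever decreases. The ?hclust help page notes that "median" and "centroid" do not lead to a monotone distance measure, so their dendrograms can contain inversions (reversals). A minimal base-R sketch on the same random data as above; the helper name has_inversion is hypothetical, and no output is shown because it depends on the data:

```r
# Sketch: which hclust methods give non-monotone merge heights (inversions)
# on the random data from the question?
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
colnames(df) <- rep(c("black", "red", "blue", "green", "cyan"), 2)
df.dist <- dist(t(df), method = "euclidean")

# Hypothetical helper: TRUE if some merge height is lower than an earlier one
has_inversion <- function(hc) is.unsorted(hc$height)

methods <- c("ward.D", "ward.D2", "single", "complete",
             "average", "mcquitty", "median", "centroid")
# Named logical vector, one entry per method
inversions <- vapply(methods,
                     function(m) has_inversion(hclust(df.dist, method = m)),
                     logical(1))
print(inversions)
```

The first six methods should always give FALSE; only "median" and "centroid" can give TRUE, which is when a height-based cut (as used by color_branches with k) stops being well defined.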

I am using dendextend_1.4.0. Session Info below:

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.3

Thanks.

lmo
Zach Roe
  • It works fine for me. What is your exact output? Please paste it. – Tal Galili Feb 23 '17 at 07:21
  • OK, I now see what you mean. The issue is that this code produces clusters with tree heights that are "weird". In such a case it is not clear to me how to address it, since the meaning of a "cut" is not clear. – Tal Galili Feb 24 '17 at 19:30
  • Hi Tal, yes, I suspected it had something to do with the "weird" tree heights my data generated, but since I was able to reproduce it with a random matrix I was curious whether it is related to the clustering methods, i.e. whether these methods tend to generate this type of tree. The color coding for the labels works. Is there a way for me to edit the code to flag when a cut is not clear and assign the branch colors based on the label order? – Zach Roe Feb 24 '17 at 22:06
  • I gave an example of how to deal with it, but it's not "pretty". – Tal Galili Feb 25 '17 at 16:47
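On flagging when a cut is not clear: one option is to list the merge steps that are inversions, i.e. steps whose height is lower than the height of a cluster they absorb. A sketch in base R, assuming the standard hclust merge encoding (negative entries are single observations, positive entries refer to earlier merge steps); find_inversions is a hypothetical helper, not part of dendextend:

```r
# Hypothetical helper: indices of hclust merge steps whose height is lower
# than the height of one of the clusters they merge (an inversion).
find_inversions <- function(hc) {
  which(vapply(seq_len(nrow(hc$merge)), function(i) {
    children <- hc$merge[i, ]
    children <- children[children > 0]  # earlier merges, not single leaves
    length(children) > 0 && any(hc$height[i] < hc$height[children])
  }, logical(1)))
}

# Usage on the question's random data
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
d <- dist(t(df), method = "euclidean")
bad <- find_inversions(hclust(d, method = "centroid"))
if (length(bad) > 0) {
  message("Inversions at merge steps: ", paste(bad, collapse = ", "))
}
```

If find_inversions() returns anything, a height-based cut such as color_branches(dend, k = ...) may be ill-defined for that tree, and coloring by labels (as in the answer) is the safer fallback.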

1 Answer

You can work around this issue using branches_attr_by_clusters (although it can get a bit tricky; see the example below):

library(dendextend)
set.seed(1)
df <- as.data.frame(replicate(10, rnorm(20)))
df.names <- rep(c("black", "red", "blue", "green", "cyan"), 2)
df.col <- rep(c("black", "red", "blue", "green", "cyan"), 2)
colnames(df) <- df.names
df.dist <- dist(t(df), method = "euclidean")

# plotting works for "ward.D", "ward.D2", "single", "complete", "average", "mcquitty"
dend <- as.dendrogram(hclust(df.dist, method = "ward.D2"), labels = df.names)
labels_colors(dend) <- df.col[order.dendrogram(dend)]
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

# color_branches fails for "median" or "centroid"
dend <- as.dendrogram(hclust(df.dist, method = "median"), labels = df.names)
aa <- df.col[order.dendrogram(dend)]
labels_colors(dend) <- aa
dend.colorBranch <- color_branches(dend, k = length(df.names), col = df.col[order.dendrogram(dend)])
dend.colorBranch %>% set("branches_lwd", 3) %>% plot(horiz = TRUE)

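# Turn the leaf colors (in leaf order) into clusters, keeping first-appearance order,
# then color each cluster's branches with its label color
aa <- factor(aa, levels = unique(aa))
dend %>% branches_attr_by_clusters(aa, value = levels(aa)) %>% plot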

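The factor() step is what lines the colors up with clusters: with levels = unique(aa), level i is the i-th distinct color met in leaf order, so levels(aa) supplies one color per cluster (assuming branches_attr_by_clusters treats the factor's integer codes as cluster ids). Without it, factor() would sort the levels alphabetically and the colors would no longer match. A base-R illustration with a made-up vector of leaf colors:

```r
# Toy leaf colors (hypothetical values), listed in dendrogram leaf order
aa <- c("red", "black", "red", "cyan", "black")
aa <- factor(aa, levels = unique(aa))  # levels follow first appearance
as.integer(aa)  # cluster id per leaf: 1 2 1 3 2
levels(aa)      # color for each cluster id: "red" "black" "cyan"
```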

Tal Galili