I'm trying to understand my data better through some clustering analysis. Using the same data, I first ran hclust with this code:

# Standardize the columns, then compute the dissimilarity matrix
df <- scale(m.sel)
d <- dist(df, method = "euclidean")
# Hierarchical clustering using Ward's method
res.hc <- hclust(d, method = "ward.D2")

I have two clear clusters. So, great.
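To make "two clear clusters" concrete, the membership can be extracted with cutree and kept for later comparison. This is only a minimal sketch: the built-in USArrests data set is a stand-in for m.sel (which cannot be shared), and the name `groups` is introduced here purely for illustration.

```r
# Stand-in data: USArrests replaces the confidential m.sel (assumption).
m.demo <- as.matrix(USArrests)

# Same pipeline as above: standardize columns, Euclidean distance, Ward's method.
d.demo <- dist(scale(m.demo), method = "euclidean")
hc.demo <- hclust(d.demo, method = "ward.D2")

# Cut the tree into two groups; the result is a named integer vector,
# one cluster label per row of the input matrix.
groups <- cutree(hc.demo, k = 2)
print(table(groups))
```

Saving such a vector makes it possible to say exactly which samples move between clusters when the settings change.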

Then I draw a heatmap with heatmap.2 (I have genetic data), using the same data and the same settings. This is the code:

library(gplots)  # provides heatmap.2 and bluered
dist.method <- function(x) dist(x, method = "euclidean")
hclust.method <- function(x) hclust(x, method = "ward.D2")
heatmap.2(scale(m.sel), dendrogram = "col", distfun = dist.method, hclustfun = hclust.method,
          trace = "none", col = bluered, margins = c(6, 9), cexCol = 1, density.info = "none", ColSideColors = colors)

The thing is that I don't get two clear clusters at all, unlike with hclust alone. Could anybody point me towards why that is? The only similar question I found was "Understanding heatmap dendogram clustering in R", but it compares two different heatmap commands that were called with different settings. My understanding is that I'm using the same settings in both cases, so I should expect the same result. Is there something really obvious I'm missing?

Also, the only way I got two clusters in heatmap.2 was by changing the method in the distance function to "manhattan" and the hclust method to "complete". Even then, some samples ended up in different clusters than when I ran hclust on its own. What's the reason for that? Which result should I trust?

I'm sorry I can't share the real data, as it is confidential, so you'll have to take my word that I don't get the same results :/

Any help or enlightenment would be much appreciated! Thanks!

--EDIT--

So I have tried with the iris data and although the clusters are similar, they are not exactly the same. Here is the code for the hclust:

d <- dist(scale(iris[, 1:4]), method="euclidean")
# Hierarchical clustering using Ward's method
res.hc <- hclust(d, method = "ward.D2")
# Plot the obtained dendrogram
plot(res.hc, cex = 0.6, hang = -1)

And here is the one for the heatmap.2:

data(iris)
m.sel <- as.matrix(t(iris[,1:4]))
dist.method <- function(x) dist(x, method = "euclidean")
hclust.method <- function(x) hclust(x, method = "ward.D2")
heatmap.2(scale(m.sel), dendrogram = "col", distfun = dist.method, hclustfun = hclust.method,
          trace = "none", col = bluered, margins = c(3, 10), cexCol = 0.5, density.info = "none")

I know this is not the best example; in my case, the clusters differ even more.
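One way to check whether the two iris snippets really cluster the same numbers is to rebuild, in base R only, the matrix each one hands to dist(). This is a sketch, not a diagnosis: it assumes that heatmap.2 builds its column dendrogram from distfun(t(x)), and the names x_hclust, x_heatmap, same_input and groups are made up for this illustration.

```r
# Input of the standalone hclust call: each variable (column) is standardized.
x_hclust <- scale(iris[, 1:4])

# Input of the heatmap.2 call: the matrix is transposed before scale(),
# so scale() standardizes each sample (now a column) rather than each variable.
m.sel <- as.matrix(t(iris[, 1:4]))
# Assumption: heatmap.2 clusters columns via distfun(t(x)), hence the t() here.
x_heatmap <- t(scale(m.sel))

# Same distance and linkage for both inputs.
hc_a <- hclust(dist(x_hclust, method = "euclidean"), method = "ward.D2")
hc_b <- hclust(dist(x_heatmap, method = "euclidean"), method = "ward.D2")

# Are the two input matrices numerically identical?
same_input <- isTRUE(all.equal(unname(x_hclust), unname(x_heatmap),
                               check.attributes = FALSE))
print(same_input)

# Cross-tabulate the two-cluster memberships from both trees.
groups <- table(standalone = cutree(hc_a, k = 2), heatmap = cutree(hc_b, k = 2))
print(groups)
```

If same_input comes out FALSE, the two calls are not using "the same settings" after all, because scale() always works column-wise and the transpose changes what a column is.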

  • If you cannot show the real data, perhaps you could simulate some data with similar properties, and thereby make the problem reproducible – Richard Telford May 02 '16 at 19:13
  • I wish I could simulate similar properties, but I am still trying to get to know the data :) However, I tried with the standard iris data and still got slightly different clusters. What I want to know is whether there is an explanation for it, more on a theoretical level or within the functions themselves. Why do I get different results with the same settings? – Fabiola Fernández May 02 '16 at 19:45
  • Sharing a small subset with dput(data) really helps with troubleshooting – kcm Jul 08 '22 at 20:38