I'm playing with some dummy data to test clustering based on correlation distance (Pearson). I'm calculating the Pearson correlation in two ways, one inside the pheatmap function and the other outside it using cor(), and I would expect both to give the same results.

With randomly generated dummy data, the clusters are identical most of the time, but sometimes they differ enough to be noticeable (i.e. samples end up in different clusters between the two executions).

What am I missing? Why are the clusters not always the same?

I'm only interested in clustering columns.

# Random matrix generation
x <- rep(c(1:100, 100:1), 100)
mtx <- matrix(x, ncol = 10, nrow = 100)
noise <- matrix(rnorm(10 * 100), ncol = 10, nrow = 100)
mtx <- mtx * noise

rownames(mtx) <- paste("Genes_", 1:100, sep = "")
colnames(mtx) <- paste("Patient_", 1:10, sep = "")


library("pheatmap")

# Clustering based on pheatmap correlation = pearson (execution type 1)
pheatmap(mtx,
         clustering_distance_cols = "correlation",
         clustering_method = "complete",
         show_rownames = FALSE,
         show_colnames = TRUE,
         main = "correlation",
         cluster_rows = FALSE)

# Calculate the correlation before calling pheatmap (execution type 2)
pheatmap(cor(mtx, method = "pearson"),
         clustering_distance_rows = "none",
         clustering_method = "complete",
         show_rownames = FALSE,
         show_colnames = TRUE,
         main = "pearson",
         cluster_rows = FALSE)
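
To compare the two executions outside of pheatmap, here is a minimal sketch in base R. It rests on two assumptions that are not confirmed in this post: that `clustering_distance_cols = "correlation"` amounts to clustering on `1 - cor(mtx)`, and that the second call, which never sets `clustering_distance_cols`, falls back to Euclidean distance between the columns of the correlation matrix. The fixed seed, the `times = 5` shortcut and the names `d_corr`, `d_eucl`, `hc_corr` and `hc_eucl` are introduced here purely for illustration.

```r
set.seed(123)  # fixed seed so the comparison is reproducible

# Same dummy matrix as above; times = 5 yields exactly the 1000 values
# that matrix() actually uses
x <- rep(c(1:100, 100:1), 5)
mtx <- matrix(x, ncol = 10, nrow = 100) *
  matrix(rnorm(10 * 100), ncol = 10, nrow = 100)
colnames(mtx) <- paste0("Patient_", 1:10)

pearson <- cor(mtx, method = "pearson")

# Assumed equivalent of execution type 1: correlation distance between columns
d_corr <- as.dist(1 - pearson)

# Assumed equivalent of execution type 2: Euclidean distance between the
# columns of the correlation matrix (dist() works on rows, hence t())
d_eucl <- dist(t(pearson), method = "euclidean")

hc_corr <- hclust(d_corr, method = "complete")
hc_eucl <- hclust(d_eucl, method = "complete")

# Leaf order of each column dendrogram
print(hc_corr$labels[hc_corr$order])
print(hc_eucl$labels[hc_eucl$order])

# Cross-tabulate cluster membership after cutting both trees into 3 groups
print(table(correlation = cutree(hc_corr, k = 3),
            euclidean   = cutree(hc_eucl, k = 3)))
```

If the two assumptions hold, the dendrograms are built from different distance matrices, so any agreement between them would depend on the data rather than being guaranteed.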
HeyHoLetsGo
  • You are using random inputs. Why are you expecting the same clusters? Put `set.seed(123)` at the beginning of your code and you should get the same results every time. – Reza Aug 11 '20 at 17:46
  • I'm expecting the same result because I use the same matrix `mtx`. The differences appear between running `pheatmap` directly with `mtx` and running it with the output of `cor()`. – HeyHoLetsGo Aug 11 '20 at 20:32
