I'm playing with some dummy data to test clustering based on correlation distance (Pearson). I'm calculating the Pearson correlation in two ways, one inside the pheatmap function and the other outside it using cor(). I would expect both ways to give the same results.
With randomly generated dummy data the clusters are identical most of the time, but sometimes they differ noticeably (i.e. samples fall in different clusters between executions).
What am I missing? Why are the clusters not always the same?
I'm only interested in clustering columns.
# Random matrix generation
x <- rep(c(1:100, 100:1), 5)  # 1000 values, exactly filling the 100 x 10 matrix
mtx <- matrix(x, ncol = 10, nrow = 100)
noise <- matrix(rnorm(10 * 100), ncol = 10, nrow = 100)
mtx <- mtx * noise
rownames(mtx) <- paste("Genes_", 1:100, sep = "")
colnames(mtx) <- paste("Patient_", 1:10, sep = "")
library("pheatmap")
# Clustering based on pheatmap correlation = pearson (execution type 1)
pheatmap(mtx,
         clustering_distance_cols = "correlation",
         clustering_method = "complete",
         show_rownames = FALSE,
         show_colnames = TRUE,
         main = "correlation",
         cluster_rows = FALSE)
# Calculate correlation before calling pheatmap (execution type 2)
pheatmap(cor(mtx, method = "pearson"),
         clustering_distance_rows = "none",
         clustering_method = "complete",
         show_rownames = FALSE,
         show_colnames = TRUE,
         main = "pearson",
         cluster_rows = FALSE)
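
To check whether both calls cluster the columns on the same distances, the two column trees can be rebuilt outside pheatmap with base R. This is only a minimal sketch under two assumptions: that clustering_distance_cols = "correlation" corresponds to 1 - cor(mtx), and that the second call, where clustering_distance_cols is not set, falls back to the documented default "euclidean" applied to the columns of the correlation matrix. The seed and k = 3 are arbitrary choices made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the comparison is repeatable

# Dummy data as in the question; 5 repetitions of the 200-value pattern
# fill the 100 x 10 matrix exactly
x <- rep(c(1:100, 100:1), 5)
mtx <- matrix(x, ncol = 10, nrow = 100)
noise <- matrix(rnorm(10 * 100), ncol = 10, nrow = 100)
mtx <- mtx * noise
colnames(mtx) <- paste("Patient_", 1:10, sep = "")

# Execution type 1 (assumed): correlation distance between columns, 1 - Pearson r
d_corr <- as.dist(1 - cor(mtx, method = "pearson"))

# Execution type 2 (assumed): Euclidean distance between the columns of the
# correlation matrix, i.e. the default column distance on cor(mtx)
d_eucl <- dist(t(cor(mtx, method = "pearson")), method = "euclidean")

# Same linkage as in both pheatmap calls
hc_corr <- hclust(d_corr, method = "complete")
hc_eucl <- hclust(d_eucl, method = "complete")

# Are the two sets of pairwise distances numerically the same?
distances_match <- isTRUE(all.equal(as.vector(d_corr), as.vector(d_eucl)))
print(distances_match)

# Cross-tabulate cluster membership after cutting both trees into k = 3 groups
membership <- table(correlation = cutree(hc_corr, k = 3),
                    euclidean   = cutree(hc_eucl, k = 3))
print(membership)
```

If the distances do not match, the two trees are built from different inputs, so occasional differences in cluster membership would be expected rather than surprising.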