
TL;DR: How can I use the output of the WeightedCluster library (its wcKMedoids() function in particular) as the clustering input to heatmap, heatmap.2 or similar functions?


We are creating a heatmap in R from some binary data (yes/no values, represented as ones and zeros), and need to adjust the weights of some of the rows for the column-based clustering.

These rows are generated by expanding multiple-choice categories into several binary yes/no rows, so those categories end up over-represented.
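When the columns are clustered, the rows are the variables, so weighting rows for the column clustering amounts to a weighted distance between columns rather than a weighted clustering algorithm. A minimal sketch of that idea, assuming a weighted variant of the Jaccard distance that dist(method = "binary") computes is acceptable: each row contributes its weight instead of 1. The helper weighted_binary_dist() and the example weights are hypothetical; the resulting column dendrogram is handed to heatmap() through its Colv argument.

```r
# Sketch: give each row (A..N) its own weight when clustering the columns.
dataset <- "
F,T1,T2,T3,T4,T5,T6,T7,T8
A,1,1,0,1,1,1,1,1
B,1,0,1,0,1,0,1,1
C,1,1,1,1,1,1,1,0
D,1,1,1,0,1,1,1,0
E,0,1,0,0,1,0,1,0
F,0,0,1,0,0,0,0,0
G,1,1,1,0,1,1,1,1
H,1,1,0,0,0,0,0,0
I,1,0,1,0,0,1,0,0
J,1,1,1,0,0,0,0,1
K,1,0,0,0,1,1,1,1
L,1,1,1,0,1,1,1,1
M,0,1,1,1,1,1,1,1
N,1,1,1,0,1,1,1,1"
m <- as.matrix(read.csv(text = dataset, row.names = 1))

# Hypothetical helper: weighted Jaccard distance between the rows of x.
# Variable k (column k of x) counts w[k] instead of 1; with equal weights it
# matches dist(x, method = "binary") as long as no pair is all zeros.
weighted_binary_dist <- function(x, w) {
  x <- as.matrix(x) != 0                 # non-zero = "on"
  stopifnot(length(w) == ncol(x))
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      either <- x[i, ] | x[j, ]          # at least one of the pair is "on"
      if (any(either)) {
        d[i, j] <- d[j, i] <- sum(w[x[i, ] != x[j, ]]) / sum(w[either])
      }
    }
  }
  as.dist(d)
}

# Hypothetical weights: pretend rows H, I and J were expanded from one
# multiple-choice question, so each of them counts 1/3.
row_w <- setNames(rep(1, nrow(m)), rownames(m))
row_w[c("H", "I", "J")] <- 1 / 3

# The columns are the objects here, so cluster t(m) using the row weights.
col_dend <- as.dendrogram(hclust(weighted_binary_dist(t(m), row_w)))

heatmap(m, Colv = col_dend, scale = "none",
        distfun = function(x) dist(x, method = "binary"),
        col = colorRampPalette(c("red", "green"))(256))
```

The row dendrogram is still computed by heatmap() itself, with the unweighted binary distance.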

I found the WeightedCluster library, which can do clustering with weights.

The question is how to feed the output of this library (wcKMedoids() in particular) into heatmap, heatmap.2 or similar functions.

I have tried the following code, which results in the error message below:

library(gplots)
library(WeightedCluster)

dataset <- "
F,T1,T2,T3,T4,T5,T6,T7,T8
A,1,1,0,1,1,1,1,1
B,1,0,1,0,1,0,1,1
C,1,1,1,1,1,1,1,0
D,1,1,1,0,1,1,1,0
E,0,1,0,0,1,0,1,0
F,0,0,1,0,0,0,0,0
G,1,1,1,0,1,1,1,1
H,1,1,0,0,0,0,0,0
I,1,0,1,0,0,1,0,0
J,1,1,1,0,0,0,0,1
K,1,0,0,0,1,1,1,1
L,1,1,1,0,1,1,1,1
M,0,1,1,1,1,1,1,1
N,1,1,1,0,1,1,1,1"
fakefile <- textConnection(dataset)

d <- read.csv(fakefile, header=TRUE, row.names = 1)

weights <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1)

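distf <- function(x) dist(x, method="binary")
# heatmap() calls hclustfun(distfun(x)), so wclustf already receives a dist
# object and must not call distf() on it a second time
wclustf <- function(diss) wcKMedoids(diss, 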
                                 k=8, 
                                 weights=weights, 
                                 npass = 1, 
                                 initialclust=NULL, 
                                 method="PAMonce", 
                                 cluster.only = FALSE, 
                                 debuglevel=0)

cluster_colors <- colorRampPalette(c("red", "green"))(256);
heatmap(as.matrix(d), 
        col=cluster_colors,
        distfun = distf,
        hclustfun = wclustf,
        keep.dendro = F,
        margins=c(10,16),
        scale="none")

But running it gives:

Error in UseMethod("as.dendrogram") : 
  no applicable method for 'as.dendrogram' applied to an object of class "c('kmedoids', 'list')"

Apparently, wcKMedoids is not a drop-in replacement for R's hclust, but does anyone have some pointers on how to work around that?

UPDATE: The little progress I have made so far suggests that I should implement an as.dendrogram.kmedoids method that produces output similar to that of hclust(dist(x)). (That output can be inspected in detail with dput: dput(hclust(dist(x))).) Ideas and pointers are very welcome.
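Note that the weights argument of wcKMedoids() weights the objects being clustered (case weights); for the heatmap's row dendrogram those objects are the rows A..N. If case weights on the rows are what is needed, base R's hclust() can already build a weighted tree through its members argument, and its result goes straight through as.dendrogram(). A minimal sketch, assuming average linkage on the binary distance is acceptable; the weight values are hypothetical.

```r
# Sketch: weighted hierarchical clustering of the rows with base R only.
dataset <- "
F,T1,T2,T3,T4,T5,T6,T7,T8
A,1,1,0,1,1,1,1,1
B,1,0,1,0,1,0,1,1
C,1,1,1,1,1,1,1,0
D,1,1,1,0,1,1,1,0
E,0,1,0,0,1,0,1,0
F,0,0,1,0,0,0,0,0
G,1,1,1,0,1,1,1,1
H,1,1,0,0,0,0,0,0
I,1,0,1,0,0,1,0,0
J,1,1,1,0,0,0,0,1
K,1,0,0,0,1,1,1,1
L,1,1,1,0,1,1,1,1
M,0,1,1,1,1,1,1,1
N,1,1,1,0,1,1,1,1"
m <- as.matrix(read.csv(text = dataset, row.names = 1))

# Hypothetical case weights, one per row (A..N).
row_w <- setNames(rep(1, nrow(m)), rownames(m))
row_w[c("H", "I", "J")] <- 1 / 3

d_rows <- dist(m, method = "binary")
# 'members' tells hclust() how many observations each object stands for.
# With average linkage, merged distances are then averaged using these sizes,
# so the rows behave like weighted cases.
hc_rows <- hclust(d_rows, method = "average", members = row_w)
row_dend <- as.dendrogram(hc_rows)   # works, unlike a "kmedoids" object

heatmap(m, Rowv = row_dend, scale = "none",
        distfun = function(x) dist(x, method = "binary"),
        col = colorRampPalette(c("red", "green"))(256))
```

Passing the dendrogram via Rowv, instead of a hclustfun, avoids heatmap() reusing the same 14 weights for the 8 columns.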

Samuel Lampa

2 Answers


If you can make do with a simpler solution, you could just multiply the weights into the original matrix, which gives the weighted columns a larger influence on the distances. I'm not 100% sure that this is the statistically correct way to do it, but depending on what you want to achieve it might do the job.

# Create the dataset
dataset <- matrix(
  dimnames = list(LETTERS[1:14], c("T1","T2","T3","T4","T5","T6","T7","T8")),
  data = c(1,1,0,1,1,1,1,1,
           1,0,1,0,1,0,1,1,
           1,1,1,1,1,1,1,0,
           1,1,1,0,1,1,1,0,
           0,1,0,0,1,0,1,0,
           0,0,1,0,0,0,0,0,
           1,1,1,0,1,1,1,1,
           1,1,0,0,0,0,0,0,
           1,0,1,0,0,1,0,0,
           1,1,1,0,0,0,0,1,
           1,0,0,0,1,1,1,1,
           1,1,1,0,1,1,1,1,
           0,1,1,1,1,1,1,1,
           1,1,1,0,1,1,1,1),
  ncol=8,
  nrow=14,
  byrow=TRUE)  # the values above are listed row by row

# Assign weights to the different columns
col.weights <- c(2,3,1,1,1,1,1,1)

# Transform the original matrix with the weights
# you want to assign to each column.
create.weights.matrix <- function(weights, rows) {
  sapply(weights, function(x){rep(x, rows)})
}
weights.matrix <- create.weights.matrix(col.weights, nrow(dataset))
d.weighted <- weights.matrix * dataset

# Create the plot
cluster_colors <- colorRampPalette(c("red", "green"))(256);
heatmap(as.matrix(d.weighted), 
        col=cluster_colors,
        keep.dendro = F,
        margins=c(10,16),
        scale="none")

This will give you something like this as a result:

[Figure: heatmap with weights]
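One detail worth checking with this approach: heatmap() uses Euclidean distance by default, which squares the differences, so multiplying column j by w_j weights its squared differences by w_j^2. If each column should contribute w_j times as much to the squared distance, scaling by sqrt(w_j) does that. A minimal sketch on the first three rows of the data, using sweep() in place of the hand-built weights matrix:

```r
# First three rows (A..C) of the question's data, just to check the arithmetic.
m <- matrix(c(1,1,0,1,1,1,1,1,
              1,0,1,0,1,0,1,1,
              1,1,1,1,1,1,1,0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), paste0("T", 1:8)))
col.weights <- c(2, 3, 1, 1, 1, 1, 1, 1)

# sweep() multiplies column j by sqrt(col.weights[j]).
m.scaled <- sweep(m, 2, sqrt(col.weights), "*")

# Squared Euclidean distance on the scaled data equals
# sum_j col.weights[j] * (x_ij - x_kj)^2 on the original data.
d2 <- as.matrix(dist(m.scaled))^2
manual_AB <- sum(col.weights * (m["A", ] - m["B", ])^2)
d2["A", "B"]   # 6, same as manual_AB
```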

Johan

This cannot be done. K-medoids clustering is a partitioning method, not a hierarchical one, and a dendrogram is only meaningful for hierarchical clustering algorithms.
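A partition can still be shown in a heatmap, just without a row dendrogram: order the rows so that members of the same cluster are adjacent, switch off the row dendrogram with Rowv = NA, and mark the clusters with RowSideColors. A minimal sketch; cluster::pam() (unweighted k-medoids from the cluster package, which ships with R as a recommended package) stands in for wcKMedoids() here, and any vector with one cluster label per row could be used instead. The helper partition_order() and the choice k = 3 are hypothetical.

```r
library(cluster)   # recommended package shipped with R; provides pam()

dataset <- "
F,T1,T2,T3,T4,T5,T6,T7,T8
A,1,1,0,1,1,1,1,1
B,1,0,1,0,1,0,1,1
C,1,1,1,1,1,1,1,0
D,1,1,1,0,1,1,1,0
E,0,1,0,0,1,0,1,0
F,0,0,1,0,0,0,0,0
G,1,1,1,0,1,1,1,1
H,1,1,0,0,0,0,0,0
I,1,0,1,0,0,1,0,0
J,1,1,1,0,0,0,0,1
K,1,0,0,0,1,1,1,1
L,1,1,1,0,1,1,1,1
M,0,1,1,1,1,1,1,1
N,1,1,1,0,1,1,1,1"
m <- as.matrix(read.csv(text = dataset, row.names = 1))

d_rows <- dist(m, method = "binary")
# Unweighted k-medoids as a stand-in for a (weighted) partition.
cl <- pam(d_rows, k = 3)$clustering   # one label per row

# Hypothetical helper: a row order that keeps each cluster contiguous,
# whatever the labels look like (1..k, medoid indices, strings, ...).
partition_order <- function(cl) order(as.integer(factor(cl)))

grp <- as.integer(factor(cl))
ord <- partition_order(cl)

heatmap(m[ord, ], Rowv = NA, scale = "none",
        distfun = function(x) dist(x, method = "binary"),
        RowSideColors = rainbow(max(grp))[grp[ord]],
        col = colorRampPalette(c("red", "green"))(256))
```

The order of rows within a cluster is arbitrary here, since a flat partition carries no information about it.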

Matthias Studer