I have n observations on which I have computed m clusterings. The clusterings are effectively hierarchical and divisive (each one refines the previous), even though they were computed independently. Here is a subset of my data:

print(test)

     m_0 m_13000 m_14608 m_16278
   <dbl>   <dbl>   <dbl>   <dbl>
1      1     10    101    1001
2      1     10    101    1002
3      1     11    102    1003
4      1     11    102    1004
5      1     12    103    1005
6      1     12    104    1006
7      2     13    105    1007
8      2     13    106    1008
9      2     13    106    1009
10     2     14    107    1010
..   ...     ...     ...     ...

Each row i = 1:n is an observation, and each column j = 1:m is the membership of the observations based on clustering j. The cluster IDs are unique across the different clustering solutions, i.e. min(test[, j]) > max(test[, j-1]).
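Before building a tree from such a table, it can help to confirm that the columns really are nested and that the IDs really do increase from one clustering to the next. The following is a minimal base R sketch; the `toy` data frame and the helper `is_refinement` are hypothetical names made up for illustration, not part of the original data or of any package.

```r
# Hypothetical toy membership table, loosely mirroring the first rows of `test`.
toy <- data.frame(
  m_0     = c(1, 1, 1, 1, 2, 2),
  m_13000 = c(10, 10, 11, 11, 13, 13),
  m_14608 = c(101, 101, 102, 102, 105, 106)
)

# TRUE if every cluster in `fine` lies inside exactly one cluster of `coarse`.
is_refinement <- function(coarse, fine) {
  all(tapply(coarse, fine, function(x) length(unique(x)) == 1))
}

pairs <- seq_len(ncol(toy) - 1)

# Nestedness of each adjacent pair of clusterings.
nested <- vapply(
  pairs,
  function(j) is_refinement(toy[[j]], toy[[j + 1]]),
  logical(1)
)

# ID uniqueness across clusterings: min of column j + 1 exceeds max of column j.
ids_increase <- vapply(
  pairs,
  function(j) min(toy[[j + 1]]) > max(toy[[j]]),
  logical(1)
)

print(nested)
print(ids_increase)
```

If either vector contains `FALSE`, the table does not describe a single hierarchy and a tree conversion would silently produce something misleading.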

The observations are represented as vertices on an igraph graph. I want to turn the test data above into a merge matrix to pass to igraph::make_clusters for further manipulation. What is the best way to do this? I looked at the merge matrix created by this example but I don't really understand it. Can anyone help me out?
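For readers puzzled by merge matrices in general, here is a small sketch of the format used by `stats::hclust`, which is the format that results from converting a tree to an hclust object. In that convention, row i describes merge step i, a negative entry -k means observation k, and a positive entry k means the cluster formed at step k. The three data points are made up for illustration, and igraph's own merge matrix convention may differ, so check its documentation before passing a matrix across.

```r
# Three hypothetical 1-D observations: 1 and 2 are close, 3 is far away.
x <- c(0, 1, 10)

# Agglomerative clustering with the default (complete) linkage.
hc <- hclust(dist(x))

# One row per merge step, so n - 1 = 2 rows.
# Step 1 joins observations 1 and 2 (two negative entries).
# Step 2 joins observation 3 with the cluster formed at step 1.
print(hc$merge)
```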

mikeck

2 Answers

My solution ended up being to convert the data frame to a Newick tree string, using a modified version of the answer to a related SO question about dendrograms. I then read the resulting string into a phylo object with phytools::read.newick, at which point I can convert it to an hclust object with ape::as.hclust (if necessary). Not bad!

(slightly edited) solution from the other SO answer

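Note: these functions don't seem to play nicely with tibbles (on a tibble, df[, i] returns a one-column tibble rather than a vector), so convert to a standard data.frame first, e.g. with as.data.frame().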
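The likely reason can be reproduced in base R without loading tibble: a tibble never drops a single column to a vector, which behaves like `drop = FALSE` on a data.frame. This is a sketch of the mechanism only; the data frame `d` is a made-up example.

```r
# Hypothetical one-column data frame.
d <- data.frame(a = c("x", "y"), stringsAsFactors = FALSE)

# Ordinary data.frame indexing drops to a character vector of length 2.
as_vector <- as.character(d[, 1])

# Keeping the column as a data frame (what a tibble does) makes
# as.character() deparse the whole column into a single string.
as_one_col <- as.character(d[, 1, drop = FALSE])

print(as_vector)
print(as_one_col)
```

With the second form, comparisons such as `as.character(df[, i]) == a` never match, so no child clusters are found.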

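df2newick <- function(df, innerlabel = FALSE) {
  # Recursively build the Newick substring for cluster `a` found in column `i`.
  traverse <- function(a, i, innerl) {
    if (i < ncol(df)) {
      # Children of `a`: distinct values in the next column for rows in `a`.
      alevelinner <- as.character(
        unique(df[which(as.character(df[, i]) == a), i + 1])
      )
      desc <- NULL
      for (b in alevelinner)
        desc <- c(desc, traverse(b, i + 1, innerl))
      # Optional inner label. Because of the leading comma it is emitted as an
      # extra sibling entry, not as a Newick node label directly after ")".
      il <- NULL
      if (innerl == TRUE)
        il <- paste0(",", a)
      paste("(", paste(desc, collapse = ","), ")", il, sep = "")
    } else {
      # Last column reached: `a` is a leaf label.
      a
    }
  }

  # Start from each top-level cluster and wrap the pieces as one complete tree.
  alevel <- as.character(unique(df[, 1]))
  newick <- NULL
  for (x in alevel)
    newick <- c(newick, traverse(x, 1, innerlabel))
  paste("(", paste(newick, collapse = ","), ");", sep = "")
}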

Reproducible example

ex = structure(list(level.1 = c("1", "1", "1", "1", "1", "1", "1", 
"1", "1", "1", "1", "1", "1"), level.2 = c("883", "883", "883", 
"883", "883", "883", "883", "883", "1758", "883", "883", "883", 
"883"), level.3 = c("2293", "2293", "2293", "2293", "2293", "2293", 
"2293", "2293", "3240", "2293", "2293", "2293", "2293"), level.4 = c("3932", 
"3932", "3932", "3932", "3932", "3932", "3932", "3932", "5139", 
"5777", "3932", "3932", "3932"), level.5 = c("6056", "6056", 
"6056", "6056", "6056", "6056", "6056", "6056", "7472", "8110", 
"6056", "6056", "6056"), level.6 = c("8456", "8545", "8949", 
"8456", "8545", "8456", "8545", "8545", "10385", "11023", "8545", 
"8545", "8545"), level.7 = c("11525", "11635", "12084", "12297", 
"12339", "12297", "12339", "12339", "13632", "14270", "12339", 
"12339", "12339"), name = c("A", "B", "C", "D", "E", "F", "G", 
"H", "I", "J", "K", "L", "M")), class = "data.frame", .Names = c("level.1", 
"level.2", "level.3", "level.4", "level.5", "level.6", "level.7", 
"name"), row.names = c(NA, -13L))

treestring = df2newick(ex, innerlabel = FALSE)

library(phytools)
extree = collapse.singles(read.newick(text = treestring))
extree$node.label = head(names(ex), -1)
plot(extree, show.node.label = TRUE)
mikeck

An alternative (and very easy) solution is to use the data.tree package.

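library(data.tree)
library(ape)

# as.Node() on a data.frame reads each row's root-to-leaf path from a
# "pathString" column, so build it by joining the columns with "/".
paths = data.frame(pathString = do.call(paste, c(ex, sep = "/")))
tree = as.Node(paths)
ph = as.phylo(tree)
as.hclust(ph)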

However, note that you will need some way to define branch lengths in order to convert to an hclust object. This same constraint applies to my other answer.
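One possible way to satisfy that constraint, sketched here under assumptions, is to let ape assign arbitrary ultrametric branch lengths. The Newick string below is a made-up example, and Grafen's method (`ape::compute.brlen`) is a swapped-in choice: its branch lengths depend only on the tree shape, not on anything in the original clusterings. The polytomy is resolved first with `ape::multi2di`, since the hclust conversion expects a binary tree.

```r
library(ape)

# Hypothetical tree with one polytomy (A, B, C share a parent).
tr <- read.tree(text = "((A,B,C),(D,E));")

# Resolve polytomies into binary splits; random = FALSE keeps tip order.
tr <- multi2di(tr, random = FALSE)

# Assign arbitrary ultrametric branch lengths (Grafen's method).
tr <- compute.brlen(tr, method = "Grafen")

# The tree is now rooted, binary and ultrametric, so it can be converted.
hc <- as.hclust(tr)
print(hc$merge)
```

The resulting merge heights are placeholders, so they should not be read as distances between clusters.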

mikeck