I would like to export an hclust dendrogram from R into a data table so that I can then import it into another ("home-made") program. `str(unclass(fit))` provides a text overview of the dendrogram, but what I'm looking for is really a numeric table. I've looked at the Bioconductor ctc package, but the output it produces looks somewhat cryptic. I would like to have something similar to this table: http://stn.spotfire.com/spotfire_client_help/heat/heat_importing_exporting_dendrograms.htm
Is there a way to get this out of an hclust object in R?
2 Answers
In case anyone else is interested in dendrogram export, here is my solution. It's probably not the best one, as I only started using R recently, but at least it works, so suggestions on how to improve the code are welcome.
So, if `hr` is my hclust object and `df` is my data, whose first column contains a simple index starting from 0 (0, 1, 2, and so on, in the same row order as the data that was clustered) and whose row names are the names of the clustered items:
```r
# Retrieve the leaf order (row name and its position within the leaves).
# (leaf.order is not used further below.)
leaf.order <- matrix(data=NA, ncol=2, nrow=nrow(df),
                     dimnames=list(c(), c("row.num", "row.name")))
leaf.order[,2] <- hr$labels[hr$order]
for (i in 1:nrow(leaf.order)) {
  leaf.order[which(leaf.order[,2] %in% rownames(df[i,])),1] <- df[i,1]
}
leaf.order <- as.data.frame(leaf.order)

hr.merge <- hr$merge
n <- max(df[,1])  # maximum leaf index, i.e. number of leaves - 1

# Re-index all clustered leaves and nodes. First, all leaves are indexed starting from 0.
# Next, all nodes are indexed starting from the maximum leaf index + 1.
for (i in 1:length(hr.merge)) {
  if (hr.merge[i] < 0) {
    hr.merge[i] <- abs(hr.merge[i]) - 1  # observation -k becomes leaf k-1
  } else {
    hr.merge[i] <- hr.merge[i] + n       # merge step k becomes node k+n
  }
}
node.id <- c(0:length(hr.merge))

# Generate dendrogram matrix with node index in the first column.
dend <- matrix(data=NA, nrow=length(node.id), ncol=6,
               dimnames=list(c(0:(length(node.id)-1)),
                             c("node.id", "parent.id", "pruning.level",
                               "height", "leaf.order", "row.name")))
dend[,1] <- c(0:((2*nrow(df))-2))  # Insert a leaf/node index

# Calculate parent ID for each leaf/node:
# 1) For each leaf/node index, find the corresponding row number within the merge table.
# 2) Add the maximum leaf index to the row number, as node indexing starts after all the leaves.
for (i in 1:(nrow(dend)-1)) {
  dend[i,2] <- row(hr.merge)[which(hr.merge %in% dend[i,1])] + n
}

# Generate table with the 0-based position of each leaf (1st column)
# and the corresponding row names (3rd column).
# order.number is kept numeric so that it can be used directly as the position below.
hr.order <- data.frame(order.number=0:(length(hr$labels)-1),
                       leaf.id=NA,
                       row.name=hr$labels[hr$order],
                       stringsAsFactors=FALSE)

# Assign the row name to each leaf.
dend <- as.data.frame(dend)
for (i in 1:nrow(df)) {
  dend[which(dend[,1] %in% df[i,1]),6] <- rownames(df[i,])
}
# Assign the position on the dendrogram (from left to right, starting at 0) to each leaf.
for (i in 1:nrow(hr.order)) {
  dend[which(dend[,6] %in% hr.order[i,3]),5] <- hr.order[i,1]
}

# Insert height for each node.
dend[c((n+2):nrow(dend)),4] <- hr$height
# All leaves get the highest possible pruning level.
dend[which(dend[,1] <= n),3] <- nrow(hr.merge)
# Nodes get decreasing pruning levels, starting from the leaves' level minus 1:
# each node gets the previous node's level, minus 1 if its height differs.
# The levels are then shifted so that the lowest one is 0.
for (i in (n+2):nrow(dend)) {
  if (is.na(dend[(i-1),4]) || dend[i,4] != dend[(i-1),4]) {
    dend[i,3] <- dend[(i-1),3] - 1
  } else {
    dend[i,3] <- dend[(i-1),3]
  }
}
dend[,3] <- dend[,3] - min(dend[,3])
dend <- dend[order(-node.id),]

# Write results table.
write.table(dend, file="path", sep=";", row.names=FALSE)
```
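Since the code above depends on the exact layout of `df`, here is a minimal sketch of input with that structure. It assumes the built-in `USArrests` data set and `hclust()` on Euclidean distances with the default complete linkage; neither choice comes from the original answer.

```r
# Assumed example input (not from the original answer): any numeric data frame
# with unique row names would do.
x <- USArrests

# hclust() takes its labels from the row names of the data passed to dist().
hr <- hclust(dist(x))

# df: first column is a 0-based index in the original row order, row names are
# the clustered items. The other columns are not used by the export code.
df <- data.frame(index = 0:(nrow(x) - 1), x)
rownames(df) <- rownames(x)
```

With `hr` and `df` defined this way, the code above can be run as is, apart from replacing `"path"` with a real file name.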

AnjaM
- I just used this code and it worked perfectly. The big difficulty for me? Reading the directions about what input data was required: that description of the data frame "df" is actually important, folks. – eleanorahowe Feb 14 '13 at 15:44
- @Eleanor I'm happy that you found it useful. You're right, the code relies on a particular structure of the input data frame. I hope you didn't spend too much time figuring it out. – AnjaM Feb 15 '13 at 08:20
- R is a 1-indexed language, but this code appears to have been written with 0-indexed loops. Exercise caution when using it, as there is potential for off-by-one errors. – Tom Kelly Dec 25 '18 at 03:05
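On the indexing caveat in the last comment: one way to keep the 1-based/0-based conversion explicit is to compute the same leaf/node IDs without loops. The following is a sketch, not the code from the answer; the function name `merge_to_parents` is hypothetical, and it follows the same ID convention as the answer above (leaves 0 to N-1, internal nodes N to 2N-2).

```r
# Hypothetical helper (not from the original answer): builds the node/parent table
# with 0-based IDs, leaves 0..N-1 and internal nodes N..2N-2.
merge_to_parents <- function(hr) {
  m <- hr$merge
  n_leaves <- nrow(m) + 1L

  # hclust encodes observation k as -k and merge step j as j (both 1-based).
  # Observation k becomes ID k-1; merge step j becomes ID n_leaves + j - 1.
  ids <- ifelse(m < 0, -m - 1L, m + n_leaves - 1L)

  # Every ID except the root appears exactly once in the merge table, and its
  # parent is the node created in that row (row j -> ID n_leaves + j - 1).
  parent <- rep(NA_integer_, 2L * n_leaves - 1L)
  parent[as.vector(ids) + 1L] <- as.vector(row(m)) + n_leaves - 1L

  # 0-based left-to-right position of each leaf in the plotted dendrogram.
  leaf_pos <- match(seq_len(n_leaves), hr$order) - 1L

  data.frame(node.id    = 0:(2L * n_leaves - 2L),
             parent.id  = parent,
             height     = c(rep(NA_real_, n_leaves), hr$height),
             leaf.order = c(leaf_pos, rep(NA_integer_, n_leaves - 1L)),
             row.name   = c(hr$labels, rep(NA_character_, n_leaves - 1L)),
             stringsAsFactors = FALSE)
}

# Example call on the built-in USArrests data (assumed input).
example_tab <- merge_to_parents(hclust(dist(USArrests)))
```

Each `- 1L` converts one of R's 1-based positions, so the 0-based convention is applied once per column rather than inside loops. Unlike the answer above, this sketch does not compute the pruning level.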
There is a package that does exactly the opposite of what you want: Labeltodendro ;-)
But seriously, can't you just manually extract the elements from the `hclust` object (e.g. `$merge`, `$height`, `$order`) and create a custom table from the extracted elements?
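A minimal sketch of that suggestion follows. It is not code from the thread: the function name `hclust_to_table` and its column names are hypothetical, and it assumes the clustered items have labels (for example, the row names of the data passed to `dist()`).

```r
# Hypothetical helper (not from the thread): exports the raw hclust components
# as two plain tables. Assumes the clustered items have labels (row names).
hclust_to_table <- function(hr) {
  m <- hr$merge
  # Negative entries are observations (-k = k-th item); positive entries
  # refer to the cluster formed at that earlier merge step.
  child_name <- function(v) ifelse(v < 0, hr$labels[abs(v)], paste0("node", v))

  # One row per merge step: the new node, its two children and the merge height.
  merges <- data.frame(node   = paste0("node", seq_len(nrow(m))),
                       left   = child_name(m[, 1]),
                       right  = child_name(m[, 2]),
                       height = hr$height,
                       stringsAsFactors = FALSE)

  # Leaves from left to right as they appear in the plotted dendrogram.
  leaves <- data.frame(position = seq_along(hr$order),
                       label    = hr$labels[hr$order],
                       stringsAsFactors = FALSE)

  list(merges = merges, leaves = leaves)
}
```

Each of the two tables can then be written out with `write.table()` in the same way as in the first answer. Naming nodes after their merge step (`node1`, `node2`, ...) is just one choice; a 0-based numeric scheme like the first answer's would work equally well.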

Geek On Acid