
I have a list containing 5 matrices, each of a different size, and I would like to merge all of them by their row names.

Here is a reproducible example of my list (I am using igraph_0.6.5-2 on R version 3.0.1):

x <- list(    
as.matrix(c(1,4)),
as.matrix(c(3,19,11)),
as.matrix(c(3,9,8,5)),
as.matrix(c(3,10,8,87,38,92)),
as.matrix(c(87,8,8,87,38,92))  
)   

colnames(x[[1]]) <- c("P1")  
colnames(x[[2]]) <- c("P2")  
colnames(x[[3]]) <- c("P3")  
colnames(x[[4]]) <- c("P4")  
colnames(x[[5]]) <- c("P5")  
rownames(x[[1]]) <- c("A","B")   
rownames(x[[2]]) <- c("B","C","D")  
rownames(x[[3]]) <- c("A","B", "E", "F")  
rownames(x[[4]]) <- c("A","F","G","H","I","J" )  
rownames(x[[5]]) <- c("B", "H","I","J", "K","L")  

which gives me the following list:

> x
[[1]]
  P1
A  1
B  4
[[2]]
  P2
B  3
C 19
D 11
[[3]]
  P3
A  3
B  9
E  8
F  5
[[4]]
  P4
A  3
F 10
G  8
H 87
I 38
J 92
[[5]]
  P5
B 87
H  8
I  8
J 87
K 38
L 92

I would like to obtain something like this:

>   P1  P2  P3  P4  P5 
A    1  na   3   3  na 
B    4   3   9  na  87 
C   na  19  na  na  na 
D   na  11  na  na  na 
E   na  na   8  na  na 
F   na  na   5  10  na 
G   na  na  na   8  na 
H   na  na  na  87   8 
I   na  na  na  38   8 
J   na  na  na  92  87 
K   na  na  na  na  38 
L   na  na  na  na  92 

Merging them using the do.call function:

y <- do.call(merge,c(x, by="row.names",all=TRUE))

gives me the following error:

Error in fix.by(by.x, x) : 'by' must match numbers of columns

Any help is greatly appreciated. Thanks!

A5C1D2H2I1M1N2O1R2T1
Charlie

1 Answer


I would create a helper function to move your row.names() to a column in a data.frame, and use Reduce() to merge() all the data.frames in your list:

rownames2col <- function(inDF, RowName = ".rownames") {
  temp <- data.frame(rownames(inDF), inDF, row.names = NULL)
  names(temp)[1] <- RowName
  temp
}

Reduce(function(x, y) merge(x, y, by = ".rownames", all = TRUE), 
       lapply(x, rownames2col))
#    .rownames P1 P2 P3 P4 P5
# 1          A  1 NA  3  3 NA
# 2          B  4  3  9 NA 87
# 3          C NA 19 NA NA NA
# 4          D NA 11 NA NA NA
# 5          E NA NA  8 NA NA
# 6          F NA NA  5 10 NA
# 7          G NA NA NA  8 NA
# 8          H NA NA NA 87  8
# 9          I NA NA NA 38  8
# 10         J NA NA NA 92 87
# 11         K NA NA NA NA 38
# 12         L NA NA NA NA 92
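
If you need the result as a matrix with row names (as in your desired output) rather than a data.frame with a key column, you can convert it afterwards. This is a minimal sketch that assumes all values are numeric; `merged` and `result` are just names picked for this example, and the list is rebuilt compactly so the snippet runs on its own:

```r
# Same example list as in the question, built compactly
x <- list(
  matrix(c(1, 4), dimnames = list(c("A", "B"), "P1")),
  matrix(c(3, 19, 11), dimnames = list(c("B", "C", "D"), "P2")),
  matrix(c(3, 9, 8, 5), dimnames = list(c("A", "B", "E", "F"), "P3")),
  matrix(c(3, 10, 8, 87, 38, 92),
         dimnames = list(c("A", "F", "G", "H", "I", "J"), "P4")),
  matrix(c(87, 8, 8, 87, 38, 92),
         dimnames = list(c("B", "H", "I", "J", "K", "L"), "P5"))
)

# Helper from above: move the row names into a regular column
rownames2col <- function(inDF, RowName = ".rownames") {
  temp <- data.frame(rownames(inDF), inDF, row.names = NULL)
  names(temp)[1] <- RowName
  temp
}

merged <- Reduce(function(a, b) merge(a, b, by = ".rownames", all = TRUE),
                 lapply(x, rownames2col))

# Drop the key column, convert to a matrix, and use the keys as row names
result <- as.matrix(merged[, -1])
rownames(result) <- as.character(merged[[".rownames"]])
result
```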

The extra step of bringing the rownames() in as a column is needed because merging by "row.names" puts the keys in a new column called Row.names and resets the row names of the result to plain numbers. After the first merge() in Reduce() the original row names are therefore gone, so the remaining list items can't be conveniently merged in the same way.

> Reduce(function(x, y) merge(x, y, by = "row.names", all = TRUE), x[1:2])
  Row.names P1 P2
1         A  1 NA
2         B  4  3
3         C NA 19
4         D NA 11
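
If you would rather keep merging by "row.names", one possible workaround is to put the row names back after every step so that the next merge() can find them again. This is only a sketch: `merge_by_rownames` is a made-up helper name (it is not part of base R), and the list is rebuilt compactly so the snippet runs on its own:

```r
# Same example list as in the question, built compactly
x <- list(
  matrix(c(1, 4), dimnames = list(c("A", "B"), "P1")),
  matrix(c(3, 19, 11), dimnames = list(c("B", "C", "D"), "P2")),
  matrix(c(3, 9, 8, 5), dimnames = list(c("A", "B", "E", "F"), "P3")),
  matrix(c(3, 10, 8, 87, 38, 92),
         dimnames = list(c("A", "F", "G", "H", "I", "J"), "P4")),
  matrix(c(87, 8, 8, 87, 38, 92),
         dimnames = list(c("B", "H", "I", "J", "K", "L"), "P5"))
)

# Hypothetical helper: merge by row names, then restore them on the result
merge_by_rownames <- function(a, b) {
  m <- merge(a, b, by = "row.names", all = TRUE)
  rownames(m) <- as.character(m$Row.names)  # keys back into the row names
  m$Row.names <- NULL                       # drop the helper column
  m
}

result <- Reduce(merge_by_rownames, x)
result
```

The result is a data.frame with row names A to L and columns P1 to P5; wrap it in as.matrix() if you need a matrix.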

Update: A data.table approach

A very similar concept can be used with data.table by setting the keep.rownames argument to TRUE and setting the key to the resulting "rn" column.

library(data.table)
Reduce(function(x, y) merge(x, y, all = TRUE), 
       lapply(x, function(y) data.table(y, keep.rownames=TRUE, key = "rn")))
#     rn P1 P2 P3 P4 P5
#  1:  A  1 NA  3  3 NA
#  2:  B  4  3  9 NA 87
#  3:  C NA 19 NA NA NA
#  4:  D NA 11 NA NA NA
#  5:  E NA NA  8 NA NA
#  6:  F NA NA  5 10 NA
#  7:  G NA NA NA  8 NA
#  8:  H NA NA NA 87  8
#  9:  I NA NA NA 38  8
# 10:  J NA NA NA 92 87
# 11:  K NA NA NA NA 38
# 12:  L NA NA NA NA 92

Update 2: A "manual" approach

There is, of course, the manual approach, assisted by a for loop. This might actually be faster than the above because merge is pretty slow in comparison to basic subsetting. Another advantage with respect to speed is that your resulting object is a matrix and many matrix operations are faster than data.frame operations.

## Identify the unique "rownames" for all list items
Rows <- unique(unlist(lapply(x, rownames)))

## Create a matrix of NA values 
##   with appropriate dimensions and dimnames
myMat <- matrix(NA, nrow = length(Rows), ncol = length(x), 
                dimnames = list(Rows, sapply(x, colnames)))


## Use your `for` loop to fill it in
##   with the appropriate values from your list
for (i in seq_along(x)) {
  myMat[rownames(x[[i]]), i] <- x[[i]]
}
myMat
#   P1 P2 P3 P4 P5
# A  1 NA  3  3 NA
# B  4  3  9 NA 87
# C NA 19 NA NA NA
# D NA 11 NA NA NA
# E NA NA  8 NA NA
# F NA NA  5 10 NA
# G NA NA NA  8 NA
# H NA NA NA 87  8
# I NA NA NA 38  8
# J NA NA NA 92 87
# K NA NA NA NA 38
# L NA NA NA NA 92
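
If you need this more than once, the same loop can be wrapped in a small function. This is a sketch under a few assumptions: every list item is a numeric one-column matrix with row and column names, and `bind_by_rownames` is a made-up name. Rows come out in order of first appearance, not sorted.

```r
# Hypothetical wrapper around the "manual" approach above
bind_by_rownames <- function(mats) {
  # Assumption: each item is a one-column matrix
  stopifnot(all(vapply(mats, ncol, integer(1)) == 1L))
  rows <- unique(unlist(lapply(mats, rownames)))
  cols <- vapply(mats, colnames, character(1))
  # Start from an all-NA numeric matrix with the final dimnames
  out <- matrix(NA_real_, nrow = length(rows), ncol = length(mats),
                dimnames = list(rows, cols))
  # Fill each column by row name
  for (i in seq_along(mats)) {
    out[rownames(mats[[i]]), i] <- mats[[i]][, 1]
  }
  out
}

# Same example list as in the question, built compactly
x <- list(
  matrix(c(1, 4), dimnames = list(c("A", "B"), "P1")),
  matrix(c(3, 19, 11), dimnames = list(c("B", "C", "D"), "P2")),
  matrix(c(3, 9, 8, 5), dimnames = list(c("A", "B", "E", "F"), "P3")),
  matrix(c(3, 10, 8, 87, 38, 92),
         dimnames = list(c("A", "F", "G", "H", "I", "J"), "P4")),
  matrix(c(87, 8, 8, 87, 38, 92),
         dimnames = list(c("B", "H", "I", "J", "K", "L"), "P5"))
)

res <- bind_by_rownames(x)
res
```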
A5C1D2H2I1M1N2O1R2T1