I'm trying to merge several pairs of data frames at once in R by putting them into two lists and calling `mapply` with `merge`. This works fine when merging on a single key, but not when merging on multiple keys using the `by = c("a","b")` argument of `merge`.

`mapply` appears to treat the `by` argument as a third vector to iterate over, rather than passing the whole argument to every call.

For instance, this code:

df <- data.frame(a = rep(1:3, 3),
                 b = c(rep("aa",3), rep("bb",3), rep("cc",3)),
                 c = rnorm(9, mean = 5))
df2 <- df1 <- df 
list1 <- list(df, df1, df2)
df3 <- data.frame(a = rep(3:1, 3),
                  b = c(rep("cc",3), rep("bb",3), rep("aa",3)),
                  c = rnorm(9, mean = -5))
df5 <- df4 <- df3 
list2 <- list(df3, df4, df5)

list3 <- mapply(merge, list1, list2, SIMPLIFY = FALSE, by = c("a","b"))

returns the warning message "longer argument not a multiple of length of shorter". When I add a third key, `by = c("a","b","c")`, it instead attempts to merge the first pair of data frames by "a", the second by "b", and the third by "c". What I want, however, is to merge every pair by both `a` and `b`. Does anyone know how I might do this?

nrussell
John Clegg
  • See the `MoreArgs` argument in `?mapply` - `MoreArgs = list(by = c("a","b"))` – nrussell Jun 13 '16 at 18:01
  • If every key value corresponds to exactly one row in each table, you can just sort all the tables and `cbind` relevant cols together, ``do.call(cbind, c(list2[[1]][, 1:2], lapply(list2, `[`, -(1:2))))`` – Frank Jun 13 '16 at 18:06
  • @nrussell Thanks that answers my question! – John Clegg Jun 13 '16 at 18:16
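
For reference, the `MoreArgs` suggestion from the comments can be sketched as follows. This is a minimal example using the data from the question. The `set.seed` call, the `rep(..., each = 3)` shorthand, and the `list4` variant with `Map` are additions for illustration, not part of the original post. Arguments given in `...` are recycled element-wise by `mapply`, while anything in `MoreArgs` is handed unchanged to every call.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible

df <- data.frame(a = rep(1:3, 3),
                 b = rep(c("aa", "bb", "cc"), each = 3),
                 c = rnorm(9, mean = 5))
df3 <- data.frame(a = rep(3:1, 3),
                  b = rep(c("cc", "bb", "aa"), each = 3),
                  c = rnorm(9, mean = -5))
list1 <- list(df, df, df)
list2 <- list(df3, df3, df3)

# by = c("a", "b") goes in MoreArgs, so each merge() call receives both keys
list3 <- mapply(merge, list1, list2,
                MoreArgs = list(by = c("a", "b")),
                SIMPLIFY = FALSE)

# Alternative: fix the keys inside an anonymous function and use Map()
list4 <- Map(function(x, y) merge(x, y, by = c("a", "b")), list1, list2)
```

Each element of `list3` should then be one merged data frame with the columns `a`, `b`, `c.x` and `c.y`.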
