4

In R, I have df1, df2, and df3 that represent lightning storms. Each df has two columns, 'city' and 'injuries'.

df1 = data.frame(city=c("atlanta", "new york"), injuries=c(5,8))
df2 = data.frame(city=c("chicago", "new york"), injuries=c(2,3))
df3 = data.frame(city=c("los angeles", "atlanta"), injuries=c(1,7))

I want to merge all 3 data frames on a type of outer join on the city column so that all the cities will show up in the combined dataframe and the injury counts will be summed like this:

combined.df

city         df1.freq   df2.freq   df3.freq
atlanta      5          0          7
new york     8          3          0
chicago      0          2          0
los angeles  0          0          1
Andrew Heekin
  • 671
  • 1
  • 6
  • 15
  • You may want to update your question to indicate that you are looking for a solution which works for 8 data frames. – Victor K. Dec 04 '13 at 04:43

4 Answers4

4

This is general to any number of data.frames:

library(functional)
Reduce(Curry(merge, by = "city", all = TRUE), list(df1, df2, df3))
#          city injuries.x injuries.y injuries
# 1     atlanta          5         NA        7
# 2    new york          8          3       NA
# 3     chicago         NA          2       NA
# 4 los angeles         NA         NA        1

However, multiple merges might be slow. Another approach would be to stack your data.frames into a long one:

df.long <- do.call(rbind, Map(transform, list(df1, df2, df3),
                                         name = c("df1", "df2", "df3")))
#          city injuries name
# 1     atlanta        5  df1
# 2    new york        8  df1
# 3     chicago        2  df2
# 4    new york        3  df2
# 5 los angeles        1  df3
# 6     atlanta        7  df3

Then reshape that data using xtabs for example:

xtabs(injuries ~ city + name, df.long)
#              name
# city          df1 df2 df3
#   atlanta       5   0   7
#   new york      8   3   0
#   chicago       0   2   0
#   los angeles   0   0   1

(The reshape function might also be useful for that last step, but I am not super familiar with it.)

flodel
  • 87,577
  • 21
  • 185
  • 223
  • Same case for some reason- works with three, but gives the following error for 4+ data.frames: Error in match.names(clabs, names(xi)) : names do not match previous names – Andrew Heekin Dec 04 '13 at 04:26
  • Might be related to all data.frames having the same "injuries" column name. Not worth fixing, I see you got a lot of good alternatives that do not use `merge`. – flodel Dec 04 '13 at 04:45
2

merge is your friend. Type ?merge for more details.

> merge(merge(df1, df2, by = "city", all = TRUE), df3, by = "city", all = TRUE)
         city injuries.x injuries.y injuries
1     atlanta          5         NA        7
2     chicago         NA          2       NA
3 los angeles         NA         NA        1
4    new york          8          3       NA

Edit. While I like @flodel's solution, here is a more straightforward one, which may be somewhat easier to understand:

 Reduce(function(d1, d2) merge(d1, d2, all = TRUE, by = "city"), list(df1, df2, df3))
Victor K.
  • 4,054
  • 3
  • 25
  • 38
1

Alternative to @flodel's version using the base R reshape function:

dat <- list(df1,df2,df3)
intm <- data.frame(do.call(rbind,dat),val=rep(seq_along(dat),sapply(dat,nrow)))
reshape(intm, idvar="city", timevar="val", direction="wide")

#         city injuries.1 injuries.2 injuries.3
#1     atlanta          5         NA          7
#2    new york          8          3         NA
#3     chicago         NA          2         NA
#5 los angeles         NA         NA          1
thelatemail
  • 91,185
  • 12
  • 128
  • 188
1

Here's my solution using reshape::cast (thanks, @thelatemail!). Add id variable to each data frame, row bind them and then cast to wide format:

df1$id <- 'df1.freq'
df2$id <- 'df2.freq'
df3$id <- 'df3.freq'

rb <- rbind(df1,df2,df3)
library(reshape)
cast(rb, city ~ id, value='injuries')

Result:

         city df1.freq df2.freq df3.freq
1     atlanta        5       NA        7
2    new york        8        3       NA
3     chicago       NA        2       NA
4 los angeles       NA       NA        1
flodel
  • 87,577
  • 21
  • 185
  • 223
Alex Popov
  • 3,726
  • 1
  • 17
  • 20