Merge two data frames to get alternate rows of each data frame in sequence

Question

My data.frame DATA is

  k    l   g
1 A 2004  12
2 B 2004 3.4
3 C 2004 4.5

Another data.frame DATA2 is

  i    d   t
1 A 2012  22
2 B 2012 4.8
3 C 2012 5.6

I want to get

1 A 2004  12
1 A 2012  22
2 B 2004 3.4
2 B 2012 4.8
3 C 2004 4.5
3 C 2012 5.6

akrun · Accepted Answer · 2016-02-12T07:34:42.993

6

We can try rbindlist from data.table. Place the datasets in a list, rbind them with rbindlist and order by the first column.

library(data.table)
rbindlist(list(df1, df2))[order(k)]
#   k    l    g
#1: A 2004 12.0
#2: A 2012 22.0
#3: B 2004  3.4
#4: B 2012  4.8
#5: C 2004  4.5
#6: C 2012  5.6

Or using dplyr

library(dplyr)
bind_rows(df1, setNames(df2, names(df1))) %>% 
           arrange(k)

NOTE: I used df1 and df2 in place of DATA and DATA2 as object names as it is easier to type.

edited Feb 12 '16 at 07:34

answered Feb 12 '16 at 07:28

akrun

874,273
37
540
662

Would this still work if dataframes have different colnames? – zx8754 Feb 12 '16 at 07:32
@A.M.G16 Why did you unaccept my answer? I think my solution works just as well the dplyr one. – h3rm4n Feb 12 '16 at 15:35
@h3rm4n Your solution is just similar to the one I showed with `rbindlist` (original idea) – akrun Feb 12 '16 at 17:54
I might be wrong, but I think the `rbind` function was developed before the `rbindlist` function. You don't have to be so protective (refering to original idea), as both concepts already exist for quite some time. And I think it is legitimate to ask why someone unaccepted an answer. I am trying to learn something here and therefore like to know whether there is something wrong in my answer. – h3rm4n Feb 14 '16 at 08:11

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2016-02-12T08:12:50.973

You can try the interleave function from the "gdata" package. However, this would require your inputs have the same column names and have the same number of rows.

The approach would be:

library(gdata)      # for interleave
do.call(interleave, lapply(list(df1, df2), setNames, paste0("V", 1:ncol(df1))))
#    V1   V2   V3
# 1   A 2004 12.0
# 11  A 2012 22.0
# 2   B 2004  3.4
# 21  B 2012  4.8
# 3   C 2004  4.5
# 31  C 2012  5.6

Alternatively, as mentioned in my comment @akrun's answer, depending on whether the first column is a grouping variable or not, you may want to modify his approach a little.

For instance, imagine there were a third data.frame, with a different number of rows than the others. interleave would not work on that, but the rbindlist approach would.

df3 <- do.call(rbind, lapply(list(df1, df2), setNames, c("A", "B", "Z")))

rbindlist(list(df1, df2, df3), idcol = TRUE)[, N := sequence(.N), by = .id][order(N)]
#     .id k    l    g N
#  1:   1 A 2004 12.0 1
#  2:   2 A 2012 22.0 1
#  3:   3 A 2004 12.0 1
#  4:   1 B 2004  3.4 2
#  5:   2 B 2012  4.8 2
#  6:   3 B 2004  3.4 2
#  7:   1 C 2004  4.5 3
#  8:   2 C 2012  5.6 3
#  9:   3 C 2004  4.5 3
# 10:   3 A 2012 22.0 4
# 11:   3 B 2012  4.8 5
# 12:   3 C 2012  5.6 6

Pay specific attention to the last three rows in comparison with @akrun's approach.

The equivalent in base R for that last "data.table" approach would be something like:

x <- do.call(rbind, lapply(c("df1", "df2", "df3"), function(x) {
  setNames(cbind(rn = x, get(x)), c("id", paste0("V", 1:ncol(get(x)))))
}))
x[order(ave(as.numeric(x$id), x$id, FUN = seq_along)), ]

_{(So the moral is, use "data.table".)}

h3rm4n · Answer 3 · 2016-02-12T08:14:27.087

You can also do this in base R with rbind without the need to use extra packages, but you will have to set the columnnames of df2 the same as the columnnames in df1:

colnames(df2) <- colnames(df1) # or: setNames(df2, colnames(df1))
new.df <- rbind(df1,df2)
new.df <- new.df[order(new.df$k),]

This will result in the following dataframe:

> new.df
   k    l    g
1  A 2004 12.0
11 A 2012 22.0
2  B 2004  3.4
21 B 2012  4.8
3  C 2004  4.5
31 C 2012  5.6

score 0 · Answer 4 · answered Jan 20 '23 at 18:06

Solution using dplyr without the need for a key column and without modifying the original order (if the data frames weren't sorted by an "id" column).

library(dplyr)

stopifnot((N <- nrow(DATA)) == nrow(DATA2))

INTER_DATA <- setNames(DATA2, names(DATA)) %>%
    bind_rows(DATA, .) %>%
    arrange(rep(seq_len(N), length = n()))

Result:

  k    l    g
1 A 2004 12.0
2 A 2012 22.0
3 B 2004  3.4
4 B 2012  4.8
5 C 2004  4.5
6 C 2012  5.6

Merge two data frames to get alternate rows of each data frame in sequence

4 Answers4

Linked