3

I want to combine two distinct dataframes(completely different columns) in R, into one inclusive data frame.

Lets say data frame "df_a" has columns A & B:

df_a <- read.table(header=TRUE, text='
    A   B
    1   x1
    2   y1
    3   z1
')

And dataframe "df_b" has columns C & D.

df_b <- read.table(header=TRUE, text='
          C     D
          1     6.7
          1     4.5
          1     3.7
          2     3.3
          2     4.1
          2     5.2
')

Therefore the resultant dataframe "df_c" will have columns A,B,C,D, see below:

df_c
  A     B       C   D
1 1     x1      1   6.7
2 2     y1      1   4.5
3 3     z1      1   3.7
4 NA    NA      2   3.3
5 NA    NA      2   4.1
6 NA    NA      2   5.2

Approach #1:

I first tried using rbind() but that function requires matching column names, however that is not what I'm looking for.

Approach #2:

I used df_c <- merge(df_a,df_b), however merge seems to be doing a Cartesian product, see below:

df_c <-  merge(df_a,df_b)
df_c
   A  B C   D
1  1 x1 1 6.7
2  2 y1 1 6.7
3  3 z1 1 6.7
4  1 x1 1 4.5
5  2 y1 1 4.5
6  3 z1 1 4.5
7  1 x1 1 3.7
8  2 y1 1 3.7
9  3 z1 1 3.7
10 1 x1 2 3.3
11 2 y1 2 3.3
12 3 z1 2 3.3
13 1 x1 2 4.1
14 2 y1 2 4.1
15 3 z1 2 4.1
16 1 x1 2 5.2
17 2 y1 2 5.2
18 3 z1 2 5.2

Approach #3:

Than I used df_c <- union(df_a,df_b), but the result is no longer a data frame. Its turned into a list of lists, see below:

[[1]]
[1] 1 2 3

[[2]]
[1] x1 y1 z1
Levels: x1 y1 z1

[[3]]
[1] 1 1 1 2 2 2

[[4]]
[1] 6.7 4.5 3.7 3.3 4.1 5.2

Approach #4

I created my own function called unionNoMatch(), that attempts to append columns from df_2 to df_1 input paramters:

unionNoMatch <- function(df_1, df_2)
{
df_3 <- df_1; 
for (name in names(df_2))
{
cbind(df_2$name,df_3) 
}

return (df_3);

}

df_c <- unionNoMatch (df_a,df_b)

However, I got this error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 3

How can I achieve my task of combining 2 data frames with non matching columns into a single data frame?

Thanks

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
cyber101
  • 2,822
  • 14
  • 50
  • 93
  • So you literally have no way of matching the rows in `df_a` to those of `df_b` even though the number of rows differs? – MichaelChirico Aug 18 '15 at 01:10
  • 2
    This is yet another cbind.fill question, one of which was asked earlier today and closed as a duplicate of one of the many other versions of this question. Just google cbind.fill – joran Aug 18 '15 at 01:13

3 Answers3

7

R treats variables on the same row as related, so it doesn't want to put things on the same row unless it is told you want them there. In general, this is to prevent mistakes. If you know what you are doing, you can manually give an id to merge by:

df_a$row <- 1:nrow(df_a)
df_b$row <- 1:nrow(df_b)
merge(df_a, df_b, by = "row", all = TRUE)
jeremycg
  • 24,657
  • 5
  • 63
  • 74
5

Seems like you're trying to do something that's probably not recommended, but here's what I'd do in data.table:

library(data.table) #1.9.5+ to get the on argument to [.data.table
setDT(df_a,keep.rownames=T); setDT(df_b,keep.rownames=T)
> df_a[df_b,on="rn"]
   rn  A  B C   D
1:  1  1 x1 1 6.7
2:  2  2 y1 1 4.5
3:  3  3 z1 1 3.7
4:  4 NA NA 2 3.3
5:  5 NA NA 2 4.1
6:  6 NA NA 2 5.2

(basically, we find something to merge on, namely the row number, then merge on that)

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
2

We can use add_rownames from dplyr to create a row name column in each of the datasets and then do full_join/left_join

library(dplyr)
full_join(add_rownames(df_a) , add_rownames(df_b), by='rowname') %>% 
                    select(-rowname)
#   A    B C   D
#1  1   x1 1 6.7
#2  2   y1 1 4.5
#3  3   z1 1 3.7
#4 NA <NA> 2 3.3
#5 NA <NA> 2 4.1
#6 NA <NA> 2 5.2
akrun
  • 874,273
  • 37
  • 540
  • 662