R, Union two dataframe with different column names and no match

Question

I want to combine two distinct dataframes(completely different columns) in R, into one inclusive data frame.

Lets say data frame "df_a" has columns A & B:

df_a <- read.table(header=TRUE, text='
    A   B
    1   x1
    2   y1
    3   z1
')

And dataframe "df_b" has columns C & D.

df_b <- read.table(header=TRUE, text='
          C     D
          1     6.7
          1     4.5
          1     3.7
          2     3.3
          2     4.1
          2     5.2
')

Therefore the resultant dataframe "df_c" will have columns A,B,C,D, see below:

df_c
  A     B       C   D
1 1     x1      1   6.7
2 2     y1      1   4.5
3 3     z1      1   3.7
4 NA    NA      2   3.3
5 NA    NA      2   4.1
6 NA    NA      2   5.2

Approach #1:

I first tried using rbind() but that function requires matching column names, however that is not what I'm looking for.

Approach #2:

I used df_c <- merge(df_a,df_b), however merge seems to be doing a Cartesian product, see below:

df_c <-  merge(df_a,df_b)
df_c
   A  B C   D
1  1 x1 1 6.7
2  2 y1 1 6.7
3  3 z1 1 6.7
4  1 x1 1 4.5
5  2 y1 1 4.5
6  3 z1 1 4.5
7  1 x1 1 3.7
8  2 y1 1 3.7
9  3 z1 1 3.7
10 1 x1 2 3.3
11 2 y1 2 3.3
12 3 z1 2 3.3
13 1 x1 2 4.1
14 2 y1 2 4.1
15 3 z1 2 4.1
16 1 x1 2 5.2
17 2 y1 2 5.2
18 3 z1 2 5.2

Approach #3:

Than I used df_c <- union(df_a,df_b), but the result is no longer a data frame. Its turned into a list of lists, see below:

[[1]]
[1] 1 2 3

[[2]]
[1] x1 y1 z1
Levels: x1 y1 z1

[[3]]
[1] 1 1 1 2 2 2

[[4]]
[1] 6.7 4.5 3.7 3.3 4.1 5.2

Approach #4

I created my own function called unionNoMatch(), that attempts to append columns from df_2 to df_1 input paramters:

unionNoMatch <- function(df_1, df_2)
{
df_3 <- df_1; 
for (name in names(df_2))
{
cbind(df_2$name,df_3) 
}

return (df_3);

}

df_c <- unionNoMatch (df_a,df_b)

However, I got this error:

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 3

How can I achieve my task of combining 2 data frames with non matching columns into a single data frame?

Thanks

So you literally have no way of matching the rows in `df_a` to those of `df_b` even though the number of rows differs? — MichaelChirico, Aug 18 '15 at 01:10
This is yet another cbind.fill question, one of which was asked earlier today and closed as a duplicate of one of the many other versions of this question. Just google cbind.fill — joran, Aug 18 '15 at 01:13

score 7 · Answer 1 · answered Aug 18 '15 at 01:15

7

R treats variables on the same row as related, so it doesn't want to put things on the same row unless it is told you want them there. In general, this is to prevent mistakes. If you know what you are doing, you can manually give an id to merge by:

df_a$row <- 1:nrow(df_a)
df_b$row <- 1:nrow(df_b)
merge(df_a, df_b, by = "row", all = TRUE)

answered Aug 18 '15 at 01:15

jeremycg

24,657
5
63
74

3

Or just use `"row.names"` in the merge call - `merge(df_a,df_b,by="row.names",all=TRUE)` – thelatemail Aug 18 '15 at 01:45
I chose this answer because it used the base package and it worked, Thanks – cyber101 Aug 18 '15 at 14:45

score 5 · Accepted Answer · answered Aug 18 '15 at 01:12

Seems like you're trying to do something that's probably not recommended, but here's what I'd do in data.table:

library(data.table) #1.9.5+ to get the on argument to [.data.table
setDT(df_a,keep.rownames=T); setDT(df_b,keep.rownames=T)
> df_a[df_b,on="rn"]
   rn  A  B C   D
1:  1  1 x1 1 6.7
2:  2  2 y1 1 4.5
3:  3  3 z1 1 3.7
4:  4 NA NA 2 3.3
5:  5 NA NA 2 4.1
6:  6 NA NA 2 5.2

(basically, we find something to merge on, namely the row number, then merge on that)

score 2 · Answer 3 · answered Aug 18 '15 at 02:38

We can use add_rownames from dplyr to create a row name column in each of the datasets and then do full_join/left_join

library(dplyr)
full_join(add_rownames(df_a) , add_rownames(df_b), by='rowname') %>% 
                    select(-rowname)
#   A    B C   D
#1  1   x1 1 6.7
#2  2   y1 1 4.5
#3  3   z1 1 3.7
#4 NA <NA> 2 3.3
#5 NA <NA> 2 4.1
#6 NA <NA> 2 5.2

R, Union two dataframe with different column names and no match

3 Answers3

Linked