Join complementary dataframes, no NAs where a value is available

Question

I have two data frames with a unifying index v1. I need to create a third data frame with no NA values, if a number is available.

I have tried various combinations of the join functions from dplyr and the rbind.fill function in plyr.

# Given
v1 <- c("a", "b", "c", "d")
df1 <- cbind.data.frame(v1, v2 = c(1,NA,3,NA))
df2 <- cbind.data.frame(v1, v2 = c(NA,2,NA,4))

# I would like
df3 <- cbind.data.frame(v1, v2 = c(1,2,3,4))

How can I make this possible?

Or `setDT(df1)[is.na(v2), v2 := setDT(df2)[.SD, v2, on = .(v1)]]` as described here: [Replace missing values (NA) in one data set with values from another where columns match](https://stackoverflow.com/questions/32638845/replace-missing-values-na-in-one-data-set-with-values-from-another-where-colum) — Henrik, May 13 '19 at 19:49

score 3 · Answer 1 · answered May 13 '19 at 19:10

3

An option would be to do a full_join on 'v1' and then coalesce the 'v2' columns

library(dplyr)
full_join(df1, df2, by = 'v1') %>%
    transmute(v1, v2 = coalesce(v2.x, v2.y))

answered May 13 '19 at 19:10

akrun

874,273
37
540
662

2

@Gainz ```setDT(df1)[df2, v2i := i.v2, on='v1'][is.na(v2), v2 := v2i][,v2i:= NULL][]``` I am pretty sure that I got this from akrun; I had this comment beside this line within one of my scripts saying credit goes to them. – M-- May 13 '19 at 19:36

score 2 · Answer 2 · answered May 13 '19 at 19:16

Here's a simple base solution:

> df3 = df2
> df3$v2 = ifelse(is.na(df1$v2),df2$v2,df1$v2)
> df3
  v1 v2
1  a  1
2  b  2
3  c  3
4  d  4

the ifelse picks the number from one column or the other based on the NA nature of one column.

It doesn't cover the case if both or neither of the source v2 columns are NA but there's none in your sample and you don't mention it...

M-- · Answer 3 · 2019-05-19T21:09:31.397

1

I am sure that I got this from @akrun but could not find the thread. Posting an answer for future reference:

library(data.table)
setDT(df1)[df2, v2i := i.v2, on='v1'][is.na(v2), v2 := v2i][,v2i:= NULL][]

#>    v1 v2
#> 1:  a  1
#> 2:  b  2
#> 3:  c  3
#> 4:  d  4

^{Created on 2019-05-13 by the reprex package (v0.2.1)}

@Henrik's comment also suggested the following:

setDT(df1)[is.na(v2), v2 := setDT(df2)[.SD, v2, on = .(v1)]]

but I personally prefer the first solution over this mostly because of the sequence of getting the answer (joining dataframes first and replacing NA later).

edited May 19 '19 at 21:09

answered May 13 '19 at 20:20

M--

25,431
8
61
93

1

You find some arguments for the "update join" idiom [here](https://stackoverflow.com/a/54313203/1851712) – Henrik May 13 '19 at 20:39

moodymudskipper · Answer 4 · 2022-01-17T14:37:05.690

0

We can use {powerjoin} :

library(powerjoin)
power_left_join(df1, df2, by = "v1", conflict = coalesce_xy)
#   v1 v2
# 1  a  1
# 2  b  2
# 3  c  3
# 4  d  4

edited Jan 17 '22 at 14:37

answered May 14 '19 at 08:16

moodymudskipper

46,417
11
121
167

Join complementary dataframes, no NAs where a value is available

4 Answers4