1

I have a dataframe 'df1' that looks like:

Number Variable1 Variable Variable3
1 A B C
2 A B C
3 A B C
4 A B C
5 A B C

And I have a second dataframe 'df2' that looks like:

Number Variable1 Variable Variable3
1 D E F
2 G H I
3 J K L
4 M N O
15 P Q R

I want to update the three Variable columns in df1 with the data in the Variable columns in df2 based on matching values in Number so that df1 ends up looking like:

Number Variable1 Variable Variable3
1 D E F
2 G H I
3 J K L
4 M N O
5 A B C
Jacob
  • 329
  • 2
  • 10

4 Answers4

3

You could use a power_left_join from powerjoin package with conflict = coalesce_yx like this:

library(powerjoin)
power_left_join(df1, df2, by = "Number", conflict = coalesce_yx)
#>   Number Variable1 Variable Variable3
#> 1      1         D        E         F
#> 2      2         G        H         I
#> 3      3         J        K         L
#> 4      4         M        N         O
#> 5      5         A        B         C

Created on 2022-12-13 with reprex v2.0.2


Data:

df1 <- read.table(text = 'Number    Variable1   Variable    Variable3
1   A   B   C
2   A   B   C
3   A   B   C
4   A   B   C
5   A   B   C
', header = TRUE)

df2 <- read.table(text = 'Number    Variable1   Variable    Variable3
1   D   E   F
2   G   H   I
3   J   K   L
4   M   N   O
15  P   Q   R
', header = TRUE)
Quinten
  • 35,235
  • 5
  • 20
  • 53
  • 1
    This solution works great, although maybe not as straightforward as the dplyr solution. You were first, so I'll mark this as the accepted solution. – Jacob Dec 13 '22 at 17:13
2

Would be helpful if dput(df) done. Have created another dataset for replication

df1<-cbind.data.frame(id=c(1:5),var1=rep("A",5),var2=rep("B",5),var3=rep("C",5))

df2<-cbind.data.frame(id=c(1:4,15),var1=LETTERS[7:11],var2=LETTERS[12:16],var3=LETTERS[16:20])


df1 %>%
  left_join(df2, by = "id") %>%
  mutate(var1 = coalesce(var1.y, var1.x),
         var2 = coalesce(var2.y, var2.x),
         var3 = coalesce(var3.y, var3.x)) %>%
  select(-var1.y, -var1.x,
         -var2.y, -var2.x,
         -var3.y, -var3.x)

chris jude
  • 467
  • 3
  • 8
  • This solution probably works, but there are different column names than the question, so not quite what was asked. Thank you for your efforts though. – Jacob Dec 13 '22 at 17:15
2

With dplyr, we can use rows_update

library(dplyr)
rows_update(df1, df2, by = 'Number', unmatched = "ignore")

-output

  Number Variable1 Variable Variable3
1      1         D        E         F
2      2         G        H         I
3      3         J        K         L
4      4         M        N         O
5      5         A        B         C
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Also a great solution and very simple. In my mind, I was thinking there had to be a dplyr solution, but I kept trying the wrong functions. Thank you for introducing me to rows_update. – Jacob Dec 13 '22 at 17:14
0

You could update df1 while joining using data.table package and fcoalesce function:

library(data.table)

cols = c("Variable1", "Variable", "Variable3")

setDT(df1)[df2, (cols) := Map(fcoalesce, mget(paste0("i.", cols)), mget(cols)), on="Number"]
    

   Number Variable1 Variable Variable3
    <int>    <char>   <char>    <char>
1:      1         D        E         F
2:      2         G        H         I
3:      3         J        K         L
4:      4         M        N         O
5:      5         A        B         C