R Replace column values in dataframe base on matching indexing column in separate dataframe

Question

I have a dataframe 'df1' that looks like:

Number	Variable1	Variable	Variable3
1	A	B	C
2	A	B	C
3	A	B	C
4	A	B	C
5	A	B	C

And I have a second dataframe 'df2' that looks like:

Number	Variable1	Variable	Variable3
1	D	E	F
2	G	H	I
3	J	K	L
4	M	N	O
15	P	Q	R

I want to update the three Variable columns in df1 with the data in the Variable columns in df2 based on matching values in Number so that df1 ends up looking like:

Number	Variable1	Variable	Variable3
1	D	E	F
2	G	H	I
3	J	K	L
4	M	N	O
5	A	B	C

score 3 · Accepted Answer · answered Dec 13 '22 at 17:01

You could use a power_left_join from powerjoin package with conflict = coalesce_yx like this:

library(powerjoin)
power_left_join(df1, df2, by = "Number", conflict = coalesce_yx)
#>   Number Variable1 Variable Variable3
#> 1      1         D        E         F
#> 2      2         G        H         I
#> 3      3         J        K         L
#> 4      4         M        N         O
#> 5      5         A        B         C

^{Created on 2022-12-13 with reprex v2.0.2}

Data:

df1 <- read.table(text = 'Number    Variable1   Variable    Variable3
1   A   B   C
2   A   B   C
3   A   B   C
4   A   B   C
5   A   B   C
', header = TRUE)

df2 <- read.table(text = 'Number    Variable1   Variable    Variable3
1   D   E   F
2   G   H   I
3   J   K   L
4   M   N   O
15  P   Q   R
', header = TRUE)

This solution works great, although maybe not as straightforward as the dplyr solution. You were first, so I'll mark this as the accepted solution. — Jacob, Dec 13 '22 at 17:13

score 2 · Answer 2 · answered Dec 13 '22 at 17:01

Would be helpful if dput(df) done. Have created another dataset for replication

df1<-cbind.data.frame(id=c(1:5),var1=rep("A",5),var2=rep("B",5),var3=rep("C",5))

df2<-cbind.data.frame(id=c(1:4,15),var1=LETTERS[7:11],var2=LETTERS[12:16],var3=LETTERS[16:20])


df1 %>%
  left_join(df2, by = "id") %>%
  mutate(var1 = coalesce(var1.y, var1.x),
         var2 = coalesce(var2.y, var2.x),
         var3 = coalesce(var3.y, var3.x)) %>%
  select(-var1.y, -var1.x,
         -var2.y, -var2.x,
         -var3.y, -var3.x)

This solution probably works, but there are different column names than the question, so not quite what was asked. Thank you for your efforts though. — Jacob, Dec 13 '22 at 17:15

score 2 · Answer 3 · answered Dec 13 '22 at 17:07

2

With dplyr, we can use rows_update

library(dplyr)
rows_update(df1, df2, by = 'Number', unmatched = "ignore")

-output

  Number Variable1 Variable Variable3
1      1         D        E         F
2      2         G        H         I
3      3         J        K         L
4      4         M        N         O
5      5         A        B         C

answered Dec 13 '22 at 17:07

akrun

874,273
37
540
662

1

Also a great solution and very simple. In my mind, I was thinking there had to be a dplyr solution, but I kept trying the wrong functions. Thank you for introducing me to rows_update. – Jacob Dec 13 '22 at 17:14

score 0 · Answer 4 · answered Dec 13 '22 at 17:43

You could update df1 while joining using data.table package and fcoalesce function:

library(data.table)

cols = c("Variable1", "Variable", "Variable3")

setDT(df1)[df2, (cols) := Map(fcoalesce, mget(paste0("i.", cols)), mget(cols)), on="Number"]
    

   Number Variable1 Variable Variable3
    <int>    <char>   <char>    <char>
1:      1         D        E         F
2:      2         G        H         I
3:      3         J        K         L
4:      4         M        N         O
5:      5         A        B         C

R Replace column values in dataframe base on matching indexing column in separate dataframe

4 Answers4