how to replace a dataframe with another dataframe in R

Question

i want to replace a df1 data, with df2, which df2 is a data like df1 example

df1 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. MAHJUM-61365. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "A. RIZAL. AD-11002795. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KADIR-60447ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY",
    "ABDUL AZIZ @HYUNDAI",
    "ABDUL AZIZ@HYUNDAI"
  ))

and df2 is

df2 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY"
  ))

if df1 look like a df2, df1 would replaced to df2

akrun · Accepted Answer · 2023-04-04T06:11:41.180

3

As it is substring match, we can use fuzzyjoin

library(dplyr)
library(fuzzyjoin)
regex_left_join(df1, df2, by = 'name') %>% 
  transmute(name = coalesce(name.y, name.x))

or use a distance based approach

 stringdist_left_join(df1, df2, by = 'name') %>% 
   transmute(name = coalesce(name.y, name.x))

edited Apr 04 '23 at 06:11

answered Apr 04 '23 at 06:06

akrun

874,273
37
540
662

i'm sorry, but i've tried with my dataset which have 181979 obs, i have 1013 obs in df2, but when i run this code, its make my dataset to 185136, its more observation than my dataset, why can this happen? – Fadhil Dzikri Apr 04 '23 at 23:29
@FadhilDzikri it happens when it is matching to more than one place because these are not exact matches - the first one does substring match and the second one does a distance match. Suppose, your entire dataset have multiple places where the substrings are matching, it does a cartesian join – akrun Apr 05 '23 at 02:15
so u have other idea for solution on this? – Fadhil Dzikri Apr 05 '23 at 03:28
@FadhilDzikri with substring matches, you can only change the method = in `stringdist_left_join` – akrun Apr 05 '23 at 04:02

GKi · Answer 2 · 2023-04-04T08:51:21.483

0

You can use adist to find the best match and replace them.

i <- max.col(-adist(df1$name, df2$name, partial=TRUE))
df1$name <- df2$name[i]

df1
#                      name
#1          A. MAHJUM-61365
#2          A. MAHJUM-61365
#3    A. RIZAL. AD-11002795
#4    A. RIZAL. AD-11002795
#5         ABD. KADIR-60447
#6         ABD. KADIR-60447
#7         ABD. KAHAR-62551
#8  ABD. RASYID DS-11002082
#9      ABDREAS APUNG @SANY
#10     ABDUL AZIS @HYUNDAY
#11     ABDUL AZIS @HYUNDAY
#12     ABDUL AZIS @HYUNDAY

edited Apr 04 '23 at 08:51

answered Apr 04 '23 at 07:36

GKi

37,245
2
26
48

1

but it doesnt work in #10 11 12 ? – Fadhil Dzikri Apr 04 '23 at 07:52
What should stand in #10 11 12? `ABDUL AZIS @HYUNDAY` is the closest match. – GKi Apr 05 '23 at 06:35

how to replace a dataframe with another dataframe in R

2 Answers2