How to recode values based on duplicate values in another dataset

Question

I am working with the following data, which we can call x:

   New_Name_List               Old_Name_List
1     bumiputera        bumiputera (muslims)
2     bumiputera          bumiputera (other)
3 non bumiputera non bumiputera (indigenous)
4        chinese                     chinese

The goal is to recode data in another data object, which we can call y, that looks like this:

  EPR_country_code EPR_country           EPR_group_lower_2
1              835      Brunei        bumiputera (muslims)
2              835      Brunei          bumiputera (other)
3              835      Brunei non bumiputera (indigenous)
4              835      Brunei                     chinese

If a value in x$New_Name_List is duplicated, I want the corresponding x$Old_Name_List value in the new column y$EPR_group_lower_3.

If a value in x$New_Name_List is unique, I want that x$New_Name_List value in the new column y$EPR_group_lower_3.

The data should look like this at the end:

  EPR_country_code EPR_country           EPR_group_lower_2  EPR_group_lower_3
1              835      Brunei        bumiputera (muslims)  bumiputera (muslims)
2              835      Brunei          bumiputera (other)  bumiputera (other)
3              835      Brunei non bumiputera (indigenous)  non bumiputera
4              835      Brunei                     chinese  chinese

Thank you so much

Ronak Shah · Accepted Answer · 2019-11-25T02:42:39.217

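We could use ifelse to take the value from Old_Name_List where New_Name_List is duplicated and from New_Name_List otherwise. Combining duplicated() with duplicated(fromLast = TRUE) flags every occurrence of a repeated value, not only the later ones. Note that this relies on the rows of x and y being in the same order.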

y$EPR_group_lower_3 <- with(x, ifelse(duplicated(New_Name_List) | 
        duplicated(New_Name_List, fromLast = TRUE), Old_Name_List, New_Name_List))
y

#  EPR_country_code EPR_country           EPR_group_lower_2    EPR_group_lower_3
#1              835      Brunei        bumiputera (muslims) bumiputera (muslims)
#2              835      Brunei          bumiputera (other)   bumiputera (other)
#3              835      Brunei non bumiputera (indigenous)       non bumiputera
#4              835      Brunei                     chinese              chinese
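To see why both duplicated() calls are needed, here is a small illustration on a plain character vector. The names v, first_pass, last_pass and all_dups are made up for this sketch and are not part of the question's data.

```r
# Same values as x$New_Name_List, held in a standalone vector for illustration
v <- c("bumiputera", "bumiputera", "non bumiputera", "chinese")

# duplicated() marks only the second and later occurrences
first_pass <- duplicated(v)                  # FALSE  TRUE FALSE FALSE

# scanning from the end marks the earlier occurrences instead
last_pass <- duplicated(v, fromLast = TRUE)  #  TRUE FALSE FALSE FALSE

# the union flags every element whose value appears more than once
all_dups <- first_pass | last_pass           #  TRUE  TRUE FALSE FALSE
```

With only duplicated(v), the first "bumiputera" row would be treated as unique and would keep the short name.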

Or find the indices where the values are duplicated and replace only those.

y$EPR_group_lower_3 <- x$New_Name_List
inds <- with(x, duplicated(New_Name_List) | duplicated(New_Name_List, fromLast = TRUE))
y$EPR_group_lower_3[inds] <- x$Old_Name_List[inds]
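Both approaches above assume that row i of x describes row i of y. If that might not hold, one possible variation is to look up each row of y by its old name with match(). This is only a sketch: the row order of y is shuffled here on purpose, the helper names dup and recoded are made up, and it assumes every y$EPR_group_lower_2 value appears exactly once in x$Old_Name_List.

```r
# Lookup table from the question, built with character columns
x <- data.frame(
  New_Name_List = c("bumiputera", "bumiputera", "non bumiputera", "chinese"),
  Old_Name_List = c("bumiputera (muslims)", "bumiputera (other)",
                    "non bumiputera (indigenous)", "chinese"),
  stringsAsFactors = FALSE
)

# Same rows as in the question, but deliberately in a different order
y <- data.frame(
  EPR_country_code = rep(835L, 4),
  EPR_country = rep("Brunei", 4),
  EPR_group_lower_2 = c("chinese", "bumiputera (other)",
                        "non bumiputera (indigenous)", "bumiputera (muslims)"),
  stringsAsFactors = FALSE
)

# Flag every New_Name_List value that occurs more than once
dup <- with(x, duplicated(New_Name_List) | duplicated(New_Name_List, fromLast = TRUE))

# Recoded label for each row of x: old name if duplicated, new name otherwise
recoded <- ifelse(dup, x$Old_Name_List, x$New_Name_List)

# Find the x row for each y row by old name; unmatched names would give NA
y$EPR_group_lower_3 <- recoded[match(y$EPR_group_lower_2, x$Old_Name_List)]
y
```

Any value in y$EPR_group_lower_2 that is missing from x$Old_Name_List ends up as NA in the new column, which makes unmatched rows easy to spot.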

data

x <- structure(list(New_Name_List = c("bumiputera", "bumiputera", 
"non bumiputera", "chinese"), Old_Name_List = c("bumiputera (muslims)", 
"bumiputera (other)", "non bumiputera (indigenous)", "chinese"
)), class = "data.frame", row.names = c(NA, -4L))

y <- structure(list(EPR_country_code = c(835L, 835L, 835L, 835L), 
EPR_country = c("Brunei", "Brunei", "Brunei", "Brunei"), 
EPR_group_lower_2 = c("bumiputera (muslims)", "bumiputera (other)", 
"non bumiputera (indigenous)", "chinese")), class = "data.frame", 
row.names = c(NA, -4L))
