
I have a large data frame in which names (col1) should be nested within location names (col2). In the example below, "d" in col1 should always fall within "z" in col2, but the eighth row lists it under "y". I cannot simply rename "y" in col2, because "y" is correct in most places. I need to change "y" to "z" only when col1 == "d". The real data frame is large and contains multiple errors of this kind, so fixing individual elements by hand is not practical.

col1 <- c("a", "b", "c", "d", "a", "b", "c", "d")
col2 <- c("x", "y", "z", "z", "x", "y", "z", "y")
df <- data.frame(col1, col2)
AdSad
  • How do you know that the correct mapping is `d->z` rather than `d->y`? Both show up in your data, but you are prioritizing the first one. – rawr Nov 12 '21 at 04:35
  • The above is just an example. From knowledge about the data it is a clear error. Like mapping Texas to China instead of the USA – AdSad Nov 12 '21 at 05:38

1 Answer


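This is easy if you can create a lookup data frame that contains the correct combinations of col1 and col2. You can then drop col2 from your original data frame and left_join the result with the lookup on col1. The lookup must contain exactly one row per col1 value, otherwise the join duplicates rows.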

library(dplyr)

# Create a lookup table. In reality you probably need to create this with other methods.
lookup <- df %>%
  distinct() %>%
  filter(!(col1 %in% "d" & col2 %in% "y"))

# Drop the unreliable col2, then join the corrected values back in by col1
df2 <- df %>%
  select(-col2) %>%
  left_join(lookup, by = "col1")
df2
#   col1 col2
# 1    a    x
# 2    b    y
# 3    c    z
# 4    d    z
# 5    a    x
# 6    b    y
# 7    c    z
# 8    d    z
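
If the bad pairs are already known, as the question suggests, a direct conditional replacement in base R may be enough and avoids the join entirely. The following is only a sketch under that assumption: the `fixes` table and its column names `wrong` and `right` are hypothetical and not part of the original question or answer.

```r
# Same toy data as in the question.
df <- data.frame(
  col1 = c("a", "b", "c", "d", "a", "b", "c", "d"),
  col2 = c("x", "y", "z", "z", "x", "y", "z", "y"),
  stringsAsFactors = FALSE
)

# Hypothetical table of known-bad (col1, col2) pairs and their corrections.
# Add one row per error you know about.
fixes <- data.frame(
  col1  = "d",
  wrong = "y",
  right = "z",
  stringsAsFactors = FALSE
)

# Overwrite col2 only where both col1 and the wrong col2 value match.
fixed <- df
for (i in seq_len(nrow(fixes))) {
  hit <- fixed$col1 == fixes$col1[i] & fixed$col2 == fixes$wrong[i]
  fixed$col2[hit] <- fixes$right[i]
}

fixed
```

Rows where col2 is "y" but col1 is not "d" are left untouched, which is the behaviour the question asks for.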
www
  • Great, thanks. I didn't mention that there are multiple other columns with unique values, so I had to name the columns in `distinct(col1, col2)` to prevent a lookup the same length as my data set – AdSad Nov 12 '21 at 05:28
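
The adjustment described in this comment might look like the sketch below. The extra `id` column is a hypothetical stand-in for the other columns with unique values, and the uniqueness check is an added safeguard rather than part of the original answer.

```r
library(dplyr)

# Toy data from the question plus a hypothetical extra column (id)
# whose values are unique in every row.
df <- data.frame(
  col1 = c("a", "b", "c", "d", "a", "b", "c", "d"),
  col2 = c("x", "y", "z", "z", "x", "y", "z", "y"),
  id   = 1:8,
  stringsAsFactors = FALSE
)

# Naming the columns in distinct() keeps only col1 and col2, so the
# lookup collapses to one row per pair instead of one row per record.
lookup <- df %>%
  distinct(col1, col2) %>%
  filter(!(col1 == "d" & col2 == "y"))

# Guard: each col1 value must appear once, or the join duplicates rows.
stopifnot(!anyDuplicated(lookup$col1))

# Drop the unreliable col2 and join the corrected values back in.
df2 <- df %>%
  select(-col2) %>%
  left_join(lookup, by = "col1")

df2
```

Without the column names, `distinct()` would compare whole rows, and the unique `id` values would make every row distinct.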