I am new to R and to coding in general, so please pardon me if I misuse some of the jargon here (correct me if I'm wrong).
I am facing a challenge cleaning city names in a data frame.
I tried GetCloseMatches (from fuzzywuzzyR, I believe) and stringdist_inner_join (from fuzzyjoin) in dplyr style, but neither has met my needs yet.
1st attempt:
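library(fuzzywuzzyR)  # provides GetCloseMatches
library(maps)         # assuming world.cities comes from the maps package
data(world.cities)
vec3 <- world.cities$name  # already a character vector, so unlist() is not needed
str1 <- "Jakarta Utara"
GetCloseMatches(string = str1, sequence_strings = vec3, n = 1L, cutoff = 0.6)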
However, this only "translates" one city at a time. Do you know how to make it repeat for every row of the data frame? Should I use a for loop or a function?
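To illustrate the kind of repetition I have in mind, here is a minimal sketch. It is not my real code: closest_city() is a hypothetical helper, the lookup vector is a made-up three-city sample, and base R's adist() stands in for GetCloseMatches so that the example runs without extra packages.

```r
# Hypothetical helper: return the lookup entry with the smallest edit distance.
# adist() is a base R stand-in here, not the GetCloseMatches call from above.
closest_city <- function(x, lookup) {
  d <- adist(x, lookup, ignore.case = TRUE)  # 1 x length(lookup) distance matrix
  lookup[which.min(d)]
}

# Made-up sample values; my real lookup table has hundreds of names.
cities <- c("Karawang -", "Jakarta Utara", "Jakarta")
lookup <- c("Karawang", "Jakarta", "Bandung")

# Apply the single-string matcher to every element of the vector.
cleaned <- vapply(cities, closest_city, character(1),
                  lookup = lookup, USE.NAMES = FALSE)
print(cleaned)
```

Something with this shape, applied to the whole cust_city column, is what I am after, but I do not know whether it is the right or efficient way to do it.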
2nd attempt:
library(dplyr)      # provides %>%
library(fuzzyjoin)  # provides stringdist_left_join
df2 <- df[1:10, ] %>%
  stringdist_left_join(world.cities, by = c(cust_city = "name"), max_dist = 1)
This matches most of the cities, but "Jakarta Utara" is missing.
I am using two databases/data frames (correct me if I'm wrong) as the reference list of cities to check against. (The "Look Up" table on the right side has hundreds of city names, not only 6.) The first is a set of SHP files that I fortified, and the second is world.cities$name. Both work well, but each one only catches one of the problem cities. For example, if I use the SHP files, Jakarta Utara is matched but Karawang is not, and vice versa.
My goal is to replace the word on the left with the word on the right (1 to 2):
left > right
Karawang - to Karawang
Jakarta Utara to Jakarta
Jakarta to Jakarta, etc
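In case a reproducible example helps, the mapping above can be written as a small data frame. This is only a sketch: df_sample and wanted_city are hypothetical names I made up for illustration, and cust_city is the only column name taken from my real data.

```r
# Hypothetical three-row sample of the dirty city column.
df_sample <- data.frame(
  cust_city = c("Karawang -", "Jakarta Utara", "Jakarta"),
  stringsAsFactors = FALSE
)

# The cleaned value I would like to end up with for each row.
df_sample$wanted_city <- c("Karawang", "Jakarta", "Jakarta")

print(df_sample)
```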
Do you know the most efficient way to do it?
Thank you very much for your help!
Regards