I have a data frame with a column of country names. The same country is often written in different ways: for example, there are differences in upper and lower case, some letters are missing, some letters are extra, and so on.
So I need to group the names by similar patterns. For example, I have two observations that belong to the same category, ("Brasil", "brazil"), which I need to put together. I cannot do this by hand because the entire data frame has about 10,000 observations.
After collapsing the similar observations into one category, I need to make some subsets from these categories.
Is there a way to group those similar names into a category and then make subsets by category that keep the other columns of the data frame?
I tried to use the agrep function, with no success:
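number <- 1:6
country <- c("Brasil", "brazil", "Costa Rica", "costarrica", "suiza", "Holanda")
example <- data.frame(number, country, stringsAsFactors = FALSE)

# A for loop itself returns NULL, so each agrep() result is stored in a list
agrupamiento <- vector("list", nrow(example))
for (i in seq_len(nrow(example))) {
  # Indices of the rows whose country approximately matches row i
  agrupamiento[[i]] <- agrep(example$country[i], example$country,
                             max.distance = 0.1, ignore.case = TRUE)
}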
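One possible way to get from pairwise matches to actual categories is to cluster the names by edit distance and then split the data frame by cluster. The following is only a minimal sketch in base R, not a definitive solution. It swaps agrep for adist plus hclust and cutree. The 0.35 cut-off and the names key, d_rel, group, category and subsets are assumptions made for illustration, and the cut-off would need tuning on the real data.

```r
# Illustrative sketch: cluster near-duplicate country names by edit distance.
number  <- 1:6
country <- c("Brasil", "brazil", "Costa Rica", "costarrica", "suiza", "Holanda")
example <- data.frame(number, country, stringsAsFactors = FALSE)

# Normalise first: lower case and no whitespace, so that "Costa Rica"
# and "costarrica" differ only by one letter.
key <- gsub("[[:space:]]+", "", tolower(example$country))

# Pairwise Levenshtein distances, scaled by the longer string of each pair
# so that short and long names are compared on the same 0-1 scale.
d     <- adist(key)
len   <- nchar(key)
d_rel <- d / outer(len, len, pmax)

# Hierarchical clustering; names closer than the (assumed) threshold of 0.35
# end up in the same group.
hc <- hclust(as.dist(d_rel), method = "average")
example$group <- cutree(hc, h = 0.35)

# Label each group with the first spelling found in it (an arbitrary choice).
example$category <- ave(example$country, example$group,
                        FUN = function(x) x[1])

# One subset per category, keeping all the other columns.
subsets <- split(example, example$group)
```

With about 10,000 rows it is probably cheaper to run the distance and clustering steps on unique(key) only and then merge the resulting groups back onto the full data frame. It is also worth inspecting the groups by eye, since a purely string-based threshold can merge names of different countries that happen to be spelled alike.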