I'm looking to standardise a set of manually entered strings, so that:
index fruit
1 Apple Pie
2 Apple Pie.
3 Apple. Pie
4 Apple Pie
5 Pear
should look like:
index fruit
1 Apple Pie
2 Apple Pie
3 Apple Pie
4 Apple Pie
5 Pear
For my use case, grouping them by phonetic sound is fine, but I don't know how to replace the less common spellings in each group with the most common one.
library(tidyverse)
library(stringdist)

index <- seq(1, 5, 1)
fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")

df <- data.frame(index, fruit) %>%
  mutate(grouping = phonetic(fruit)) %>%
  add_count(fruit) %>%
  # Missing Code
  select(index, fruit)
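One possible way to fill the gap, as a rough sketch rather than a definitive answer: group by the phonetic code, then overwrite every `fruit` in the group with the spelling that has the highest count. This assumes `phonetic()` gives all three "Apple Pie" variants the same code (which is what the grouping idea relies on), and that ties can be broken by taking the first spelling that reaches the maximum count. The name `result` is only for illustration.

```r
library(dplyr)
library(stringdist)

df <- data.frame(
  index = 1:5,
  fruit = c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear"),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(grouping = phonetic(fruit)) %>%   # phonetic code used as the group key
  add_count(fruit) %>%                     # adds column n: occurrences of each exact spelling
  group_by(grouping) %>%
  mutate(fruit = fruit[which.max(n)]) %>%  # which.max() returns the first maximum, so ties go to the earliest row
  ungroup() %>%
  select(index, fruit)

print(result)
```

Inside `mutate()`, `n` refers to the count column created by `add_count()`, not to the `n()` helper. If first-row tie-breaking is not acceptable, the choice inside `fruit[...]` is the part to change.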