I am doing some fuzzy text matching to match school names. Here is an example of my data, which is two columns in a tibble:
library(dplyr)       # provides tibble(), mutate() and %>%
library(stringdist)  # provides stringdist()
data <- tibble(school1 = c("abilene christian", "abilene christian", "abilene christian", "abilene christian"),
               school2 = c("a t still university of health sciences", "abilene christian university", "abraham baldwin agricultural college", "academy for five element acupuncture"))
data
# A tibble: 4 x 2
  school1           school2
  <chr>             <chr>
1 abilene christian a t still university of health sciences
2 abilene christian abilene christian university
3 abilene christian abraham baldwin agricultural college
4 abilene christian academy for five element acupuncture
What I would like to do is use stringdist to run through all the available methods and return a table that looks like this, where my original text remains alongside a column for each method holding the value returned:
# A tibble: 4 x 12
  school1           school2       osa    lv    dl hamming   lcs qgram cosine jaccard    jw soundex
  <chr>             <chr>       <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>   <dbl>
1 abilene christian a t still …  29.0  29.0  29.0     Inf  36.0  24.0 0.189    0.353 0.442    1.00
2 abilene christian abilene ch…  11.0  11.0  11.0     Inf  11.0  11.0 0.0456   0.200 0.131    0
3 abilene christian abraham ba…  28.0  28.0  28.0     Inf  35.0  25.0 0.274    0.389 0.431    1.00
4 abilene christian academy fo…  28.0  28.0  28.0     Inf  37.0  29.0 0.333    0.550 0.445    1.00
I can get this to work with the following for loop:
method_list <- c("osa", "lv", "dl", "hamming", "lcs", "qgram", "cosine", "jaccard", "jw", "soundex")
for (i in method_list) {
  data[, i] <- stringdist(data$school1, data$school2, method = i)
}
What I would like to do is convert this into the more readable dplyr syntax, but I can't get the loop to work with mutate. Here is what I have:
for (i in method_list) {
  ft_result <- data %>%
    mutate(i = stringdist(school1, school2, method = i))
}
Running this adds only one column to my original data, named "i", with a value of 1 for every row.
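The column name is likely explained by how mutate treats the left-hand side of `=`: it is taken literally as the name "i" rather than as the value of the loop variable. In addition, every pass starts again from `data`, so only the last iteration survives in `ft_result`. A minimal sketch of one way around both issues, assuming a dplyr version that supports the tidy-eval `!!` and `:=` operators; the two-row data and the shortened method list are only for illustration:

```r
library(dplyr)
library(stringdist)

# Small illustrative subset of the data from the question
data <- tibble(
  school1 = rep("abilene christian", 2),
  school2 = c("abilene christian university",
              "abraham baldwin agricultural college")
)
method_list <- c("osa", "lv", "jw")  # shortened list, for illustration only

# Start from the original data and keep accumulating columns,
# instead of restarting from `data` on every iteration
ft_result <- data
for (i in method_list) {
  ft_result <- ft_result %>%
    # `!!i :=` uses the value of i (e.g. "lv") as the new column name
    mutate(!!i := stringdist(school1, school2, method = i))
}

names(ft_result)
```

With this shape, `ft_result` should end up with one column per entry in `method_list`, next to the two original columns.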
Question 1: Is a for-loop the best way to accomplish what I am trying to get to? I looked at purrr to see if I could use something like map or invoke, but I don't think any of those functions do what I want.
Question 2: If a for-loop is the way to go, how can I make it work with mutate? I tried using mutate_at, but that didn't work either.
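For reference, this is roughly the shape I imagine a purrr version of Question 1 taking. It is only a sketch, assuming `map` over a named vector of methods is an acceptable replacement for the loop; the two-row data and the shortened method list are again just for illustration:

```r
library(dplyr)
library(purrr)
library(tibble)
library(stringdist)

# Small illustrative subset of the data from the question
data <- tibble(
  school1 = rep("abilene christian", 2),
  school2 = c("abilene christian university",
              "abraham baldwin agricultural college")
)
method_list <- c("osa", "lv", "jw")  # shortened list, for illustration only

# set_names() names each element after itself, so the resulting list
# (and the tibble built from it) has one column per method
dists <- method_list %>%
  set_names() %>%
  map(function(m) stringdist(data$school1, data$school2, method = m)) %>%
  as_tibble()

# Put the distance columns next to the original text columns
ft_result <- bind_cols(data, dists)
names(ft_result)
```

This avoids the explicit loop and the name injection in mutate, at the cost of referring to `data$school1` and `data$school2` directly.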