I'd like the `adist` function to work on words the same way it works on characters. That is, I'd like each deletion, substitution, or insertion to apply to a whole word instead of a single character. For example, I want "Alert 12 went off at 3am" and "Alert 17 was heard at 3am" to have a Levenshtein distance of 3, because three word substitutions are needed to get from one string to the other. Thanks
So you want to count the different words? `strsplit` would get you most of the way there. – cory Jan 03 '20 at 12:52
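To illustrate the comment above: `strsplit` returns a list with one character vector per input string, so taking `[[1]]` (or `unlist`) gives the words of a single sentence. A minimal sketch, assuming words are separated by runs of whitespace (the `"\\s+"` pattern is an assumption; the question's strings use single spaces):

```r
s1 <- "Alert 12 went off at 3am"
s2 <- "Alert 17 was heard at 3am"

# strsplit() is vectorised and returns a list; [[1]] extracts the words of one string
w1 <- strsplit(s1, "\\s+")[[1]]
w2 <- strsplit(s2, "\\s+")[[1]]

# Both sentences have six words, so a position-by-position comparison counts
# the substituted words (only meaningful when no words are inserted or deleted)
sum(w1 != w2)
```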
Read this [discussion](https://stackoverflow.com/questions/5055839/word-level-edit-distance-of-a-sentence) – phiver Jan 03 '20 at 12:53
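The word-level edit distance asked about can be computed with the standard Levenshtein dynamic-programming recurrence, treating each word as a single token. A minimal base-R sketch; the function name `word_levenshtein` is hypothetical, and words are assumed to be separated by whitespace with no leading or trailing spaces:

```r
# Hypothetical helper: Levenshtein distance counted in whole words
word_levenshtein <- function(a, b) {
  x <- strsplit(a, "\\s+")[[1]]
  y <- strsplit(b, "\\s+")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] holds the distance between the first i words of x
  # and the first j words of y
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n  # turning i words into nothing takes i deletions
  d[1, ] <- 0:m  # building j words from nothing takes j insertions
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # delete x[i]
                             d[i + 1, j] + 1L,  # insert y[j]
                             d[i, j] + cost)    # substitute, or keep a match
    }
  }
  d[n + 1, m + 1]
}

word_levenshtein("Alert 12 went off at 3am", "Alert 17 was heard at 3am")
```

The double loop is O(n·m) in the number of words, which is negligible for sentence-length inputs.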
1 Answer
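I guess you can try the following code to count the different words. Note that `vsetdiff` returns the words of `s1` that remain after removing those of `s2` (a multiset difference), so the count matches the word-level Levenshtein distance in this example but not in general.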
library(vecsets)
d <- length(vsetdiff(unlist(strsplit(s1, " ")), unlist(strsplit(s2, " "))))
such that
> d
[1] 3
DATA
s1 <- "Alert 12 went off at 3am"
s2 <- "Alert 17 was heard at 3am"
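If you would rather keep using `adist` itself, one workaround is to map every distinct word to a single character and then call `adist` on the encoded strings, so that each character edit corresponds to a whole-word edit. A minimal sketch; the helper name `word_adist` and the choice of code points starting at 65 ("A") are assumptions for illustration:

```r
# Hypothetical helper: word-level edit distance via base R's adist()
word_adist <- function(a, b) {
  wa <- strsplit(a, "\\s+")[[1]]
  wb <- strsplit(b, "\\s+")[[1]]
  vocab <- unique(c(wa, wb))
  # Encode each distinct word as one character (code points 65, 66, ...),
  # so one character edit in adist() equals one whole-word edit
  encode <- function(words) intToUtf8(match(words, vocab) + 64L)
  # adist() returns a distance matrix; [1, 1] extracts the single value
  adist(encode(wa), encode(wb))[1, 1]
}

s1 <- "Alert 12 went off at 3am"
s2 <- "Alert 17 was heard at 3am"
word_adist(s1, s2)
```

Here `s1` and `s2` encode to "ABCDEF" and "AGHIEF", which differ by three substitutions. Unlike the `vsetdiff` count, this is a true edit distance, so reordered words are penalised: `word_adist("a b", "b a")` is 2.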

ThomasIsCoding