Computing edit distance using two simple columns from iris dataset

Question

In the code below, I want to compute the similarity between two columns of text strings. To do this, I take the first 10 values of the "Petal.Length" column from iris and assign them to a1, and the first 4 values of the "Sepal.Length" column from iris and assign them to a2. My objective is that each a2 value should be compared with every a1 value using the formula in the last line, so that I get a final vector percent_calc with 40 values.

library(stringdist)
library(RecordLinkage)

a1 = iris$Petal.Length[1:10] * 1000
a2 = iris$Sepal.Length[1:4]  * 1000
a1 = as.character(a1)
a2 = as.character(a2)

percent_calc = RecordLinkage::levenshteinSim(a2,a1)

`sapply(a2, function(i) RecordLinkage::levenshteinSim(i,a1))` — Sotos, Dec 07 '17 at 09:01
So convert that matrix to a vector!!!! It is not that hard to do!!! — Sotos, Dec 07 '17 at 09:18
@Sotos, Please help me with the second part I have added in the above post, thanks. — Ashmin Kaul, Dec 07 '17 at 10:32
@zx8754, I have created a new question as you suggested, please help. https://stackoverflow.com/questions/47693376/displaying-corresponding-values-in-data-frame-in-r — Ashmin Kaul, Dec 07 '17 at 10:52
@zx8754, As I am solving my problem, this was one of the issue I had been facing, now there is a new issue, this problem is resolved. — Ashmin Kaul, Dec 07 '17 at 10:55
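The suggestion in the comments above (apply the similarity once per a2 value, then flatten the resulting matrix) can be sketched as follows. This is a minimal sketch, not a definitive implementation: `sim_one` is a hypothetical helper name, and the base-R fallback inside it assumes the similarity is normalised as 1 minus the edit distance divided by the length of the longer string.

```r
# Data as in the question: numeric columns scaled and turned into strings.
a1 <- as.character(iris$Petal.Length[1:10] * 1000)
a2 <- as.character(iris$Sepal.Length[1:4] * 1000)

# sim_one() is a hypothetical helper: similarity of one string s against
# every element of v. It calls RecordLinkage::levenshteinSim when the package
# is installed; the base-R fallback (1 - distance / longer length) is an
# assumption about how that similarity is normalised.
sim_one <- function(s, v) {
  if (requireNamespace("RecordLinkage", quietly = TRUE)) {
    RecordLinkage::levenshteinSim(s, v)
  } else {
    1 - drop(utils::adist(s, v)) / pmax(nchar(s), nchar(v))
  }
}

# One column per a2 value, one row per a1 value: a 10 x 4 matrix.
sim_matrix <- sapply(a2, sim_one, v = a1)

# as.vector() flattens column by column, giving one value per (a1, a2) pair.
percent_calc <- as.vector(sim_matrix)
length(percent_calc)
```

Because the matrix is flattened column by column, the first 10 values belong to the first a2 string, the next 10 to the second, and so on.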

score 0 · Answer 1 · answered Dec 07 '17 at 09:36

Get all combinations of a1 and a2 with expand.grid, then compute the similarity for each pair:

a12 <- expand.grid(a1, a2, stringsAsFactors = FALSE)

percent_calc <- levenshteinSim(a12$Var1, a12$Var2)

percent_calc
# [1] 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
# [19] 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.75 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
# [37] 0.50 0.50 0.50 0.50
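To see which pair of strings each of the 40 values belongs to, the scores can be stored next to the combinations. The sketch below uses only base R (`utils::adist`) instead of RecordLinkage, so it assumes the similarity is 1 minus the edit distance divided by the length of the longer string; the names `pairs` and `sim` are illustrative only.

```r
# Data as in the question.
a1 <- as.character(iris$Petal.Length[1:10] * 1000)
a2 <- as.character(iris$Sepal.Length[1:4] * 1000)

# All 40 combinations; the first column (a1) varies fastest.
pairs <- expand.grid(a1 = a1, a2 = a2, stringsAsFactors = FALSE)

# Edit distances as a 10 x 4 matrix (rows = a1, columns = a2).
d <- utils::adist(a1, a2)

# Assumed normalisation: divide by the length of the longer string in each pair.
max_len <- outer(nchar(a1), nchar(a2), pmax)

# Column-major flattening matches the row order of expand.grid.
pairs$sim <- as.vector(1 - d / max_len)

# Show only the pairs that are more similar than the common 0.5 score.
pairs[pairs$sim > 0.5, ]
```

With this data the only row above 0.5 should be row 26, the pair "1700" and "4700", which agrees with the single 0.75 in the output above.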

zx8754

Thanks for the help and reply. I already have the solution for this part, but I need help with the second part above; please help me there. – Ashmin Kaul Dec 07 '17 at 10:34
