In the code below, I want to compute the similarity between two columns of text strings. To do this, I take the first 10 rows of the "Petal.Length" column from iris and assign them to a1, and the first 4 rows of the "Sepal.Length" column from iris and assign them to a2. My objective is to compare each a2 value with every a1 value using the formula in the last line, so that the final vector percent_calc has 40 values.

library(stringdist)
library(RecordLinkage)

a1 = iris$Petal.Length[1:10] * 1000
a2 = iris$Sepal.Length[1:4]  * 1000
a1 = as.character(a1)
a2 = as.character(a2)

percent_calc = RecordLinkage::levenshteinSim(a2,a1)
zx8754
Ashmin Kaul

1 Answer

Build all pairwise combinations of a1 and a2 with expand.grid, then compute the similarity for each pair:

a12 <- expand.grid(a1, a2, stringsAsFactors = FALSE)

percent_calc <- levenshteinSim(a12$Var1, a12$Var2)

percent_calc
# [1] 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
# [19] 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.75 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50 0.50
# [37] 0.50 0.50 0.50 0.50
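As a cross-check that needs no extra packages, the same 40 values can be sketched in base R with `adist()` from the utils package, which returns the full matrix of Levenshtein distances between two character vectors. This assumes that `levenshteinSim` is defined as one minus the distance divided by the length of the longer string; the names `dist_mat`, `max_len`, `sim_mat` and `percent_calc_base` are made up for this illustration.

```r
# Rebuild the inputs from the question
a1 <- as.character(iris$Petal.Length[1:10] * 1000)
a2 <- as.character(iris$Sepal.Length[1:4] * 1000)

# 10 x 4 matrix: entry [i, j] is the Levenshtein distance between a1[i] and a2[j]
dist_mat <- adist(a1, a2)

# Length of the longer string in each pair, in the same 10 x 4 layout
max_len <- outer(nchar(a1), nchar(a2), pmax)

# Assumed similarity formula: 1 - distance / length of the longer string
sim_mat <- 1 - dist_mat / max_len

# Flatten column by column, which matches the row order of expand.grid(a1, a2)
percent_calc_base <- as.vector(sim_mat)
length(percent_calc_base)
```

Keeping `sim_mat` as a matrix may be handier than the flat vector when you need to look up which a1 value is most similar to a given a2 value.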
zx8754
  • Thanks for the reply. I have already found a solution to this part, but I still need help with the second part above; please help me there. – Ashmin Kaul Dec 07 '17 at 10:34