Count the transpositions needed in a string so that it can be found in another string

Question

Here is what I am trying to do: when the term I am analyzing is "apples", I would like to know how many transpositions must be applied to "apples" so that it can be found in a given string.

"buy apples now" => 0 transpositions needed (apples is present).

"cheap aples online" => 1 transposition is needed (apples to aples).

"find your ap ple here" => 2 transpositions are needed (apples to ap ple).

"aple" => 2 transpositions are needed (apples to aple).

"bananas" => 5 transpositions are needed (apples to bananas).

The stringdist and adist functions don't work for this because they tell me how many transpositions are needed to transform one whole string into the other. Anyway, here is what I wrote so far:

#build data frame
a <- c(rep("apples",5),rep("bananas",3))
b <- c("buy apples now","cheap aples online","find your ap ple here","aple","bananas","cherry and bananas","pumpkin","banana split")
d <- data.frame(a,b)
colnames(d) <- c("term","string")

#count transpositions needed
d$transpositions <- mapply(adist,d$term,d$string)
print(d)
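
To make the problem concrete, here is a minimal sketch of why the whole-string distance is not what is wanted. It only uses adist from base R; the variable names are made up for illustration.

```r
# The term is present verbatim, so the wanted answer is 0.
term <- "apples"
string <- "buy apples now"

# adist compares the two complete strings, so every extra character
# in "buy " and " now" is counted as an insertion.
whole <- adist(term, string)[1, 1]
print(whole)
```

The extra words around the term dominate the result, which is why the `transpositions` column above does not match the expected counts.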

ok thank you, shall I add it to the title too or is the tag enough? — Julien Massardier, Apr 03 '15 at 17:34
I edited your code (in my answer) to be apples in `a <- c(rep("apples",5),rep("bananas",3))` — infominer, Apr 03 '15 at 18:13
Ouch, thanks infominer, let me correct it in the question too! — Julien Massardier, Apr 03 '15 at 21:02

score 0 · Answer 1 · answered Apr 03 '15 at 18:12

You need to check for "apples" first and then count the transpositions:

a <- c(rep("apples",5),rep("bananas",3))
b <- c("buy apples now","cheap aples online","find your ap ple here","aple","bananas","cherry and bananas","pumpkin","banana split")
d <- data.frame(a,b, stringsAsFactors = FALSE)
colnames(d) <- c("term","string")

#check for apples first
d$apples <- grepl("apples", d$string)

#count transpositions needed (0 when the term is already present)
d$transpositions <- ifelse(!d$apples, mapply(adist,d$term,d$string), 0)
print(d)
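
The check above hard-codes "apples", so the "bananas" rows are never tested against their own term. A minimal sketch of a per-row variant, assuming a literal (non-regex) match is what is wanted; the column name `found` is made up for illustration:

```r
a <- c(rep("apples", 5), rep("bananas", 3))
b <- c("buy apples now", "cheap aples online", "find your ap ple here", "aple",
       "bananas", "cherry and bananas", "pumpkin", "banana split")
d <- data.frame(term = a, string = b, stringsAsFactors = FALSE)

# TRUE where the row's own term occurs literally in the row's string
d$found <- mapply(grepl, d$term, d$string,
                  MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Whole-string distance, replaced by 0 where the term was found as is
wholedist <- mapply(function(t, s) adist(t, s)[1, 1], d$term, d$string,
                    USE.NAMES = FALSE)
d$transpositions <- ifelse(d$found, 0, wholedist)
print(d)
```

This still measures the whole string whenever the term is not found exactly, so it does not yet handle "cheap aples online" the way the question asks.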

answered Apr 03 '15 at 18:12 by infominer

hmm I just reread your question, will have to rethink my answer. I will post it when I deal with it later. How do you want to deal with sentences as opposed to one word transformations? – infominer Apr 03 '15 at 18:33
Thanks @infominer! Much appreciated :) grepl is useful. The 1st step is, indeed, detecting the presence of the term spelled properly in the string. If the term spelled properly is not found, then I need to isolate the piece of the string that is the most similar to my term, and finally calculate the similarity between this piece of string and the term. Regarding sentences as opposed to "one word", I want to avoid that "buy aple now" gets a worse score than "aple" because of the extra words "buy" and "now". What matters is how similar the section "aple" of "buy aple now" is to the term "apple". – Julien Massardier Apr 03 '15 at 22:27

score 0 · Answer 2 · answered Apr 04 '15 at 03:00

So, here is the dirty solution I came up with so far:

#str_trim comes from the stringr package
library(stringr)

#create a data.frame
a <- c(rep("apples",5),rep("banana split",3))
b <- c("buy apples now","cheap aples online","find your ap ple here","aple","bananas","cherry and bananas","pumpkin","banana split")
d <- data.frame(a,b, stringsAsFactors = FALSE)
colnames(d) <- c("term","string")

#slide a window with the same length as the term along each string,
#calculate the distance between each window and the term, and keep
#the closest window as the most relevant piece of string for that row.

mostrelevantpiece <- character(nrow(d))

for (j in seq_len(nrow(d))){
  termlength <- nchar(d$term[j])
  #at least one window, even when the string is shorter than the term
  nwindows <- max(nchar(d$string[j]) - termlength + 1, 1)
  pieces <- character(nwindows)
  piecesdist <- numeric(nwindows)
  for (i in seq_len(nwindows)){
    addpiece <- substr(d$string[j], i, i + termlength - 1)
    pieces[i] <- str_trim(addpiece)
    piecesdist[i] <- adist(addpiece, d$term[j])
  }
  #pick the closest window once all windows have been scored
  mostrelevantpiece[j] <- pieces[which.min(piecesdist)]
}

#calculate the number of transpositions needed to transform the "most relevant piece of string" into the term.

d$transpositionsneeded <- mapply(adist,mostrelevantpiece,d$term)
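
The sliding-window search above is close in spirit to what adist already offers through its partial argument: with partial = TRUE, the distance is measured against the best-matching substring of the second argument rather than against the whole string. A minimal sketch, assuming this substring distance is an acceptable stand-in for the "transpositions" in the question (the helper name substring_distance is made up for illustration):

```r
# Hypothetical helper: edit distance from each term to the closest
# substring of the corresponding string, using base R's adist.
substring_distance <- function(term, string) {
  mapply(function(t, s) adist(t, s, partial = TRUE)[1, 1],
         term, string, USE.NAMES = FALSE)
}

a <- c(rep("apples", 5), rep("bananas", 3))
b <- c("buy apples now", "cheap aples online", "find your ap ple here", "aple",
       "bananas", "cherry and bananas", "pumpkin", "banana split")
d <- data.frame(term = a, string = b, stringsAsFactors = FALSE)

d$transpositionsneeded <- substring_distance(d$term, d$string)
print(d)
```

Because the match may be any substring, the count for an unrelated string such as "bananas" can come out lower than the whole-string distance, so the numbers may not agree with every expected value listed in the question.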
