0

This piece of code returns the Levenshtein edit distance of 2 terms. How can i make this so that insertion and deletion only costs 0.5 instead of 1 ? substitution should still costs 1.

def substCost(x,y):
   if x == y: 
      return 0
   else: 
      return 1

def  levenshtein(target, source):
   i = len(target); j = len(source)
   if i == 0:  
      return j
   elif j == 0: 
      return i

   return(min(levenshtein(target[:i-1],source)+1,
          levenshtein(target, source[:j-1])+1,
          levenshtein(target[:i-1], source[:j-1])+substCost(source[j-1],target[i-1])))
Brutalized
  • 85
  • 2
  • 9
  • Please fix the indentation of your question, it's not possible to read currently. Even then, putting everything on one line is not idiomatic python and makes this difficult to answer. – roganjosh Sep 19 '16 at 19:39
  • What have you tried so far? You should post an attempt before asking someone to solve this for you – Nick Becker Sep 19 '16 at 19:56
  • Does replacing a vowel with another vowel also cost 0.5, or is it only inserting and deleting that's cheaper than normal? What about replacing a non-vowel with a vowel or visa versa? – Blckknght Sep 19 '16 at 19:56
  • 2
    Don't vandalize your own questions. That is not permitted. – Magisch Sep 20 '16 at 09:47

1 Answers1

1

There are two places you need to account for the reduced cost of adding or removing a vowel. They are the return j and return i lines in the base cases of your function, and the +1s in the min call after the first two recursive calls.

We need to change each of those to use a "ternary" expression: 0.5 if ch in 'aeiou' else 1 in place of assuming a cost of 1 per character added or removed.

For the base cases, we can replace the return values with sum calls on a generator expression that includes the ternary expression:

if i == 0:  
    return sum(0.5 if ch in 'aeiou' else 1 for ch in source)
elif j == 0: 
    return sum(0.5 if ch in 'aeiou' else 1 for ch in target)

For later cases, we can replace the +1 with the ternary expression itself (with an index rather than the ch iteration variable):

return min(levenshtein(target[:i-1],source) + (0.5 if target[-1] in 'aeiou' else 1),
           levenshtein(target, source[:j-1]) + (0.5 if source[-1] in 'aeiou' else 1),
           levenshtein(target[:i-1], source[:j-1])+substCost(source[j-1],target[i-1]))

If you wanted to generalize this, you could move the ternary expression out into its own function, named something like addCost, and call it from the code in the levenshtein function.

Blckknght
  • 100,903
  • 11
  • 120
  • 169