I saw this Levenshtein formula on Wikipedia:

[Image: recursive definition of the Levenshtein distance, from Wikipedia]

I implemented this algorithm recursively (I know this is inefficient, but I wanted to see just how inefficient it is). Here is the code, in PHP:

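// Distance between the first $i characters of $str1 and the first $j of $str2.
function lev($str1, $str2, $i, $j) {

    // Base case: one prefix is empty, so the distance is the other's length.
    if (min($i, $j) == 0) {
      return max($i, $j);
    }
    else {
      // Substitution cost: 0 if the last characters match, 1 otherwise.
      $m = ($str1[$i-1] == $str2[$j-1]) ? 0 : 1;
      // Three recursive calls per invocation, with no caching of results.
      return min(lev($str1, $str2, $i, $j - 1) + 1,
                 lev($str1, $str2, $i - 1, $j) + 1,
                 lev($str1, $str2, $i - 1, $j - 1) + $m);
    }
}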

$str1 = "long long text";
$str2 = "absolute";

echo lev($str1, $str2, strlen($str1), strlen($str2));

When I test it with these two strings (even though "long long text" is not that long), I get a "Max execution time of 30 seconds" error. The function does seem to work for strings where the Levenshtein distance is low (e.g. $str1 = "word", $str2 = "corw").

Taking more than 30 seconds to run this script seems like too much, so maybe I made a mistake in the implementation. But when I look at it I don't see any error; as far as I can tell, I have written the algorithm correctly according to the formula on Wikipedia.

Is this implementation really that slow, or is there a mistake somewhere in my code?

Thanks for the attention!

tonix
  • Have you had a look at the [existing PHP Levenshtein implementation](http://php.net/manual/en/function.levenshtein.php)? – i alarmed alien Oct 29 '14 at 10:52
  • Yes, I did. I know it runs in O(N*M), but I wanted to see how this algorithm works and how the two compare, so I implemented it. Do you think 30 seconds is normal, or have you found an error in the code? – tonix Oct 29 '14 at 10:57

1 Answer

Your code does not use memoization, so it has exponential time complexity. That is why it is so slow. If you add memoization, so that the value of the function is never computed more than once for the same i and j, you get O(N * M) time complexity.
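
As a rough illustration, here is a minimal sketch of what a memoized version could look like. The name `levMemo`, the extra `$memo` parameter, and the `"i,j"` cache key are assumptions made for this example and are not part of the original code; the recursion itself is unchanged.

```php
<?php
// Memoized variant of the recursive lev() from the question (a sketch).
// $memo is a hypothetical cache keyed by "i,j". It is passed by reference
// so that every recursive call shares the same array.
function levMemo(string $str1, string $str2, int $i, int $j, array &$memo): int
{
    // Base case: one prefix is empty.
    if (min($i, $j) === 0) {
        return max($i, $j);
    }

    // Reuse the result if this (i, j) pair was already solved.
    $key = $i . ',' . $j;
    if (isset($memo[$key])) {
        return $memo[$key];
    }

    $m = ($str1[$i - 1] === $str2[$j - 1]) ? 0 : 1;
    $memo[$key] = min(
        levMemo($str1, $str2, $i, $j - 1, $memo) + 1,
        levMemo($str1, $str2, $i - 1, $j, $memo) + 1,
        levMemo($str1, $str2, $i - 1, $j - 1, $memo) + $m
    );

    return $memo[$key];
}

$str1 = "long long text";
$str2 = "absolute";
$memo = [];
echo levMemo($str1, $str2, strlen($str1), strlen($str2), $memo), "\n";
```

Each (i, j) pair is now computed at most once, so there are at most N * M non-trivial calls. The result should agree with PHP's built-in `levenshtein()` for the same two strings.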

kraskevich
  • Do you mean for $str1 and $str2? That they are passed to the lev() function on every subcall? – tonix Oct 29 '14 at 10:59
  • @user3019105 No. I mean `i` and `j`. The way it is implemented now is slow because `lev(...)` can be called multiple times for the same `i` and `j`. – kraskevich Oct 29 '14 at 11:00
  • All right, I guess I am starting to understand, so the complexity of this recursive implementation is a power of 3 like 3^x, am I right? But what is the x for the exponent? – tonix Oct 29 '14 at 11:15
  • @user3019105 The relation is T(n, m) = T(n - 1, m) + T(n, m - 1) + T(n - 1, m - 1). It is not exactly 3^x for some x. I cannot find a concrete formula for the number of operations. – kraskevich Oct 29 '14 at 11:33
  • So internally it is like a 3-ary tree being built, where each node is a subcall. When the expression (n - 1, m) creates the node (0, m), that node does not create any more children and the subcall returns to the caller, am I right? So the total number of subcalls equals the number of nodes in this 3-ary tree, right? – tonix Oct 29 '14 at 11:41
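
The node count of that call tree can be tabulated directly instead of being derived in closed form. The sketch below is only an illustration: `countCalls` is a hypothetical helper, and it assumes each call counts as one node, so it uses the recurrence from the comments with an extra `+ 1` for the node itself and a value of 1 for every leaf where min(i, j) = 0.

```php
<?php
// Number of nodes in the call tree of the naive recursive lev() for
// prefix lengths $n and $m. countCalls() is a hypothetical helper.
// The table is filled bottom-up so the count itself is cheap to compute.
function countCalls(int $n, int $m): int
{
    $calls = [];
    for ($i = 0; $i <= $n; $i++) {
        for ($j = 0; $j <= $m; $j++) {
            if (min($i, $j) === 0) {
                // Leaf: the base case returns without recursing.
                $calls[$i][$j] = 1;
            } else {
                // This node plus the nodes of its three subtrees.
                $calls[$i][$j] = 1
                    + $calls[$i][$j - 1]
                    + $calls[$i - 1][$j]
                    + $calls[$i - 1][$j - 1];
            }
        }
    }
    return $calls[$n][$m];
}

// Call-tree size for the two strings from the question.
echo countCalls(strlen("long long text"), strlen("absolute")), "\n";
```

For small inputs this is easy to check by hand: `countCalls(1, 1)` is 4 (the root plus three leaves) and `countCalls(2, 2)` is 19.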