3

I checked the KMP table-building algorithm from Wikipedia but I don't understand the logic behind the second case of the while loop

(second case: it doesn't, but we can fall back)
        else if cnd > 0 then
            let cnd ← T[cnd]

I tried to build a table using this algorithm and it works perfectly. I understand that cnd ← T[cnd] helps to find the proper suffix length. What I don't understand is "how" it does it?

An explanation with an example would be nice.

Thanks!

Edit: I just found out that my question is a duplicate of the question here: "Partial match" table (aka "failure function") in KMP (on wikipedia)

I think I get the answer now. Still, one more explanation would be helpful. Thanks!

Community
  • 1
  • 1
Naruto Uzumaki
  • 958
  • 4
  • 17
  • 37

1 Answers1

3

Suppose you have a string Hello World!!! and you want to search for Head Up.

Hello World!!!
Head Up
  ^

When you are in the first and second character the first condition apply (first case: the substring continues), in the case of the marked position, the character don't match but you are already inside the a sub-string match (2 character matched until there), this case correspond to the second conditional (second case: it doesn't, but we can fall back). The third case would be when you miss match the first character of the pattern.

The second condition is necessary because you could use the information of the matched character until the miss match, to avoid unnecessary comparison that you could already known the result (skip characters of the string that you already know that the beginning part of the pattern would don't match).

Example: With string HeHello World!!! and searching for Hello

HeHello World!!!
Hello
  ^ when you miss match this character using the table of KMP you known that 
    could skip 2 characters because

HeHello World!!!
 Hello
 ^ this would miss match

In the case of building the pattern table for pattern HeHello. Suppose ^ is cnd and * is pos. The start point is pos = 2 and cnd = 0 (but when checking in the pattern is with pos - 1 = 1).

HeHeHello     T [-1,0,0,0,0,0,0,0,0]
^* comparing 0 with 1 go to condition 3        cnd = 0, pos = 2
                        _
HeHeHello     T [-1,0,0,1,0,0,0,0,0]
^ * comparing 0 with 2 go to condition 1       cnd = 0, pos = 3
                          _
HeHeHello     T [-1,0,0,1,2,0,0,0,0]
 ^ * comparing 1 with 3 go to condition 1      cnd = 1, pos = 4
                            _
HeHeHello     T [-1,0,0,1,2,3,0,0,0]
  ^ * comparing 2 with 4 go to condition 1     cnd = 2, pos = 5
                              _
HeHeHello     T [-1,0,0,1,2,3,4,0,0]
   ^ * comparing 3 with 5 go to condition 1    cnd = 3, pos = 6

HeHeHello     T [-1,0,0,1,2,3,4,0,0]
    ^ * comparing 4 with 6 go to condition 2 (cnd = T[cnd], cnd = T[4] = 2)

HeHeHello     T [-1,0,0,1,2,3,4,0,0]
  ^   * comparing 2 with 6 go to condition 2 (cnd = T[cnd], cnd = T[2] = 0)

...
NetVipeC
  • 4,402
  • 1
  • 17
  • 19
  • Hi! The explanation you have provided is for when I want to match a pattern `Hello` to the string `Hello world`. But my question is about table building algorithm. In table building algorithm, for string `Hello` the first case is false. – Naruto Uzumaki Sep 10 '14 at 16:58
  • Updated with a example – NetVipeC Sep 10 '14 at 18:15