When you're matching the input string, you can only get into state 5 after matching the first 5 characters of the pattern, and the first 5 characters of the pattern are ABABA
. So no matter which input string you use, you know that the text preceding state 5 is "ABABA".
So you if you get a mismatch in state 5, you could back up 4 characters and try matching again. But since you know what text has to appear before state 5, you don't actually need the input text to figure out what would happen. You can figure out beforehand what state you'd end up in when you got back to the same place.
backup 4 characters and go to state 0:
0 : BABA
A doesn't match, so advance and go to state 0
0: ABA
A matches, so go to state 1
1: BA
B matches, go to state 2
2: A
A matches, go to state 3
3:
now we're back to the place in the input where we saw state 5 before, but now we're in state 3.
This will always happen when we get a mismatch in state 5, so instead of actually doing this, we just make a note that says "when we get a mismatch in state 5, go to state 3".
Note that most KMP implementations will actually make a failure table where failure_table[5]=3
. Your example implementation is building the full DFA[char][state]
instead, so it copies all the transitions from state 3 into state 5 for the failure cases. That says "when we get a mismatch in state 5, do whatever state 3 does", which works out the same.
UNDERSTAND EVERYTHING ABOVE BEFORE MOVING ON
Now let's speed up the calculation of those failure states...
When we get a mismatch in state 5, we can use the DFA we have so far to figure out what would happen if we backed up and rescanned the input starting at the next possible match, by applying the DFA to "BABA". We end up in state 3, so let's call state 3 the "failure state" for state 5.
It looks like we have to process 4 pattern charcters to calculate the failure state for state 5, but we already did most of that work when we calculated the failure state for state 4 -- we applied the DFA to "BAB" and ended up in state 2.
So to figure out the failure state for state 5, we just start in the failure state for state 4 (state 2), and process the next character in the pattern -- the "A" that came after state 4 in the input.