
I am trying to solve exercise 32.1-2 from the CLRS book, which is about string algorithms (naive pattern search):

Suppose that all characters in the pattern P are different. Show how to accelerate NAIVE-STRING-MATCHER to run in time O(n) on an n-character text.
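The exercise's precondition (all characters of P are different) is easy to check before relying on a specialized matcher. This is a minimal sketch, assuming single-byte characters; the function name `hasDistinctChars` is hypothetical:

```php
<?php

// Hypothetical helper: returns true if no character occurs twice in $pattern.
// Assumes single-byte characters (str_split splits on bytes, not UTF-8 code points).
function hasDistinctChars(string $pattern): bool
{
    $chars = str_split($pattern);
    return count(array_unique($chars)) === count($chars);
}

var_dump(hasDistinctChars('abuc'));
var_dump(hasDistinctChars('abca'));
```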

So I am trying to optimize the naive brute-force solution I came up with, but I don't see how to reduce the overall running time to O(n).

    <?php

    // naive search
    $a = array('a', 'b', 'u', 'c');   // pattern (all characters distinct)
    $b = array('a', 'b', 'u', 'c', 'a', 'b', 'u', 'c', 'b', 'a', 'b', 'u', 'c', 'b', 'a', 'b', 'c');   // text
    // index:   0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16
    $n = count($b);
    $k = count($a);
    $counter = 0;

    for ($i = 0; $i <= $n - $k; $i++) {   // big-O(n)
        // Since this is the "exact string matching" problem, only enter the inner loop
        // when the i-th character of B matches the first character of the pattern.
        if ($b[$i] == $a[0]) {
            $bool = true;
            for ($j = 0; $j < $k; $j++) {   // big-O(k)
                if ($b[$i + $j] != $a[$j]) {
                    $bool = false;
                    break;
                }
            }
            if ($bool) {
                echo "Found at index: " . $i . "<br>";
                $counter++;
                // since pattern match can't overlap with another one, when one is found
                // jump ahead by k positions. This is all I could do with the pattern's
                // characters being distinct; is there any other optimization possible?
                $i = $i + $k - 1;
            }
        }
    }

    echo $counter;
    ?>

I certainly reduced the running time for this particular instance, but consider the worst case, e.g. a text whose characters are all 'a': I enter the inner loop on every iteration, which is O(k*n).

What is the big-O of this algorithm, and can I get a more efficient solution?

Mohamed Kira

1 Answer


You already got the idea right ("since pattern match can't overlap with another one"). Something like this should work for the main loop:

    for ($i = 0; $i <= $n - $k; ) {
        $bool = true;
        for ($j = 0; $j < $k; $j++) {
            if ($b[$i + $j] != $a[$j]) {
                $bool = false;
                break;
            }
        }
        if ($bool) {
            echo "Found at index: " . $i . "<br>";
            $counter++;
        }
        // this line is important: $j is now $k after a full match, or the index of
        // the mismatching character; resume there (but always advance by at least 1)
        $i = $i + max($j, 1);
    }

Note the important line. It tells the algorithm to start the next matching attempt where the previous attempt failed (or right after it finished), instead of going back to $i + 1. This is valid because the pattern has distinct characters: if you have already matched j characters and then fail to match the (j+1)-th, no real match can start inside those j matched characters. If one did, the text character at its start would equal both the first character of the pattern and some later character of the pattern, so the pattern would contain a repeated character, which is a contradiction.
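The skip rule can be packaged as a reusable function that returns every starting index of the pattern. This is a minimal sketch, assuming PHP strings of single-byte characters and a pattern whose characters are all distinct; the function name `findDistinctPattern` is hypothetical:

```php
<?php

// Hypothetical helper: finds all occurrences of $pattern in $text, assuming that
// all characters of $pattern are distinct (the precondition of CLRS 32.1-2).
function findDistinctPattern(string $text, string $pattern): array
{
    $n = strlen($text);
    $k = strlen($pattern);
    $matches = [];
    if ($k === 0) {
        return $matches; // empty pattern: nothing to report in this sketch
    }
    $i = 0;
    while ($i <= $n - $k) {
        // $j ends up as the number of characters matched before a mismatch, or $k
        $j = 0;
        while ($j < $k && $text[$i + $j] === $pattern[$j]) {
            $j++;
        }
        if ($j === $k) {
            $matches[] = $i;
        }
        // No occurrence can start at $i + 1 .. $i + $j - 1: each of those text
        // characters equals some $pattern[t] with t >= 1, and $pattern[t] != $pattern[0].
        // So resume at the mismatching character (or one step ahead if $j is 0).
        $i += max($j, 1);
    }
    return $matches;
}

echo implode(', ', findDistinctPattern('abucabucbabucbabc', 'abuc')), "\n"; // prints "0, 4, 9"
```

Unlike a jump of k after every attempt, this version never skips the character that caused the mismatch, so an input such as 'aabc' with pattern 'abc' still reports the match at index 1.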

Now the complexity of the changed algorithm is O(n). Each character of the text takes part in at most two comparisons (a character that causes a mismatch is compared once more as the first character of the next attempt), because after the inner loop finishes or breaks, the outer loop continues from its last position instead of going back.
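One way to check the O(n) claim empirically is to count the character comparisons the inner loop performs. This is a minimal sketch, assuming the same skip rule and single-byte strings; the function name `countComparisons` is hypothetical. By the argument above, the count should never exceed 2n:

```php
<?php

// Hypothetical instrumentation: runs the distinct-character matcher and
// returns the number of character comparisons it performs.
function countComparisons(string $text, string $pattern): int
{
    $n = strlen($text);
    $k = strlen($pattern);
    if ($k === 0) {
        return 0;
    }
    $comparisons = 0;
    $i = 0;
    while ($i <= $n - $k) {
        $j = 0;
        while ($j < $k) {
            $comparisons++;
            if ($text[$i + $j] !== $pattern[$j]) {
                break; // mismatch at position $j of the pattern
            }
            $j++;
        }
        $i += max($j, 1); // same skip rule as in the answer
    }
    return $comparisons;
}

// Each attempt matches k - 1 characters and then fails on the next one.
$text = str_repeat('abc', 1000);
echo countComparisons($text, 'abcd'), ' comparisons for n = ', strlen($text), "\n";
```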

P.S.: Multiplying the complexities of nested for loops often gives a correct bound, but not always the tightest possible one.

usamec
  • Regarding the last note you wrote: as far as I know, we use Θ to denote a tight bound and O to denote an upper bound. Would my solution be optimal if the exercise asked for Θ instead of O? – Mohamed Kira Aug 11 '17 at 18:15
  • I think I didn't understand the question well enough, because the best pattern-matching algorithm, KMP, runs in O(m + n). – Mohamed Kira Aug 11 '17 at 18:30
  • People tend to use O and Θ interchangeably (although they are different). My statements also hold with Θ notation. – usamec Aug 11 '17 at 20:35
  • Also, KMP is a general pattern-matching algorithm, which will work in this case too. But this special case is a good exercise in algorithmic thinking, and extending it can lead to the Boyer-Moore algorithm. – usamec Aug 11 '17 at 20:36
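For comparison with the general approach mentioned in the comments, here is a minimal KMP sketch in PHP. It assumes single-byte strings, and the function names `kmpPrefixFunction` and `kmpSearch` are hypothetical. Unlike the special-case matcher, it does not require the pattern's characters to be distinct, and it runs in O(m + n):

```php
<?php

// Prefix function: $pi[$q] is the length of the longest proper prefix of
// $pattern[0..$q] that is also a suffix of it.
function kmpPrefixFunction(string $pattern): array
{
    $m = strlen($pattern);
    $pi = array_fill(0, $m, 0);
    $k = 0;
    for ($q = 1; $q < $m; $q++) {
        while ($k > 0 && $pattern[$k] !== $pattern[$q]) {
            $k = $pi[$k - 1];
        }
        if ($pattern[$k] === $pattern[$q]) {
            $k++;
        }
        $pi[$q] = $k;
    }
    return $pi;
}

// Returns every starting index of $pattern in $text (overlapping matches included).
function kmpSearch(string $text, string $pattern): array
{
    $n = strlen($text);
    $m = strlen($pattern);
    $matches = [];
    if ($m === 0) {
        return $matches;
    }
    $pi = kmpPrefixFunction($pattern);
    $q = 0; // number of pattern characters currently matched
    for ($i = 0; $i < $n; $i++) {
        while ($q > 0 && $pattern[$q] !== $text[$i]) {
            $q = $pi[$q - 1]; // fall back to the longest border
        }
        if ($pattern[$q] === $text[$i]) {
            $q++;
        }
        if ($q === $m) {
            $matches[] = $i - $m + 1;
            $q = $pi[$q - 1]; // continue looking for overlapping matches
        }
    }
    return $matches;
}

echo implode(', ', kmpSearch('abucabucbabucbabc', 'abuc')), "\n"; // prints "0, 4, 9"
```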