2

Hi I am writing a PHP class to implement Rabin-Karp algorithm. I have issue with re-hashing part. This code doesn't include matching part of the characters. I had to stop since it never matching hash codes due to the issue with re-hashing. Someone please help me to figure it out.

<?php
 class RabinKarp
 {
    /** 
    * @var String
    */ 
    private $pattern ;
    private $patternHash ;
    private $text ;
    private $previousHash ;

    /** 
    * @var Integer
    */ 
    private $radix ;
    private $prime ;
    private $position ;

    /** 
    * Constructor 
    *
    * @param String $pattern - The pattern
    *
    */ 
    public function __construct($pattern) 
    {
        $this->pattern = $pattern;
        $this->radix = 256;
        $this->prime = 100007;
        $this->previousHash = "";
        $this->position = 0;
        $this->patternHash = $this->generateHash($pattern);
    }

    private function generateHash($key)
    {
        $charArray = str_split($key);
        $hash = 0;
        foreach($charArray as $char)
        {
             $hash = ($this->radix * $hash + ord($char)) % $this->prime;
        }

        return $hash;
    }

    public function search($character)
    {
        $this->text .= $character;
        if(strlen($this->text) < strlen($this->pattern))
        {
            return false;
        }
        else
        {
            $txtHash = 0;
            echo $this->previousHash . "<br/>";
            if(empty($this->previousHash))
            {
                $txtHash = $this->generateHash($this->text);
                $this->previousHash = $txtHash;
                $this->position = 0;
            }
            else
            {
                // The issue is here 
                $charArray = str_split($this->text);
                $txtHash = (($txtHash  + $this->prime) - $this->radix * strlen($this->pattern) * ord($charArray[$this->position]) % $this->prime) % $this->prime;
                $txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime; 

                $this->previousHash = $txtHash;
            }

            if($txtHash == $this->patternHash)
            {
                echo "Hash Match found";
            }
        }

    }

 }

$x = new RabinKarp("ABC");
$x->search("Z");
$x->search("A");
$x->search("B");
$x->search("C");
?>

Thank you.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
Prasad Rajapaksha
  • 6,118
  • 10
  • 36
  • 52
  • Does this algorithm work with mutli-byte string values? If so you should probably use `mb_` functions for splitting en calculating length. – Wesley van Opdorp Mar 26 '12 at 12:15
  • This algorithm written to work with serial transmission. Let's say you get one character at a time sequentially. Once I received the character, then I start the search. The above code works fine for me. Only issue is at re-hashing part. Actually the re-hashing value is incorrect. – Prasad Rajapaksha Mar 26 '12 at 12:21
  • @PrasadRajapaksha Hi Mr. Prasad.. Could you post the fixed codes please. I'm looking for the implementation of rabin karp algorithm for multiple pattern with php. Please... thanks in advance – Ayuktia Jun 13 '16 at 17:46

1 Answers1

1

The value contributed to the hash by the character (c for shortness) you're removing is

ord(c) * radix^(length(pattern)-1)

since a character contributes ord(c) when it first enters the matching window, and the hash - therefore also its contribution - is multiplied with radix for each of the length(pattern)-1 characters entering the matching window until c finally leaves it.

But you're subtracting ord(c) * radix * length(pattern)

$charArray = str_split($this->text);
$txtHash = (($txtHash  + $this->prime)
             - $this->radix * strlen($this->pattern)
             * ord($charArray[$this->position]) % $this->prime)
           % $this->prime;
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;

Additionally, in the calculation you're using the variable $txtHash, which you've set to 0, that should be $this->previousHash, and you must increment the text position.

In principle,

$charArray = str_split($this->text);
$txtHash = (($this->previousHash  + $this->prime)
             - pow($this->radix, strlen($this->pattern)-1)
             * ord($charArray[$this->position]) % $this->prime)
           % $this->prime;
$txtHash = ($txtHash * $this->radix + ord($character)) % $this->prime;

$this->previousHash = $txtHash;
$this->position += 1;

is what you have to do.

But unless the pattern is very short, pow($this->radix,strlen($this->pattern)-1) will overflow, so you have to replace pow($this-radix, strlen($this->pattern)-1) with a modular exponentiation function

function mod_pow($base,$exponent,$modulus)
{
    $aux = 1;
    while($exponent > 0) {
        if ($exponent % 2 == 1) {
            $aux = ($aux * $base) % $modulus;
        }
        $base = ($base * $base) % $modulus;
        $exponent = $exponent/2;
    }
    return $aux;
}

(this can still overflow if $modulus, that is $this->prime here, is too large). The relevant line of code becomes

$txtHash = (($this->previousHash  + $this->prime)
             - mod_pow($this->radix, strlen($this->pattern)-1, $this->prime)
             * ord($charArray[$this->position]) % $this->prime)
           % $this->prime;

Then you have a potentially huge inefficiency

$this->text .= $character;
...
    $charArray = str_split($this->text);

If the string becomes long, the concatenation and the splitting may take a lot of time (not sure how PHP implements strings and string operations, but they're likely not constant time). You should probably keep only the relevant part of the string, i.e. drop the first character after recalculating the hash.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • Thank you so much for your answer. However it still says hash is not matching. May be I misunderstood. I appreciate if you can modify the search function with your answer. In my example ABC is matching with the pattern. – Prasad Rajapaksha Mar 26 '12 at 15:46