0

I have a KMP code in PHP which is can do string matching between word to text. I wonder if i can use KMP Algorithm for string matching between text to text. Is it possible or not? and how can i use it for finding the matching of the string between 2 text.

Here's the core of KMP algorithm :

<?php
    class KMP{
      function KMPSearch($p,$t){
        $result = array();
        $pattern = str_split($p); 
        $text    = str_split($t);
        $prefix = $this->preKMP($pattern);
    // print_r($prefix);

     // KMP String Matching
     $i = $j = 0;
        $num=0;
        while($j<count($text)){
          while($i>-1 && $pattern[$i]!=$text[$j]){
         // if it doesn't match, then uses then look at the prefix table
            $i = $prefix[$i];
          }
          $i++;
          $j++;
      if($i>=count($pattern)){
         // if its match, find the matches string potition
      // Then use prefix table to swipe to the right.
            $result[$num++]=$j-count($pattern);
            $i = $prefix[$i];
          }
        }
     return $result;
      }

      // Making Prefix table with preKMP function
      function preKMP($pattern){
        $i = 0;
        $j = $prefix[0] = -1;
        while($i<count($pattern)){
          while($j>-1 && $pattern[$i]!=$pattern[$j]){
            $j = $prefix[$j];
          }
          $i++;
          $j++;
          if(isset($pattern[$i])==isset($pattern[$j])){
            $prefix[$i]=$prefix[$j];
          }else{
            $prefix[$i]=$j;
          }
        }
        return $prefix;
      }
    }
    ?>

I calling this class to my index.php if i want to use to find word on the text.

This is the step that i want my code do : (1). I input a text 1 (2). I input a text 2 (3). I want a text 1 become a pattern (every single word is in text 1 treat as pattern) (4). I want my code can find every pattern on text 1 in text 2 (5). Last, my code can show me what the percentage of similarity.

Hope you all can help me or teach me. I've been serching for the answer everywhere but can't find it yet. At least you can teach me.

Hikaru18
  • 77
  • 1
  • 10
  • what do you mean by "matching of the string between 2 text"? – Laney May 20 '17 at 03:39
  • I mean this : example i have 2 text here.. Text 1 and Text 2 : Text 1 : My holiday was really awesome. There is a beach in front of my grandmother's house. I just need to walk to go to the beach. Text 2 : I just got free day. My holiday is begin and i want to go to the beach. I found a nice beach on the map which is a unwell-known beach. You need to walk through the forest to find this beach. I Want compare text 1 to text 2, i supposed to get matching string : My holiday and The Beach. Because those string are exactly excist in Text 1 and Text 2. – Hikaru18 May 20 '17 at 03:58

1 Answers1

-1

If you just need to find all words that are present in both texts, you don't any string search algorithm to do it. You can just add all words from the first text to a hash table, iterate over the second text and add the words that are in a hash table to the output list.

You can use a trie instead of a hash table if you want a linear time complexity in the worst case, but I'd get started with a hash table because it's easy to use and is likely to be good enough for practical purposes.

kraskevich
  • 18,368
  • 4
  • 33
  • 45
  • Thank you for the suggestion. But, i still have to use that algorithm to find the similarity percentage between 2 text, you know like plagiarism detection. Is it possible for find similarity percentage between two text with this algorithm? – Hikaru18 May 20 '17 at 20:02
  • @Hikaru18 What is your definition of similarity percentage? – kraskevich May 20 '17 at 20:33
  • after i find the similarity of the text. i want my code show the number of percentage of the similarity.. like text 1 has 28% similar with text 2. just like that. – Hikaru18 May 20 '17 at 20:44
  • 1
    @Hikaru18 You need to define the notion of similarity mathematically before thinking about how to compute it. What metric do you want to use? Here's a list of a few metrics: https://en.wikipedia.org/wiki/String_metric – kraskevich May 20 '17 at 20:53
  • I just thinking to use levenshtein distance. could you teach me? – Hikaru18 May 20 '17 at 21:00