I would like to compare two sets of text and measure how similar or relevant they are to each other. I use similar_text(), but I found that it is not accurate enough. Thank you.

For example, the following texts give me 66%:

Text1: Innovation game, eat, sleep, breathe innovation. It love creativity & passion power Internet drives . We understand time greatest asset, challege meet deadline.

Text2: Soviet union communist policy; Germany league organization disguise enermies beaten wanted.

My code is as below:

echo $student_answer = removeCommonWords($answer)."<br><br>";

$student_answer = strip_tags($student_answer);

echo $memo = removeCommonWords2($memo)."<br><br>";

echo similar_text($memo, $student_answer);
  • You will have to define what "accurate" means according to you. For example, tell us what result you are expecting from your two sample strings, as well as a short description of the method you used to come up with this result. – RandomSeed Jul 10 '13 at 11:26
  • Accuracy as a percentage, as in this code: echo similar_text($memo, $student_answer, $percentage); Let's say you would like to compare the student's answer to the one in the memo. – Wakaitu Jul 10 '13 at 11:44
  • But you said this function is not giving you satisfactory result? – RandomSeed Jul 10 '13 at 11:47
  • In other words, why is "66%" not accurate for you? (please note, with your sample texts, I get a similarity of 21.5% here on my system). – RandomSeed Jul 10 '13 at 11:51
  • The reason 66% is not accurate is that the two texts do not contain similar words and are not related in meaning either, and meaning is the core of this task. I removed common words from both texts before comparing them, which I thought would give me a more accurate result. – Wakaitu Jul 10 '13 at 15:18
  • So you are looking to compare words, and not characters? Indeed, then `similar_text()` is not just "inaccurate", it is actually not at all what you are looking for, since it does character comparison (e.g. 'abc' and 'cab' have a similarity of 67%). – RandomSeed Jul 10 '13 at 15:44
  • So you want to calculate similarity in terms of *semantics*. Please make it clear in your question body, it is not obvious at all at first sight. You will need to build a list of synonyms, with their respective similarity. Not a trivial task on its own. Then you will have to roughly calculate the similarity of each word from one sentence with all words in the other sentence, and extract a score from this comparison. The method of computing this score is not a trivial task either, and certainly requires some heuristics too. Good luck! – RandomSeed Jul 10 '13 at 15:50
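
To illustrate the difference between character and word comparison discussed in the comments, here is a minimal sketch in JavaScript that scores two texts by the words they share (Jaccard similarity). The function names `wordSet` and `wordSimilarity` are made up for this example, and the tokenizer only handles unaccented Latin letters and digits. It does not cover the semantic part (synonyms and related meanings); it only shows a word-level baseline.

```javascript
// Hypothetical helper: split a text into a set of lowercase words.
// Assumption: words are runs of a-z and 0-9; everything else is a separator.
function wordSet(text) {
  const words = String(text).toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(words);
}

// Jaccard similarity as a percentage:
// (number of shared words) / (number of distinct words in both texts) * 100.
function wordSimilarity(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) {
    return 0; // nothing to compare
  }
  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) {
      shared++;
    }
  }
  const union = setA.size + setB.size - shared;
  return (shared / union) * 100;
}

console.log(wordSimilarity('the cat sat', 'the cat ran')); // 50
```

With this kind of scoring, the two sample texts from the question have no words in common and would score 0, whereas a character-based measure still finds matching letters.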

1 Answer

You can use the JS version:

http://phpjs.org/functions/similar_text/

The JS code shows how the percentage is calculated (you can modify it):

return (sum * 200) / (firstLength + secondLength);
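
As a worked example of that formula, the sketch below plugs in numbers by hand. `similarityPercent` is a hypothetical helper written for this illustration, not part of the phpjs code; `sum` is the number of matching characters and the two lengths are the string lengths.

```javascript
// Hypothetical helper mirroring the formula above:
// percent = matching characters * 2 / combined length * 100.
function similarityPercent(sum, firstLength, secondLength) {
  if (firstLength + secondLength === 0) {
    return 0; // avoid dividing by zero for two empty strings
  }
  return (sum * 200) / (firstLength + secondLength);
}

// 'Hello World!' vs 'Hello phpjs!': 7 matching characters, 12 + 12 in total.
console.log(similarityPercent(7, 12, 12).toFixed(2)); // "58.33"

// 'abc' vs 'cab': 2 matching characters ("ab"), 3 + 3 in total.
console.log(similarityPercent(2, 3, 3).toFixed(2)); // "66.67"
```

Because the score depends only on matching characters and string lengths, two unrelated texts can still score well above zero.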

I hope this will help you!

EDIT:

How to use similar_text in JS?

  1. Create a file named similar_text.js and copy and paste this code into it:

     function similar_text (first, second, percent) {
     // http://kevin.vanzonneveld.net
     // +   original by: Rafał Kukawski (http://blog.kukawski.pl)
     // +   bugfixed by: Chris McMacken
     // +   added percent parameter by: Markus Padourek (taken from http://www.kevinhq.com/2012/06/php-similartext-function-in-javascript_16.html)
     // *     example 1: similar_text('Hello World!', 'Hello phpjs!');
     // *     returns 1: 7
     // *     example 2: similar_text('Hello World!', null);
     // *     returns 2: 0
     // *     example 3: similar_text('Hello World!', 'Hello phpjs!', 1);
     // *     returns 3: 58.33
     if (first === null || second === null || typeof first === 'undefined' || typeof second === 'undefined') {
       return 0;
     }
    
     first += '';
     second += '';
    
     var pos1 = 0,
       pos2 = 0,
       max = 0,
       firstLength = first.length,
       secondLength = second.length,
       p, q, l, sum;
    
     max = 0;
    
     for (p = 0; p < firstLength; p++) {
       for (q = 0; q < secondLength; q++) {
         for (l = 0;
         (p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);
         if (l > max) {
           max = l;
           pos1 = p;
           pos2 = q;
         }
       }
     }
    
     sum = max;
    
     if (sum) {
       if (pos1 && pos2) {
         sum += similar_text(first.substr(0, pos1), second.substr(0, pos2));
       }
    
       if ((pos1 + max < firstLength) && (pos2 + max < secondLength)) {
         sum += similar_text(first.substr(pos1 + max, firstLength - pos1 - max), second.substr(pos2 + max, secondLength - pos2 - max));
       }
     }
    
     if (!percent) {
       return sum;
     } else {
       return (sum * 200) / (firstLength + secondLength);
     }
    }
    
  2. In your <head>, put the following line:

      <script type="text/JavaScript" src="YOUR_PATH/similar_text.js"></script>
    
  3. Now you can use it in your body:

      <script>
       console.log(similar_text('Hello World!', 'Hello phpjs!'));
      </script>
    

It will log 7 to the browser console.

Hope this will help you!