0

I am trying to search an entire table and return the most occurring phrases (up to three words) within a long string. I believe I could use fulltext search, but I am not matching against anything...

table

I like Iron Man 3 so much
Iron Man 3 sucked alot
Iron Man really saved the day
I like cats
cats are cool

result

Iron Man
Iron Man 3
cats

query

SELECT *
    FROM table
    WHERE substring(text, up to 3 words) OCCURS MOST
    ORDER BY OCCURRENCE DESC
user2359318
  • 83
  • 1
  • 1
  • 5
  • 1
    *"up to 3 words"* How do you expect MySQL to decide what 3 words? – Kermit May 07 '13 at 18:05
  • I have no idea because it can be any series of consecutive words that occur most – user2359318 May 07 '13 at 18:06
  • 1
    I don't understand the NARQ votes; the questions seems clear enough to me... – eggyal May 07 '13 at 18:07
  • 2
    @FreshPrinceOfSO "I looked at my kingdom I was finally there, I settled my throne as the price of SO" – usumoio May 07 '13 at 18:09
  • In your expected result, why does "Iron Man 3" (two occurrences) show up before "Iron Man" (three occurrences)? Also, in your result, what are the "three words" (`WHERE substring(text, up to 3 words)`) you're searching for? – Ed Gibbs May 07 '13 at 18:10
  • That is my bad, you are correct...fixed post – user2359318 May 07 '13 at 18:11
  • 2
    Now, what do you mean by `up to 3 words`? Will you specify those, or do they have to be calculated? For example, "I" and "I like" also show up twice in your sample input, but not in your result. Does that mean the "up to 3 words" will be ("Iron", "Man", "Cat")? Probably not, or "Iron Man 3" wouldn't be in the results. – Ed Gibbs May 07 '13 at 18:22

1 Answers1

0

If you really are into that, I would say you should parse the texts (non-sql) into a table called word_list, like

create table phrases (word1 varchar, word2 varchar, word3 varchar, cnt int);

and code:

$q = query("select comment from comments");
while ($row = array_read_line($q)){

$words = preg_split('/\s/', $row['comment']);
$previous1 = false;
$previous2 = false;

foreach($words as $word){
     if($previous1 and $previous2){
        .. here comes quoting, security, mysql-injection-safety, min length
        query("update relations set cnt = cnt+1 "
            . " where word1 = '$previous1', word2 = '$previous2', word3='$word'" ) 
        if (rows_afected == 0){
            query("insert into relations "
            . " set cnt = 1, word1 = '$previous1', "
            . " word2 = '$previous2', word3='$word'" ) 
        }
    }
    previous1 = $previous2;
    $previous2 = $word;
}

}

then order by count desc.

flaschenpost
  • 2,205
  • 1
  • 14
  • 29