1

I need MYSQL search query to get trending topics from my table, Below is the explanation What I need

+----+---------+-----------------------------+
| ID | ID_user | text                        | 
+----+---------+-----------------------------+
| 1  | bruno   | michael jackson is dead     |
| 2  | thomasi | michael j. moonwalk is dead |
| 3  | userts  | michael jackson lives       |
+----+---------+-----------------------------+

i want query the words most repeated on the table, limit top 10, the result may be this:

+-------+------------+
| count | word       |
+-------+------------+
| 3     | michael    |
| 2     | dead       |
| 2     | jackson    |
| 1     | j.         |
| 1     | lives      |
| 1     | moonwalk   |
+-------+------------+

But I want search only words that repeat more of 10 times, in this case noone word is appear, but if criteria for repetead words is 2, it will display only 'michael' and 'dead', but ignore 'is' because I dont want words with less 2 chars of lenght, and the words that a phrase, then I need apear this:

+-------+-----------------+
| count | word            |
+-------+-----------------+
| 2     | michael jackson |
| 2     | dead            |
+-------+-----------------+
Fabio
  • 23,183
  • 12
  • 55
  • 64
  • There is a basic misunderstanding involved: RDBMS are designed to work **with** field contents, not **on** field contents. While it might be possible to create such a query, it most possibly is a subobtimal solution to process this inside the DB – Eugen Rieck May 28 '13 at 21:53

1 Answers1

1
CREATE TEMPORARY TABLE counters (id INT);
-- insert into counters as much as you like (values here means "number of repeats"
INSERT INTO counters VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9),(10),
                           (11),(12),(13),(14),(15),(16),(17),(18),(19),(20),
                           (21),(22),(23),(24),(25),(26),(27),(28),(29),(30);

  SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1) AS word,
         COUNT(counters.id) AS counter
    FROM texts
         INNER JOIN counters ON (LENGTH(text)>0 AND SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id),' ',-1) <> SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id-1),' ', -1))
   WHERE length(SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1)) > 2
GROUP BY word
  HAVING COUNT(counters.id) > 1
ORDER BY counter desc;

but it's not very efficient and shouldn't be done like that

edit:

  SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1) AS word,
         COUNT(counters.id) AS counter
    FROM texts
         INNER JOIN counters ON (LENGTH(text)>0 AND SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id),' ',-1) <> SUBSTRING_INDEX(SUBSTRING_INDEX(text,' ',counters.id-1),' ', -1))
   -- exclude words list
   WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(texts.text,' ',counters.id),' ',-1) NOT IN ('is', 'of', 'this', 'to')
GROUP BY word
  HAVING COUNT(counters.id) > 1
ORDER BY counter desc;
Ochi
  • 1,468
  • 12
  • 18
  • great help but not getting the result. – user2326760 May 29 '13 at 12:47
  • what results would you like then? I don't get "...and the words that a phrase..." – Ochi May 29 '13 at 17:47
  • I explain it. I've a table "post". there is a column of "status" now I need most common words from status. But I should also exclude the words like "of, this, to etc". Is there anything for me that can help? – user2326760 May 29 '13 at 19:21