1

I've built a search index (two column table) which assigns single keywords to their content ID.

Now searching for single keywords works and is damn fast.

However, searching for multiple keywords like SELECT media_id, keyword FROM search_index WHERE keyword = 'b' OR keyword = 'a' will return results in alphabetical keyword order (all hits on 'a' first, then 'b').

I tried doing it with PHP on the results page, but that requires me to load a lot more data from the database than I want to display, to get an acceptable sorting. With that technique the script might even time out on a keywords that been assigned more frequently.

I also tried GROUP BY which seemed to group the result from the alphabetical order together to their single IDs.

Edit: Lastly I found something like SELECT DISTINCT media_id, keyword, COUNT(media_id) AS num FROM search_index GROUP BY media_id ORDER BY num DESC, which works kinda well but is really slow.

So if i search for 'b' and 'a', I want the IDs with 'a' and 'b' first in the result.

So how can I tell MySQL to sort the output by the frequency of hits on one ID while still being damn fast?

Cobra_Fast
  • 15,671
  • 8
  • 57
  • 102

3 Answers3

1

How about something like this?

SELECT *, CAST(keyword='a' AS SIGNED)+CAST(keyword='b' AS SIGNED) AS rank FROM search_index ORDER BY RANK DESC

Michael Mior
  • 28,107
  • 9
  • 89
  • 113
0

MySQL has full text search which returns a relevancy score.

Michael Mior
  • 28,107
  • 9
  • 89
  • 113
  • Yeah, thanks for pointing me at what I try not to use with my search index solution here. Sure, I could put all the keywords in a `TEXT` field and fulltext search them, but did you ever feel the performance of something like that on a 300K+ row table? I guess not. (It takes up to a minute, where my index can provide search results within few milliseconds). – Cobra_Fast Jun 28 '11 at 14:25
  • Sorry. Just trying to be helpful. – Michael Mior Jun 28 '11 at 14:30
  • I appreciate help very much, I really do! But it's not leading anywhere when an answer defys any intention of the question. – Cobra_Fast Jun 28 '11 at 14:32
  • Unfortunately it's not always possible to clearly discover the intent of questions. I posted another answer that might be more helpful :) – Michael Mior Jun 28 '11 at 14:58
0

Okay I figured it out myself.

The fastest seems to be to do SELECT media_id, keyword, COUNT(media_id) AS num WHERE ... GROUP BY media_id and then going over the num field with a simple php-usort function.

function SortFrequency($a, $b)
{
     if ($a['num'] == $b['num']) return 0;
     return ($a['num'] > $b['num']) ? -1 : 1;
}

usort($results, 'SortFrequency');
Cobra_Fast
  • 15,671
  • 8
  • 57
  • 102