26

Here's what i want to do:

  • match a search subject against multiple fields of my table
  • order the results by importance of the field and relevance of the matching (in that order)

Ex: let's assume I have a blog. Then someone searches for "php". The results would appear that way:

  • first, the matches for the field 'title', ordered by relevance
  • then, the matches for the field 'body', ordered by relevance too
  • and so on with the specified fields...

I actually did this with a class in PHP but it uses a lot of UNIONS (a lot!) and grows with the size of the search subject. So I'm worried about performance and DOS issues. Does anybody has a clue on this?

Hugo Mota
  • 11,200
  • 9
  • 42
  • 60

6 Answers6

43

Probably this approach of doing a weighted search / results is suitable for you:

SELECT *,
    IF(
            `name` LIKE "searchterm%",  20, 
         IF(`name` LIKE "%searchterm%", 10, 0)
      )
      + IF(`description` LIKE "%searchterm%", 5,  0)
      + IF(`url`         LIKE "%searchterm%", 1,  0)
    AS `weight`
FROM `myTable`
WHERE (
    `name` LIKE "%searchterm%" 
    OR `description` LIKE "%searchterm%"
    OR `url`         LIKE "%searchterm%"
)
ORDER BY `weight` DESC
LIMIT 20

It uses a select subquery to provide the weight for ordering the results. In this case three fields searched over, you can specify a weight per field. It's probably less expensive than unions and probably one of the faster ways in plain MySQL only.

If you've got more data and need results faster, you can consider using something like Sphinx or Lucene.

hakre
  • 193,403
  • 52
  • 435
  • 836
  • i like this approach! can you explain me what's going on in the IFs before the FROM? i'm not used with complex querys =/ – Hugo Mota Jun 27 '11 at 18:42
  • 4
    Basically it's an IF function, if the condition (first argument) is true, second argument will be used (the weight) otherwise third argument will be used (0-weight). The manual has all the details: http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html#function_if – hakre Jun 27 '11 at 18:45
  • that's very nice! so if it appears in name (not at the start), description and url, it will get an weight of 16? (10+5+1) – Hugo Mota Jun 27 '11 at 19:03
  • Exactly. So you can make an individual weight per field. – hakre Jun 27 '11 at 19:05
  • thank you! this solves my problem! next step is to make similar searches to match, like "gogle" finding "google". but this is another question. haha – Hugo Mota Jun 27 '11 at 19:22
  • 1
    For this kind a "similarity" search you ought to use Lucene / SolR. – Stephan L Nov 16 '14 at 13:18
  • 1
    Excellent solution! It's simple and it supports multiple search terms and multiple searchable fields. Thank you! – Rooster242 Apr 20 '16 at 15:37
10

you can add multiple mysql MATCH() values together, first multiplying each one by their weight.

simplified of course...

'(MATCH(column1) AGAINST(\''.$_GET['search_string'].'\') * '.$column1_weight.')
 + (MATCH(column2) AGAINST(\''.$_GET['search_string'].'\') * '.$column2_weight.')
 + (MATCH(column3) AGAINST(\''.$_GET['search_string'].'\') * '.$column3_weight.')
 AS relevance'

then

'ORDER BY relevance'
dqhendricks
  • 19,030
  • 11
  • 50
  • 83
  • if your weights are correctly set, it will place them in the desired order. – dqhendricks Jun 27 '11 at 18:07
  • can you give me an example of how to set the weights for this? – Hugo Mota Jun 27 '11 at 18:14
  • @hugo_leonardo the weights depend on how much more important a match in the title is compared to the body. so if the title match is 5x more important than a body match, the weights would be 5 and 1 respectively. does this make sense? – dqhendricks Jun 27 '11 at 18:22
  • 1
    @hugo_leonardo if there is a severly weak match in the title, and a very strong match in the body, it is still possible that a body matched item could appear before a title matched item. this is a better way to do it however since, not all title match should be more valueable than all body matches, depending on the match scores. also, keep in mind that the fields being matched against need to have a full text index on them. – dqhendricks Jun 27 '11 at 18:25
  • @dqhendricks well, it makes sense! but, not all the fields will be TEXT and only MYISAM tables support text indexes =/ – Hugo Mota Jun 27 '11 at 18:39
  • +1 - nice answer to make use of that with match() for fulltext indices. - however certain preconditions must be met. – hakre Jun 27 '11 at 19:08
  • 1
    But **obviously** you'd escape your `$_GET` variables before using them in SQL. – David Yell Aug 21 '13 at 10:19
  • 1
    @DavidYell Yes, or better yet, use PDO prepared statements. – dqhendricks Aug 23 '13 at 19:26
3

There is a native and clean way to do this using MySQL's CASE function (https://dev.mysql.com/doc/refman/5.7/en/case.html).

Example (untested):

SELECT * FROM `myTable` 
WHERE (`name` LIKE "%searchterm%" OR `description` LIKE %searchterm%" OR `url` LIKE "%searchterm%")
ORDER BY CASE
WHEN `name`        LIKE "searchterm%"  THEN 20
WHEN `name`        LIKE "%searchterm%" THEN 10
WHEN `description` LIKE "%searchterm%" THEN 5
WHEN `url`         LIKE "%searchterm%" THEN 1
ELSE 0
END
LIMIT 20

Have used this for many weighted searches of my own and works an absolute treat!

2
SELECT post_name, post_title,
    (CASE WHEN `post_name` LIKE '%install%' THEN(9 / LENGTH(post_name) * 100) ELSE 0 END) 
    + (CASE WHEN `post_title` LIKE '%install%' THEN(9 / LENGTH(post_title) * 50) ELSE 0 END)
        AS priority
FROM wp_posts
WHERE
    post_title LIKE '%install%'
    OR post_name LIKE '%install%'
ORDER BY priority DESC

This query will not only check weight in columns, but also in each row:

  • Checks how important search word is in each field cell. For example install wins over install something if searching for install (length is included in weight calculation).
  • Each field can have assigned weights (100 and 50 in this case, optional).

Basically, if we have these values and search for install: (1 column example, but it works with multiple columns too)

  • "Something else about install"
  • "install"
  • "install something"

Search will gives this order:

  • "install" - 128 weight
  • "install something" - 52 weight
  • "Something else about install" - 32 weight
ViliusL
  • 4,589
  • 26
  • 28
1

You should use a dedicated indexer to prefetch all of the data into an optimized, searchable index. Sphinx and similar products do this very well.

George Cummins
  • 28,485
  • 8
  • 71
  • 90
1

I had this exact same question and it was fully answered on one of the MySQL forums. Here's the thread. Kind of a long thread (because I'm kind of long-winded) but the payoff is just what you're looking for.

Pete Wilson
  • 8,610
  • 6
  • 39
  • 51