0

I've been trying to find some help on using MySQL's FULLTEXT search. I realise that this has been discussed to death, but I can't quite understand how to get a concise set of results.

I have a MyISAM table of say 500,000 products with a FULLTEXT index setup on the "product_name" table.

A basic query would be:

SELECT * from products MATCH(product_name) AGAINST ("coffee table") AS relevance 
WHERE MATCH(product_name) AGAINST ("coffee table").

I got a list of a few hundred products that relate to either coffee or tables. This wasn't specific enough and meant that useful results were cluttered with too many other items.

I altered my query to use MATCH to give a relevance to each result, and then used LIKE to perform the actual query.

SELECT * from products MATCH(product_name) AGAINST ("coffee table") AS relevance 
WHERE ((product_name like "%coffee%" AND product_name like "%table%") or product_name like "%coffee table%")

This idea I got from seeing how Wordpress performs a search. This worked well until someone performs a search with more specific keywords. A real-world example was a search for "Nike blazer low premium vintage". In this case, there were no results (whereas the first method using MATCH returned hundreds)

I know I can use IN BOOLEAN MODE, but many users won't know to use the +/- operators to alter their query. I'm yet to work out how I should use the HAVING clause to limit results.

Also, due to this being shared hosting, I am unable to alter the default min word length - which means missing keywords like the colour "red" or the brand-name "GAP" for example.

I have read a little into creating a keyword index table, but have not found suitable references for this.

Can someone please offer a solution where I can use a product search term (as entered by Joe Public) that will give a concise set of results. Thanks

Alexander Holsgrove
  • 1,795
  • 3
  • 25
  • 54
  • Would you like `Nike blazer low premium vintage` to return something? Also, if I understand correctly, you want to match as many words as possible, and not take into account the built-in relevance computed by fulltext, right? – Tchoupi Sep 11 '12 at 14:47
  • Mathieu - thanks for your reply. I would like it to return results, but not for each keyword individually as that has too many. I also do an ORDER BY on the MATCH/relevance and return the top results. The number it generates will vary with the number of keywords I assume? – Alexander Holsgrove Sep 17 '12 at 10:09
  • The relevance is no only based on the number of keywords, but on the relevance of each keywords: http://dev.mysql.com/doc/internals/en/full-text-search.html – Tchoupi Sep 17 '12 at 13:56
  • OK, well I think that is OK. I sort by relevance and return the top results - so that is all I need for this search I think. The question is still how to get the right amount of results - MATCH doesn't return anything for my example, and LIKE returns too many – Alexander Holsgrove Sep 18 '12 at 08:32
  • Then if you don't want to use the default behavior, and want to match *as much keywords as possible*, why don't you use Boolean search? The default operator is `OR` and the relevance will be the number of matched keywords. – Tchoupi Sep 18 '12 at 12:05
  • This does seem better. So I guess I'd want to set the score to match say at least 50% of the keywords? "having relevance > floor(count(keywords)*0.5" – Alexander Holsgrove Sep 19 '12 at 13:26
  • Yes exactly, that could work. – Tchoupi Sep 19 '12 at 13:34
  • This seems to work, although the stopwords and minimum length words need to be removed before searching otherwise it breaks. Match also seems to ignore plurals in boolean mode and I think putting a * around each word is a bad idea. As far as my other method (MATCH + LIKE) - would this work? – Alexander Holsgrove Sep 20 '12 at 11:55

1 Answers1

0

I have done more research and as many people have said, it's not a good solution for "human" like searching - one example is how it handles word plurals (car / cars). I looked at Apache Lucene but it's beyond my ability to setup and configure.

For the moment, the "solution" has been to stick with IN BOOLEAN MODE (as Mathieu also suggested).

Alexander Holsgrove
  • 1,795
  • 3
  • 25
  • 54