0

My search engine is written using mySQL and ColdFusion, the main part of is where I loop the following (or similar) on each search term separated by spaces.

AND productname REGEXP '(\\b#trim(search)#\\b)'

I'd like to introduce alternatives words, so that when any one of these are searched the rest are included

M43, M4/3, Micro-Four-Thirds, MFT

I could use a separate query to get a list of matching keywords and loop in an OR - but this seemed a little unrefined.

My database is a list of camera product names with little consistency, I tested full-text search but was not happy with the results.

So is there an 'Theasaurus' like option without the need of Full-text search implementation?

Daniel Cook
  • 1,033
  • 1
  • 9
  • 19
  • 1
    For a predefined shortlist of known alternatives, I would just concatenate them on the left hand side of the comparison, so that you are looking for `search` to match among all of the `productname` alternatives, rather than trying to substitute `search` many times for matches against the one `productname`. You could store the alternatives in a separate table and pull them in with a join, subquery or udf, or you could denormalize by storing the alternatives as an extra column on the product to simplify the search query. – Sev Roberts Jul 22 '20 at 10:01
  • Thanks, I'm just going over what you said in my head and I'm starting to build a picture. Also the talk of concatenation made me think of this as another back up idea.. REGEXP '\\b#trim(search)#\\b|\\b#trim(altsearch1)#\\b|\\b#trim(altsearch2)#\\b|' – Daniel Cook Jul 22 '20 at 10:08
  • 1
    Yes that is also valid, although my experience of doing similar for product searches for online stores, is that moving the pre-determined known alternatives to the left hand side and reducing the number of `OR` comparisons, resulted in better performance - with the data I was dealing with anyway. I would reserve those regex ORs for inexact alternatives like when implementing stemming - eg taking the user's search term and automatically searching for singular/plural and different conjugations and tenses. – Sev Roberts Jul 22 '20 at 10:42
  • 1
    ie if you know all the alternative names for a product then store them against the product - it's not really any different to people having a `keywords` column on the product table and stuffing whatever they want to match against in there, except that can result in duplicated data. What I was suggesting earlier, applied to your code, was effectively `AND concat(productname, ' ', productAlternativeNames, ' ', productDescription) REGEXP '(\\b#trim(search)#\\b)'` – Sev Roberts Jul 22 '20 at 10:52
  • Many Thanks Sev, I will have a go to implement it. – Daniel Cook Jul 22 '20 at 11:00
  • @SevRoberts I went with the keywords column, and concat_ws(' ', productname, productkeywords) thanks, I tried to look at having a table for synonyms but the logic of implementing it felt chicken and egg and my brain was fried. – Daniel Cook Jul 22 '20 at 13:01

1 Answers1

0

I added a product keyword field, which I populated with alternative keywords.

My SQL where clause has now loops the terms and compares against the concatenated product name and keywords

 AND concat_ws(' ', productname, productkeywords) REGEXP '(\\b#trim(search)#\\b)' 

Thanks to @SevRoberts for this solution

Daniel Cook
  • 1,033
  • 1
  • 9
  • 19