3

I have a FULLTEXT search in a table of part numbers. Some part numbers have hyphens.

The table engine is InnoDB using MySQL 5.6.

The problem I am having is that MySQL was treating the hyphen (-) character as a word separator.

So I created a new MySQL charset collation whereas the hyphen is treated as a letter.

I followed this tutorial: http://dev.mysql.com/doc/refman/5.0/en/full-text-adding-collation.html

I made a test table, using the syntax at the bottom of the link, however i used the InnoDB Engine. I searched for '----' and received "syntax error, unexpected '-'"

However If I change the engine to MyISAM, I get the correct result.

How to I get this to work with the InnoDB engine?

It seems with MySQL its one step forward and two steps back.

Edit: I found this link for 5.6 (http://dev.mysql.com/doc/refman/5.6/en/full-text-adding-collation.html), which is the same tutorial using InnoDB as the engine.

But here's my test:

create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=InnoDB

Added a row that is just "----"

select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)

syntax error, unexpected '-'

Drop the table, MyISAM

create table test (a TEXT CHARACTER SET latin1 COLLATE latin1_fulltext_ci, FULLTEXT INDEX(a)) ENGINE=MyISAM

Added a row that is just "----"

select * from test where MATCH(a) AGAINST('----' IN BOOLEAN MODE)

1 result

Edit 2, if it helps to see visually, heres my 2 tests:

MyISAM

InnoDB

Anthony
  • 648
  • 1
  • 7
  • 22
  • show your actual query. sounds more like you did `MATCH foo AGAINST(---- ...)`, with your search string somehow becoming a bare string. – Marc B Aug 12 '14 at 16:20

2 Answers2

1

I encountered this exact issue recently. I had previously added a custom collation per the docs and was using MyISAM and it was working fine. Then a few weeks ago switched to InnoDB and things stopped working. I tried:

  • Rebuilding my collation and A/B testing to make sure they are working
  • Disabling stopword by setting innodb_ft_enable_stopword to 0
  • Rebuilding my fulltext table and index

In the end I took a different approach since InnoDB doesn't seem to follow the same rules as MyISAM when it comes to fulltext indexing. This is a bit hacky but works for my application:

  1. Create a special search column containing the data I need to search for. This column has a fulltext index and exists for the sole purposes of doing a fulltext search, which is still very fast on a table with millions of rows.
  2. Search/replace all - in my search column with an unused character that is considered a "word" character. See my question here regarding this: https://dba.stackexchange.com/questions/248607/which-characters-are-considered-word-characters. Figuring out what word characters are turns out to be not so easy but here are a few that worked for me: Ω œ π µ. These characters are probably not used in the data you need to be searching but they will be recognized by the parser as searchable characters. In my case I replace - with Ω. Since I only need the row ID, it doesn't matter what the data in this column looks like to human eyes.
  3. Revise my updates and inserts to keep the search column data and substitutions up to date. In my case this was easy since there is only one place in the application that updates this particular table. A couple of triggers could also be used to handle this:

    CREATE TRIGGER update_search BEFORE UPDATE ON animals
    FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
    
    CREATE TRIGGER insert_search BEFORE INSERT ON animals
    FOR EACH ROW SET NEW.search = REPLACE(NEW.animal_name, '-', 'Ω');
    
  4. Replace - in my search queries with Ω.

Voila. Here's a fiddle demonstrating: https://www.db-fiddle.com/f/x1WZpZP6wcqbTTvTEFFXYc/0

The above workaround might not be realistic for every application but hopefully it's useful for someone. Would be great to have a real solution to this for InnoDB.

0

The InnoDb FULLTEXT search is probably treating the hyphens as stop-words. So when it gets to the second hyphen, it would expect a word, not a hyphen. This would explain the 'syntax error'.

Why it doesn't do this in MyISAM is because the implementation in InnoDB of FULLTEXT indexes is quite different, and of course, they've only been added for InnoDB in MySQL 5.6.

What can you do about this? It seems you can influence the list of stop-words through a special table: http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_ft_user_stopword_table. This could stop MySQL from treating hyphens as stop-words.

kasimir
  • 1,506
  • 1
  • 20
  • 26