0

I have a table called products which has a varchar column called name encoded in latin1_swedish_ci.

I'm trying to make a search query in order to filter products by name using the MATCH() AGAINST() sentence that look like this:

SELECT *
FROM products
WHERE MATCH(name) AGAINST('*search_string*' IN BOOLEAN MODE)

It works really good varchars without accents, however, if I have a product called Colágeno and the search input is Colageno, the query would not catch the product.

So I tried what is showed in this SO answer without success.

SELECT *
FROM products
WHERE MATCH(CONVERT(BINARY(name) USING utf8)) AGAINST('*Colageno*' IN BOOLEAN MODE)

I got the error:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CONVERT(BINARY(name) USING utf8)) AGAINST('*Colageno*' IN BOOLEAN MODE)' at line 3

I only want to catch special characters like á, é, í, ó, ú.

Thanks in advance!

AlexSp3
  • 2,201
  • 2
  • 7
  • 24

2 Answers2

0

As the author of the answer pointed out, you can use CONVERT(BINARY(name) USING utf8) instead of CONVERT(BINARY(name) USING utf8).

MySQL already converts varchar values to binary ones when converting them to fulltext.

You need to take that into account when creating the index, otherwise you'll end up searching in binary data and not in varchar data.

Another solution is to use a certain collation for your varchar column, let's say latin1_general_ci. This way, the search will always convert the data to binary data (if the varchar column is not already a binary one) and look it up in the index as bytes.

You can read more about it here.

PJMan0300
  • 86
  • 5
  • Thanks for your answer, I tried out this query `SELECT id, name COLLATE latin1_general_ci as name, stock, price FROM products WHERE MATCH(name) AGAINST('*colageno*' IN BOOLEAN MODE);` but I'm still getting an empty result. The syntax is ok since I don't got any error, and when searching for only `cola` the product appears. What am I doing wrong? – AlexSp3 Jan 27 '22 at 23:29
  • Why did you get an empty result? You need to use the BOOLEAN MODE, as in `MATCH(name) AGAINST('*colageno*' IN BOOLEAN MODE)`. The words must be given between *round brackets*, not square brackets (this is the difference from a normal boolean search). BOOLEAN MODE is suitable for complicated queries. Just remember to put it in, and your problem should be solved! – PJMan0300 Jan 27 '22 at 23:47
  • Yes, I'm using `BOOLEAN MODE` in my query to filter irrelevant results. – AlexSp3 Jan 28 '22 at 11:54
  • Okay, so why did you get an empty result? It could be that the words are not present in the product name. Your query may return no results because your search criteria is too narrow. To fix this problem, you need to relax the conditions of your query. Let's say: `SELECT id, name COLLATE latin1_general_ci as name, stock, price FROM products WHERE MATCH(name) AGAINST('*colageno*' IN BOOLEAN MODE) AND stock > 1;` – PJMan0300 Jan 28 '22 at 15:23
  • With this query you can get a few results. But if you remove the `AND`, you can get much more results. The products with `colageno` in their name and `stock > 1` will appear. Removing the `AND` you are relaxing your query criteria. You are looking for all products that contain `colageno` in their name, no matter if it's low in stock or not. That's why the `AND` is necessary for us to narrow the results down. Now, if you want to broaden your query criteria, remove the `COLLATE latin1_general_ci` and change name into a text column instead of a varchar. This will enable a full-text search. – PJMan0300 Jan 28 '22 at 15:25
0

Solved!

After some discussion in the comments, I realized that the error was not in the MATCH() AGAINST() statement, since it does not distinguish diacritics by default.

So the problem had to do with how the diacritics were stored in MySQL, in my case, they were stored like this COLÁGENO -> COLÃ<0x81>GENO. Therefore, it was necessary to find out how to save the tildes correctly without corrupting the table.

Encodings

I tried making encoding changes by executing in phpmyadmin the instruction:

ALTER TABLE products CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

However, there were no changes in performance.

I then tried changing the encoding in the exported table file by changing DEFAULT CHARSET=latin1 to DEFAULT CHARSET=utf8mb4, but no change in results either.

Modify the accents manually

My other attempt was, once the encoding change was made, to manually modify the cells that had characters like Ã<0x81> by their corresponding character with tilde Á. But sadly this seemed to corrupt queries to the table (I was still able to access the other tables normally).

So I thought about what masterguru said in the comments about encoding changes altering the way scripts connect to the table, and apparently when I manually modified a character to put the tilde, the scripts kept accessing the table. table with the previous encoding.

Solution

The scripts were in PHP so I had to find the solution in that language.

I found this answer in English SO where it said how to save tildes in the database correctly. To do this, you had to write...

mysqli_set_charset($connection, "utf8");

...this after the connection to the database. Finally, I had to change the rare characters in my database to their corresponding tilde character for the MATCH AGAINST to work, and voila!


Many thanks to masterguru, Triby and aeportugal for the help provided in the comments!

Original post: https://es.stackoverflow.com/questions/511745/como-hacer-que-la-b%c3%basqueda-match-against-ignore-los-tildes-o-acentos

AlexSp3
  • 2,201
  • 2
  • 7
  • 24