3

I know this question has been asked several times.. but , let me explain.

I have a table with 450k records of users (id, first name, last name, address, phone number, etc ..). I want to search users by thei first name and/or their last name.

I used these queries :

SELECT * FROM correspondants WHERE nom LIKE 'Renault%' AND prénom LIKE 'r%';

and

SELECT * FROM correspondants WHERE CONCAT(nom, CHAR(32), prénom= LIKE 'Renault r%';

It works well, but with a too high duration (1,5 s). This is my problem.

To fix it, I tried with MATCH and AGAINST with a full text index on both colums 'nom' and 'prénom' :

SELECT * FROM correspondants WHERE MATCH(nom, prénom) AGAINST('Renault r');

It's very quick (0,000 s ..) but result is bad, I don't obtain what I should have.

For example, with LIKE function, results are :

88623   RENAULT Rémy
91736   RENAULT Robin
202269  RENAULT Régine

(3 results).

And with MATCH/AGAINST :

327380  RENAULT Luc
1559    RENAULT Marina
17280   RENAULT Anne
(...)
88623   RENAULT Rémy
91736   RENAULT Robin
202269  RENAULT Régine
(...)
436696  SEZNEC-RENAULT  Helene
(...)

(115 results !)

What is the best way to do a quick and efficient text search on both columns with a "AND" search ? (and what about indexes)

Jerry
  • 1,141
  • 1
  • 13
  • 18
  • 1
    What do you mean by "result is bad"? – santamanno Aug 25 '19 at 15:05
  • 2
    use [BOOLEAN](https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html) search mode? -> `AGAINST('+Renault +r*' IN BOOLEAN MODE)` ?? ... Also keep in mind that to small words or stopwords might not be indexed by the full text indexes.. – Raymond Nijland Aug 25 '19 at 15:10
  • I have edited my post. Raymond, in boolean mode with "AGAINST('+Renault +r*' IN BOOLEAN MODE)", same results (115 results). – Jerry Aug 25 '19 at 16:18
  • Raymond yes, you are right, I didn't know... one letter is useless; need 3 letters to have the good results ! – Jerry Aug 25 '19 at 16:28
  • Works very well in BOOLEAN mode and with 2 conditions (one for the first name alone, the other for the last name alone). You can reply, I will vote for you. – Jerry Aug 25 '19 at 16:38

2 Answers2

2

Fulltext search doesn't do pattern-matching as LIKE string comparisons do. Fulltext search only searches for full words, not fragments like r%.

Also there's a minimum size of word, controlled by the ft_min_word_len configuration variable. To avoid making the fulltext index too large, it doesn't index words smaller than that variable. And therefore short words are ignored when you search, so r is ignored.

There's also no choice in fulltext indexing to search for words in a specific position like at the beginning of a string. So your search for renault may be found in the middle of the string.

To solve these issues, you could do the following:

SELECT * FROM correspondants WHERE MATCH(nom, prénom) AGAINST('Renault')
  AND CONCAT(nom, CHAR(32), prénom) LIKE 'Renault r%';

This would use the fulltext index to find a small subset of your 450,000 rows that have the word renault somewhere in the string. Then the second term in the search would be done without help from an index, but only against the subset of rows that match the first term.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

That particular query is best done this way:

INDEX(nom, prénom)

WHERE non = 'Relault' AND prénom LIKE 'R%'

I recommend that you add that index and add code to your application to handle different requests in different ways.

Do not hide an indexed column inside a function call, such as CONCAT(nom, ...), it will not be able to use the index; instead it will check every row, performing the CONCAT for every row and then doing the LIKE. Very slow.

Except for cases of initials (as above), you should mostly avoid very short names. However, here is another case where you can make it work with extra code:

WHERE nom = 'Lu'

(with the same index). Note that using any flavor of MATCH is likely to be much less efficient.

So, if you are given a full last name, use WHERE nom =. If you are given a prefix, then it might work to use WHERE nom LIKE 'Prefix%' Etc.

FULLTEXT is best used for cases where you have full words scattered in longer text, which is not your case since you have nom and prénom split out.

Perhaps you should not use MATCH for anything in this schema.

Rick James
  • 135,179
  • 13
  • 127
  • 222