1

This query returns rows ordered by relevance when searching for a species name. I use it for an autocomplete suggestion list and the relevance calculation works fine, but the query is a bit slow on a large table, and I’m grateful for any tips on how to optimize it (MySQL). My main question is:

  • Can I create any type of index on the table that would help optimization? Or am I stuck with using this type of query that aparently uses filesort algorithm? (probably the reason for beeing a bit slow?)
  • EDIT: I use InnoDB as table type, so unfortunately I can not use fulltext indexing in this case (only works with MyIsam tables).

    SQL-fiddle here: http://sqlfiddle.com/#!2/f03c4c/5

    SELECT QUERY:

    SET @search ='Boletus a';
    
    SELECT id, genus, species, fullname, 
        (CASE WHEN (CONCAT(genus, ' ', species)=@search) THEN 1 ELSE 0 END) # EXACT MATCH OF WHOLE NAME
      + (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF WHOLE NAME
      + (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT('%',@search,'%')) THEN 1 ELSE 0 END) # LIKE MATCH OF WHOLE NAME                             
      + (CASE WHEN (genus=@search) THEN 1 ELSE 0 END) #EXACT MATCH OF genus
      + (CASE WHEN (species=@search) THEN 1 ELSE 0 END) #EXACT MATCH OF species             
      + (CASE WHEN (genus LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF genus
      + (CASE WHEN (species LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) #MATCH BEGINNING OF species
             AS relevans
             FROM species 
             WHERE `fullname` LIKE CONCAT('%',@search,'%')
             ORDER BY relevans DESC, genus, species
             LIMIT 50;
    

    Background: A species name consists of at least two parts, genus and epithet (in my table the epithet column is named ”species”). I have three columns in the table: genus, species and fullname. The column ”fullname” can also contain names of lower taxa (varieties and forms as in the sqlfiddle example). I am open for any suggestions on how to make the search more efficient. Maybe a regexp on the search string and target only the column ”fullname” instead of concatenating two columns?

    DATABASE SCHEMA EXAMPLE:

    CREATE TABLE species
        (`id` int, `genus` varchar(50), `species` varchar(50), `fullname` varchar(100))
    ;
    
    INSERT INTO species
        (`id`, `genus`, `species`, `fullname`)
    VALUES
        (360052, 'Afroboletus', 'azureotinctus', 'Afroboletus azureotinctus'),
        (360053, 'Afroboletus', 'costatisporus', 'Afroboletus costatisporus'),
        (464267, 'Afroboletus', 'elegans', 'Afroboletus elegans'),
        (360054, 'Afroboletus', 'lepidellus', 'Afroboletus lepidellus'),
        (112100, 'Afroboletus', 'luteolus', 'Afroboletus luteolus'),
        (464266, 'Afroboletus', 'multijugus', 'Afroboletus multijugus'),
        (112101, 'Afroboletus', 'pterosporus', 'Afroboletus pterosporus'),
        (326826, 'Aureoboletus', 'auriporus', 'Aureoboletus auriporus'),
        (326828, 'Aureoboletus', 'gentilis', 'Aureoboletus gentilis'),
        (309389, 'Aureoboletus', 'novoguineensis', 'Aureoboletus novoguineensis'),
        (326829, 'Aureoboletus', 'subacidus', 'Aureoboletus subacidus'),
        (113146, 'Aureoboletus', 'thibetanus', 'Aureoboletus thibetanus'),
        (118425, 'Austroboletus', 'cookei', 'Austroboletus cookei'),
        (118427, 'Austroboletus', 'dictyotus', 'Austroboletus dictyotus'),
        (412550, 'Austroboletus', 'lacunosus', 'Austroboletus lacunosus'),
        (159051, 'Boletus', 'aereus', 'Boletus aereus'),
        (171640, 'Boletus', 'appendiculatus', 'Boletus appendiculatus'),
        (161237, 'Boletus', 'armeniacus', 'Boletus armeniacus'),
        (563944, 'Boletus', 'australiensis', 'Boletus australiensis'),
        (444094, 'Boletus', 'badius', 'Boletus badius'),
        (215376, 'Boletus', 'brunneus', 'Boletus brunneus'),
        (129701, 'Boletus', 'bubalinus', 'Boletus bubalinus'),
        (203954, 'Boletus', 'byssinus', 'Boletus byssinus'),
        (162779, 'Boletus', 'calopus', 'Boletus calopus'),
        (129469, 'Boletus', 'caucasicus', 'Boletus caucasicus'),
        (208740, 'Boletus', 'chrysenteron', 'Boletus chrysenteron'),
        (486540, 'Boletus', 'cisalpinus', 'Boletus cisalpinus'),
        (368037, 'Boletus', 'declivitatum', 'Boletus declivitatum'),
        (104061, 'Boletus', 'depilatus', 'Boletus depilatus'),
        (356530, 'Boletus', 'edulis', 'Boletus edulis'),
        (356278, 'Boletus', 'erythropus', 'Boletus erythropus var. immutatus'),
        (417068, 'Boletus', 'erythropus', 'Boletus erythropus var. erythropus'),
        (563943, 'Boletus', 'eximius', 'Boletus eximius'),
        (264716, 'Boletus', 'fechtneri', 'Boletus fechtneri'),
        (372473, 'Boletus', 'ferrugineus', 'Boletus ferrugineus'),
        (141943, 'Boletus', 'flavus', 'Boletus flavus'),
        (247434, 'Boletus', 'fragrans', 'Boletus fragrans'),
        (302971, 'Boletus', 'fuligineus', 'Boletus fuligineus'),
        (218213, 'Boletus', 'impolitus', 'Boletus impolitus'),
        (327048, 'Boletus', 'legaliae', 'Boletus legaliae'),
        (327051, 'Boletus', 'leptospermi', 'Boletus leptospermi'),
        (235486, 'Boletus', 'lignatilis', 'Boletus lignatilis'),
        (354822, 'Boletus', 'luridiformis', 'Boletus luridiformis var. junquilleus'),
        (354845, 'Boletus', 'luridiformis', 'Boletus luridiformis var. discolor'),
        (430254, 'Boletus', 'luridiformis', 'Boletus luridiformis var. luridiformis'),
        (132915, 'Boletus', 'luridus', 'Boletus luridus var. rubriceps'),
        (417113, 'Boletus', 'luridus', 'Boletus luridus var. luridus'),
        (241417, 'Boletus', 'megalosporus', 'Boletus megalosporus'),
        (282394, 'Boletus', 'moravicus', 'Boletus moravicus'),
        (196024, 'Boletus', 'paluster', 'Boletus paluster')
    ;
    
    Michael Krikorev
    • 2,126
    • 1
    • 18
    • 25
    • Great query, can't see anything to improve: perhaps you can get some ideas from a relevance query I wrote: http://stackoverflow.com/questions/27686084/raw-relevance-query-in-laravel-how-to-handle-it – Edwin Krause Mar 02 '15 at 11:05
    • @EdwinKrause No, this is not a good way to approach this. Have a look at fulltext indexes (link in my answer). – fancyPants Mar 02 '15 at 11:19
    • Interesting approach your answer... but you won't get the relevancy the way MicKri needs it. Or am I misreading your query? – Edwin Krause Mar 02 '15 at 11:41
    • How does he need it and why? He needs it for ordering, right? Does it make any difference if you order by 1,2,3,4 or by 0.1, 0.2, 0.3, 0.4? – fancyPants Mar 02 '15 at 12:27
    • Thanks both for your input. Unfortunately fulltext indexing only works with MyIsam tables, I'm using InnoDB for a number of reasons, but will consider refractoring. – Michael Krikorev Mar 02 '15 at 20:49

    1 Answers1

    2

    My advice, forget this query, create a fulltext index.

    Create the index covering the columns genus, species and fullname (all in one index). Then query like this:

    SELECT * FROM your_table WHERE MATCH(genus, species, fullname) AGAINST ('Boletus a');
    

    You can also use the MATCH(genus, species, fullname) AGAINST ('Boletus a') in other parts of the query:

    SELECT MATCH(genus, species, fullname) AGAINST ('Boletus a') #displays relevancy (a value between 0 and 1)
    FROM your_table 
    WHERE 
    MATCH(genus, species, fullname) AGAINST ('Boletus a') #filters (obviously)
    ORDER BY MATCH(genus, species, fullname) AGAINST ('Boletus a') #also obvious, orders by relevancy
    ;
    
    fancyPants
    • 50,732
    • 33
    • 89
    • 96
    • Thanks for valuable input. Unfortunately the current table type is InnoDB due to relation constraints and other reasons. Otherwise I would have gone with fulltext indexing. If not any other better suggestions shows up the next few days on how to optimize my query I will mark your answer as the best solution... – Michael Krikorev Mar 02 '15 at 20:43
    • @MicKri You also have the option to update MySQL to version 5.6 Then you can use fulltext indexes with InnoDB, too. – fancyPants Mar 02 '15 at 20:54
    • @fancyPants great answer, Although at the moment there are only about 5000 entries in my earlier mentioned relevance query, I will certainly get back to this answer in future to optimize my query when necessary and/or time allows. – Edwin Krause Mar 03 '15 at 05:38
    • Thank you for input... Did not know that fulltext indexing now are supported by MySQL >=5.6, great news! – Michael Krikorev Mar 04 '15 at 15:02