-1

I can't understand the following explains:

The first one regards just PAZIENTE and ANALISI, it is ok, it is using index IDX_NOME

explain
select
paziente3_.cognome as col_8_0_
from
Analisi analisi0_
inner join
Paziente paziente3_
on analisi0_.ID_PAZIENTE=paziente3_.ID_PAZIENTE
where
paziente3_.nome like 'MARCO%';


id  select_type  table       partitions  type   possible_keys        key                  key_len  ref                                 rows  filtered  Extra
1   SIMPLE       paziente3_              range  PRIMARY,IDX_NOME     IDX_NOME             123                                          1350  100       Using where; Using index
1   SIMPLE       analisi0_               ref    FK_ANALISI_PAZIENTE  FK_ANALISI_PAZIENTE  4        gestelfolab.paziente3_.ID_PAZIENTE  1     100       Using index

If in the same query, I try to retrieve also SPECIE description, the index IDX_NOME is no more used and there is a full table scan,

explain
select
specie5_.specie as col_5_0_,
paziente3_.cognome as col_8_0_
from
Analisi analisi0_
inner join
Paziente paziente3_
on analisi0_.ID_PAZIENTE=paziente3_.ID_PAZIENTE
inner join
Specie specie5_
on paziente3_.ID_SPECIE=specie5_.ID_SPECIE
where
paziente3_.nome like 'MARCO%';



id  select_type  table       partitions  type   possible_keys                        key                  key_len  ref                                 rows    filtered  Extra
1   SIMPLE       specie5_                index  PRIMARY                              SPECIE               137                                          1       100       Using index
1   SIMPLE       paziente3_              ALL    PRIMARY,IDX_NOME,FK_PAZIENTE_SPECIE                                                                    176184  10        Range checked for each record (index map: 0x19)
1   SIMPLE       analisi0_               ref    FK_ANALISI_PAZIENTE                  FK_ANALISI_PAZIENTE  4        gestelfolab.paziente3_.ID_PAZIENTE  1       100       Using index

Is it due to an error in table definition?

CREATE TABLE SPECIE
(
ID_SPECIE TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
SPECIE VARCHAR(45) NOT NULL UNIQUE,
PRIMARY KEY (ID_SPECIE)
)
ENGINE=InnoDB;


Table   Non_unique  Key_name  Seq_in_index  Column_name  Collation  Cardinality  Sub_part  Packed  Null  Index_type  Comment  Index_comment
specie  0           PRIMARY   1             ID_SPECIE    A          0                                    BTREE                
specie  0           SPECIE    1             SPECIE       A          0                                    BTREE                



CREATE TABLE PAZIENTE
(
   ID_PAZIENTE          INT           UNSIGNED NOT NULL AUTO_INCREMENT,
   ID_PAZIENTE_LAB      VARCHAR(20),
   COGNOME              VARCHAR(40),
   NOME                 VARCHAR(40),
   DATA_NASCITA         DATE,
   ID_SESSO             TINYINT UNSIGNED NOT NULL,
   RECAPITO             VARCHAR(50),
   CODICE_FISCALE       VARCHAR(30),
   ID_SPECIE            TINYINT UNSIGNED NOT NULL,
   PRIMARY KEY (ID_PAZIENTE), 
   INDEX IDX_DATA_NASCITA (DATA_NASCITA, ID_SESSO, ID_SPECIE),
   INDEX IDX_COGNOME (COGNOME, NOME, ID_SESSO, ID_SPECIE),
   INDEX IDX_NOME (NOME, COGNOME, ID_SESSO, ID_SPECIE),
   CONSTRAINT FK_PAZIENTE_SPECIE FOREIGN KEY (ID_SPECIE) REFERENCES SPECIE(ID_SPECIE),  
   CONSTRAINT FK_PAZIENTE_SESSO FOREIGN KEY (ID_SESSO) REFERENCES SESSO(ID_SESSO)   
)
ENGINE=InnoDB;




Table     Non_unique  Key_name            Seq_in_index  Column_name   Collation  Cardinality  Sub_part  Packed  Null  Index_type  Comment  Index_comment
paziente  0           PRIMARY             1             ID_PAZIENTE   A          176176                               BTREE
paziente  1           IDX_DATA_NASCITA    1             DATA_NASCITA  A          15594                          YES   BTREE
paziente  1           IDX_DATA_NASCITA    2             ID_SESSO      A          25007                                BTREE
paziente  1           IDX_DATA_NASCITA    3             ID_SPECIE     A          17922                                BTREE
paziente  1           IDX_COGNOME         1             COGNOME       A          62479                          YES   BTREE
paziente  1           IDX_COGNOME         2             NOME          A          163074                         YES   BTREE
paziente  1           IDX_COGNOME         3             ID_SESSO      A          170270                               BTREE
paziente  1           IDX_COGNOME         4             ID_SPECIE     A          154908                               BTREE
paziente  1           IDX_NOME            1             NOME          A          55289                          YES   BTREE
paziente  1           IDX_NOME            2             COGNOME       A          166062                         YES   BTREE
paziente  1           IDX_NOME            3             ID_SESSO      A          176184                               BTREE
paziente  1           IDX_NOME            4             ID_SPECIE     A          176184                               BTREE
paziente  1           FK_PAZIENTE_SPECIE  1             ID_SPECIE     A          1                                    BTREE
paziente  1           FK_PAZIENTE_SESSO   1             ID_SESSO      A          1                                    BTREE

If I force the new IDX_NOME:

    explain
    select
    specie5_.specie as col_5_0_,
    paziente3_.cognome as col_8_0_
    from
    Analisi analisi0_     
    inner join
    Paziente paziente3_     FORCE INDEX (IDX_NOME) 
    on analisi0_.ID_PAZIENTE=paziente3_.ID_PAZIENTE 
    inner join
    Specie specie5_
    on paziente3_.ID_SPECIE=specie5_.ID_SPECIE
    where
    paziente3_.nome like 'MARCO%'
;

the result is the following:

id  select_type  table       partitions  type   possible_keys        key                  key_len  ref                                 rows  filtered  Extra
1   SIMPLE       specie5_                index  PRIMARY              SPECIE               137                                          1     100       Using index
1   SIMPLE       paziente3_              range  IDX_NOME             IDX_NOME             123                                          1350  100       Using where; Using index; Using join buffer (Block Nested Loop)
1   SIMPLE       analisi0_               ref    FK_ANALISI_PAZIENTE  FK_ANALISI_PAZIENTE  4        gestelfolab.paziente3_.ID_PAZIENTE  1     100       Using index

Maybe we can say that the FULL TABLE SCAN is not a problem in this case?

Nisse Engström
  • 4,738
  • 23
  • 27
  • 42
Etantonio
  • 359
  • 1
  • 5
  • 13
  • Learn from your previous posts how to format a post – juergen d May 14 '16 at 10:52
  • a) Edit your question, don't add an answer. b) Add show index for ALL relevant tables (species seems problematic) c) I suggested an index `IDX_NOME (NOME, ID_SPECIE)` (not more) or vice versa, but that doesn't seem to be the problem. d) Your stats look fine, but they say that every `ID_species` has the same value in `paziente`. Is that correct? e) For some reason mysql thinks accessing the species-table directly is too expensive. f) Try: 1. add `;` after `like 'MARCO%';`. 2. Try `force index ... ` as described in my answer. 3. `ANALYZE TABLE paziente, species` 4. What is your mysql version? – Solarflare May 15 '16 at 13:16
  • Thanks again for your time, – Etantonio May 15 '16 at 13:34
  • Thanks again for your time, I revised my first question. the index is bigger because I'm looking for my final configuration and I need almost one night to load the data. In my real application I can search for COGNOME, NOME, ID_SESSO and ID_SPECIE, as you describe just an index could be used so I need to put them all in the index, it is right? Yes you're right, in this test I've just a value for ID_SPECIE in table PAZIENTE but I can have about 20 different types. Mysql is latest version 5.7. I also added the force index result – Etantonio May 15 '16 at 13:48
  • `range` means mysql is using that index correctly now (its not a full table scan, it is just what you expect). Can you add the time it takes for the query with and without force and the first one without species? I still don't know why mysql has to be forced. Can you try a different name with less results (and without force then). Did you try `ANALYZE TABLES...` (after import)? Try to forbid the specie-index, by `join Specie specie5_ ignore index (specie)` and test that too. And a remark: you don't have to reimport your data to change your index, use `create index ...` and `drop index ...`! – Solarflare May 16 '16 at 23:21
  • And another remark: if you want to search for `cogname` (and not name), you have to have this index in the force-list too, otherwise it would be nearly a table scan to use the (forced) `idx_nome`-index for that. – Solarflare May 16 '16 at 23:23
  • Hi, on my system (Intel I5 4GByte RAM) add or modify an index requires big amount of time without answer from the system, after 5 hour I prefer to kill the requestand recreate the database and reload the data. In any case I saw that the query with force is a little bit slower so maybe mysql choose the right solution. In any case I choose to not use force index in production because I'm working with spring-data-jpa and querydsl to abstract from database implementation layer. – Etantonio May 17 '16 at 19:34
  • Regarding index I think I will mantain this configuration INDEX IDX_DATA_NASCITA (DATA_NASCITA, ID_SESSO, ID_SPECIE), INDEX IDX_COGNOME (COGNOME, NOME, ID_SESSO, ID_SPECIE), INDEX IDX_NOME (NOME, COGNOME, ID_SESSO, ID_SPECIE), thanks for your help – Etantonio May 17 '16 at 19:34
  • @Etantonio, there seems to be something wrong with your configuration, creating an index for 176176 rows should be a matter of some seconds or, worst case, some minutes (and importing with bulk import should just be a matter of minutes, too). If you have any chance, try a clean install on a different system (or maybe reinstall it if you are not using it for something else, and maybe try a different version), your mysql seems to be acting very strange in many regards (or your hard drive is very slow or broken, maybe check that too). – Solarflare May 17 '16 at 20:25

1 Answers1

0

When asking anything about why mysql uses a specific indexes (or not), please always provide the output of "show index on tablename" for your table(s). The information given there is used by mysql to decides what index to use. And to understand why mysql uses an index or not, it helps to have the information mysql uses to decide which index to use or not.

And that information might be out of date, so you might want to do an ANALYZE TABLE ANALISI, PAZIENTE, SPECIE first and check your explain again (it will lock the table until it is done, just in case you are using this on a production server).

Like is not a strong indicator to use a specific index, because it could happen that you use a query like '%Marco%. If that can happen, you might as well stop reading now, as your index can't be used in general (but you might want to try a fulltext index).

To do the join and where, you need information about ID_SPECIE and NOME in your PATIENZE-table. Unfortunately, you don't have an index that has both information at the same time.

It's hard to know for sure without the SHOW INDEX-data, but to get the result, it might be a good idea to look up the specie information entries in the FK_PAZIENTE_SPECIE, so you don't have to look in that actual table; on the other hand, you can check in IDX_NOME-index for entries that have the wanted name. This will give you 2 lists. Now you have to check which entries are on both lists. Think about how you have to do this: take an entry of list 1 (it contains the primary key) and then scroll through list 2 to see if the primary key is there too. This might get faster if you sort list 2 first by the primary key.

MySQL can't use two indexes at the same time for the same table (because of what "using an index" technically means), but it can of course do a similar thing as desribed above and still use the indexes to get the required information. That is what Range checked for each record (index map: 0x19) means: it combines information from several indexes (here: the 1st, 4th and 5th index in your SHOW INDEX FROM PAZIENTE-table, which should be, if I'm not mistaken, PRIMARY, IDX_NOME and FK_PAZIENTE_SPECIE), that each contain some of the information needed. Because there is no direct link between IDX_NOME and FK_PAZIENTE_SPECIE (they are indirectly linked via PRIMARY), that can be fast or slow, depending on your data (so it helps if the ANALYZE-information is up-to-date to help mysql to decide what exactly to do - mysql will not change the strategy after it has chosen one, even if it turns out to be a bad one during execution).

So your indexes are actually used, but not in the fastest way possible.

What to do about it? If you followed until here, you might already know: "To do the join and where, you need information about ID_SPECIE and NOME in your PATIENZE-table." So just give it to mysql: create an index with both columns. It depends on your data which order is better, and if you want to use queries with '%MARCO%you should putID_SPECIE` first since the index can't be used then otherwise.

Try it out, this should make mysql use the new index now. If not (and no other index either), that might be a good follow up question, but then please really add the show index-information.

You can of course always force a key by ... join Paziente paziente3_ force index (idx_PAZIENTE_nome_specie) ... and check if it really is faster or if mysql was actually right not to use it. (But then it will try to use that index even if you search for e.g. '%MARCO%', which essentially means no index at all, even when it might be better to just use FK_PAZIENTE_SPECIE then, so you have to add this one too, and so on. So use the force with care.)

And it might be worth a try for testing and comparison, if ... join Paziente paziente3_ force index (idx_nome) ... now (before you add another index and before you do ANALYZE TABLES), will do the trick. At first glance, I'd assume that this should already make it faster, but maybe mysql was already correct not to use the index.

Solarflare
  • 10,721
  • 2
  • 18
  • 35