1

MySQL is providing different execution plans for a query depending on whether I check that an indexed TINYINT field 'is false' or '= 0'. The table's name is ordini (means 'shipments') and the index that I want MySQL to use is shipmentslistrequest

SHOW CREATE ordini (I omitted most columns and indexes, for simplicity):

CREATE TABLE `ordini` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dataIns` datetime DEFAULT NULL,
  `hasLDV` tinyint(1) NOT NULL DEFAULT '0',
  `isAnnullato` tinyint(1) NOT NULL DEFAULT '0',
  `isEsportatoSAM` tinyint(1) DEFAULT '0',
  `id_azienda_aux` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `dataIns` (`dataIns`),
  KEY `id_azienda_aux` (`id_azienda_aux`),
  KEY `shipmentslistrequest` (`id_azienda_aux`,`isEsportatoSAM`,
                              `hasLDV`,`isAnnullato`,`dataIns`)
) ENGINE=InnoDB AUTO_INCREMENT=5007359 DEFAULT CHARSET=latin1

Query1:

EXPLAIN  select  *
        from  ordini
        where  id_azienda_aux = 92
          and  isEsportatoSAM = 0
          and  isAnnullato = 0
          and  hasLDV = 1
          and  dataIns >= '2020-04-28'
          and  dataIns <  '2020-05-19';

id|select_type|table |type |possible_keys                              |key                 |key_len|ref|rows|Extra      |
--|-----------|------|-----|-------------------------------------------|--------------------|-------|---|----|-----------|
 1|SIMPLE     |ordini|range|dataIns,id_azienda_aux,shipmentslistrequest|shipmentslistrequest|17     |   | 138|Using where|

Query2 (change: isEsportatoSAM is false):

EXPLAIN   select  *
        from  ordini
        where  id_azienda_aux = 92
          and  isEsportatoSAM is false
          and  isAnnullato = 0
          and  hasLDV = 1
          and  dataIns >= '2020-04-28'
          and  dataIns <  '2020-05-19';

id|select_type|table |type |possible_keys                              |key    |key_len|ref|rows  |Extra      |
--|-----------|------|-----|-------------------------------------------|-------|-------|---|------|-----------|
 1|SIMPLE     |ordini|range|dataIns,id_azienda_aux,shipmentslistrequest|dataIns|9      |   |205920|Using where|

Query3 (change: isEsportatoSAM = 0 and isAnnullato is false):

EXPLAIN     select  *
        from  ordini
        where  id_azienda_aux = 92
          and  isEsportatoSAM = 0
          and  isAnnullato is false
          and  hasLDV = 1
          and  dataIns >= '2020-04-28'
          and  dataIns <  '2020-05-19';

id|select_type|table |type|possible_keys                              |key                 |key_len|ref              |rows|Extra      |
--|-----------|------|----|-------------------------------------------|--------------------|-------|-----------------|----|-----------|
 1|SIMPLE     |ordini|ref |dataIns,id_azienda_aux,shipmentslistrequest|shipmentslistrequest|7      |const,const,const| 206|Using where|

My guess is that:

Query1 will properly take advantage of the shipmentslistrequest index.

Query2 can't use it beyond the first column because I am comparing isEsportatoSAM to false (but why?. Does the fact that the column admits nulls have any relevance?), and hence chooses a different index.

Query3 will use the index up to (and including) the third column, but no further because of the 'is false' comparison, and that's why the three 'const' in the ref column, and that's why the join type is 'ref' instead of 'range' (it doesn't get to the date range part of the index). Is this interpretation correct?

Can anyone please solve these questions for me?

PS: The MySQL version is 5.5.56

PS2: From a design point of view, I am aware that it makes no sense to allow nulls in those tinyint columns (I didn't design this system).

javier
  • 62
  • 7
  • I can't reproduce the problem, but I don't have any data. Two suggestions (1) 5.5 is quite old; upgrade. (2) DROP KEY `id_azienda_aux`; it is unnecessary because of the composite index. (I have seen cases where the Optimizer uses a smaller index in spite of the larger one being "better".) – Rick James May 20 '20 at 03:03
  • @RickJames I will remove the redundant index, thanks. One extra question: adding a new comparison to `WHERE` in Query1 shouldn't be a problem, right? I'm asking because the manual states that *MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index*, and I find this wording confusing. Adding a new column to my query would make it "not a leftmost prefix of the index", but the index would still be used, right? So it would be better to say that the query must *contain* *some* leftmost prefix of the columns in the index, correct? – javier May 20 '20 at 13:58
  • These have examples that may help clarify the "leftmost prefix" confusion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql and http://mysql.rjweb.org/doc.php/index1 – Rick James May 20 '20 at 15:08
  • I have been unable to reproduce the situation, either on 5.5 or later versions. I would hope that `IS FALSE` is equivalent to `= 0`, but I don't have proof or disproof. – Rick James May 20 '20 at 21:03

2 Answers2

1

This is just a guess, but I suspect it's because IS boolean_value is designed to convert any datatype to its boolean value before testing it. The query optimizer should be able to tell that IS FALSE is equivalent to = 0 when the column is any kind of INT, and then use the index as in the first query, but apparently MySQL doesn't implement that optimization.

If you want great query optimization, MySQL isn't generally the DB of choice.

Barmar
  • 741,623
  • 53
  • 500
  • 612
1

isAnnullato = 0 lets the optimizer use an index.

isAnnullato is false requires the query to look at every row and evaluate isAnnullato in a boolean context.

Andy Lester
  • 91,102
  • 13
  • 100
  • 152
  • `FALSE` is represented as `0`, so I am dubious of this answer. (`TRUE`, on the other hand, is any non-zero value, so that would be a different situation.) – Rick James May 20 '20 at 03:08