mysql query using the wrong indexes

Question

I have some optimization problems with some of my queries in a mysql database. After I build my application I am trying to optimize using mysqltuner and explain, to find non indexed queries. This is a query that is running often and reports that is not using the index :

SELECT  count(*) AS rangedandselling
    FROM  
      ( SELECT  DISTINCT `store_formats`.`Store Name`
            FROM  (`eds_sales`
                    JOIN  `store_formats`
                      ON (`eds_sales`.`Store Nbr` = `store_formats`.`Store Nbr`)
                  )
            WHERE  `eds_sales`.`Prime Item Nbr` = '4'
              AND  `eds_sales`.`Date` BETWEEN CAST('2016-07-14' AS DATETIME)
                                          AND CAST('2016-07-21' AS DATETIME)
              AND  `store_formats`.`Format Name` IN ('format1','format2')
              AND  `store_formats`.`Store Name` IN (
                SELECT  DISTINCT `store_formats`.`Store Name`
                    FROM  (`eds_stock`
                            JOIN  `store_formats`
                              ON (`eds_stock`.`Store Nbr` = `store_formats`.`Store Nbr`)
                          )
                    WHERE  `eds_stock`.`Prime Item Nbr` = '4'
                      AND  `eds_stock`.`Date` BETWEEN CAST('2016-07-14' AS DATETIME)
                                                  AND CAST('2016-07-21' AS DATETIME)
                      AND  `store_formats`.`Format Name` IN ('format1','format2')
                      AND  `eds_stock`.`Curr Traited Store/Item Comb.` = '1' )
      ) t

This is the explain output : https://tools.mariadb.org/ea/pyb3h

Although I have indexed the columns involved in the joins and lookups, it looks like it is picking another index. this other index is called uniqness, and is composed of 6 different columns in the source columns that I use for inserts (the combination of those columns is the only thing that makes a row unique, hence the name I gave.). I then made sure I have indexes for the other columns and I can see them in the explain. I am not sure why this happens, can someone help?

Any ideas on optimizing this query?

Here is the explain for those that the link above does not work :

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+---+---+---+---+---+---+---+---+---+---+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 167048 |  |
| 2 | DERIVED | eds_sales | ref | uniqness,Prime Item Nbr,Store Nbr | uniqness | 4 | const | 23864 | Using where; Using index; Using temporary |
| 2 | DERIVED | store_formats | ref | Store Nbr,Store Name,Format Name | Store Nbr | 5 | equidata.eds_sales.Store Nbr | 1 | Using where |
| 2 | DERIVED | <subquery3> | eq_ref | distinct_key | distinct_key | 84 | func | 1 | Distinct |
| 3 | MATERIALIZED | store_formats | ALL | Store Nbr,Store Name,Format Name | NULL | NULL | NULL | 634 | Using where; Distinct |
| 3 | MATERIALIZED | eds_stock | ref | uniqness,Prime Item Nbr,Store Nbr | uniqness | 8 | const,equidata.store_formats.Store Nbr | 7 | Using where; Distinct |
+---+---+---+---+---+---+---+---+---+---+

I am also posting the related tables structure :

--
-- Table structure for table `eds_sales`
--
CREATE TABLE `eds_sales` (
  `id` int(12) NOT NULL,
  `Prime Item Nbr` int(12) NOT NULL,
  `Prime Item Desc` varchar(255) NOT NULL,
  `Prime Size Desc` varchar(255) NOT NULL,
  `Variety` varchar(255) NOT NULL,
  `WHPK Qty` int(5) NOT NULL,
  `SUPPK Qty` int(5) NOT NULL,
  `Depot Nbr` int(5) NOT NULL,
  `Depot Name` varchar(255) NOT NULL,
  `Store Nbr` int(5) NOT NULL,
  `Store Name` varchar(255) NOT NULL,
  `EPOS Quantity` int(5) NOT NULL,
  `EPOS Sales` float(4,2) NOT NULL,
  `Date` date NOT NULL,
  `Client` varchar(255) NOT NULL,
  `Retailer` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

ALTER TABLE `eds_sales`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `uniqness` (`Prime Item Nbr`,`Prime Item Desc`,`Prime Size Desc`,`Variety`,`WHPK Qty`,`SUPPK Qty`,`Depot Nbr`,`Depot Name`,`Store Nbr`,`Store Name`,`Date`,`Client`) USING BTREE,
  ADD KEY `Prime Item Nbr` (`Prime Item Nbr`),
  ADD KEY `Store Nbr` (`Store Nbr`);

Table structure for table eds_stock

CREATE TABLE `eds_stock` (
  `Prime Item Nbr` int(12) NOT NULL,
  `Prime Item Desc` varchar(255) NOT NULL,
  `Prime Size Desc` varchar(255) NOT NULL,
  `Variety` varchar(255) NOT NULL,
  `Curr Valid Store/Item Comb.` int(12) NOT NULL,
  `Curr Traited Store/Item Comb.` int(12) NOT NULL,
  `Store Nbr` int(12) NOT NULL,
  `Store Name` varchar(255) NOT NULL,
  `Curr Str On Hand Qty` int(12) NOT NULL,
  `Curr Str In Transit Qty` int(12) NOT NULL,
  `Curr Str On Order Qty` int(12) NOT NULL,
  `Curr Str In Depot Qty` int(12) NOT NULL,
  `Curr Instock %` int(12) NOT NULL,
  `Max Shelf Qty` int(12) NOT NULL,
  `On Hand Qty` int(12) NOT NULL,
  `Date` date NOT NULL,
  `Client` varchar(255) NOT NULL,
  `Retailer` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `eds_stock`
  ADD UNIQUE KEY `uniqness` (`Prime Item Nbr`,`Store Nbr`,`Date`,`Client`,`Retailer`),
  ADD KEY `Prime Item Nbr` (`Prime Item Nbr`),
  ADD KEY `Store Nbr` (`Store Nbr`),
  ADD KEY `Curr Valid Store/Item Comb.` (`Curr Valid Store/Item Comb.`);

Table structure for table store_formats

CREATE TABLE `store_formats` (
  `id` int(12) NOT NULL,
  `Store Nbr` int(4) DEFAULT NULL,
  `Store Name` varchar(27) DEFAULT NULL,
  `City` varchar(19) DEFAULT NULL,
  `Post Code` varchar(9) DEFAULT NULL,
  `Region #` int(2) DEFAULT NULL,
  `Region Name` varchar(10) DEFAULT NULL,
  `Distr #` int(3) DEFAULT NULL,
  `Dist Name` varchar(26) DEFAULT NULL,
  `Square Footage` varchar(7) DEFAULT NULL,
  `Format` int(1) DEFAULT NULL,
  `Format Name` varchar(23) DEFAULT NULL,
  `Store Type` varchar(20) DEFAULT NULL,
  `TV Region` varchar(12) DEFAULT NULL,
  `Pharmacy` varchar(3) DEFAULT NULL,
  `Optician` varchar(3) DEFAULT NULL,
  `Home Shopping` varchar(3) DEFAULT NULL,
  `Retailer` varchar(15) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `store_formats`
  ADD PRIMARY KEY (`id`),
  ADD KEY `Store Nbr` (`Store Nbr`),
  ADD KEY `Store Name` (`Store Name`),
  ADD KEY `Format Name` (`Format Name`);

And in your mind, why would that query have to use an index? You think that if a query uses an index, that it makes it faster? Have you thought of queries that can't use an index? Look at `SELECT COUNT(*)` that you have at the very beginning. It translates to "count everything here". In order to cout *everything*, you need to fetch *everything* and for that you don't need an index. You don't have to optimize that query at all. You can simply run it several times a day and save the count somewhere so you don't hammer the db every time you need the count. — Mjh, Jul 22 '16 at 08:54
The think is that the date range is a variable coming from php, same for an array of store formats that go into the `store_formats`.`Format Name` IN ('format1','format2') So to store the results I would need to run thousands and thousands of combinations — Ioannis Kokkinis, Jul 22 '16 at 09:32
It's not my place to tell people how to deal with their projects, but when you create database model, you usually take into account how you're going to work with the data and what the data is for. Your query is extremely ugly and difficult to read. Creating complex models and queries isn't hard - it's easy. What's hard is to create something simple that works in every scenario. The way you designed this doesn't let itself to be used or queried easily. However, as I mentioned - your original question was why MySQL doesn't use an index and the answer is - because it's not needed. — Mjh, Jul 22 '16 at 09:48
It is using an index, but picks the wrong one, that was my problem : Possible Keys : uniqness,Prime Item Nbr,Store Nbr - Used Key : uniqness. This might not be the full problem here, I understand that, but since it is flagged I would like to solve it — Ioannis Kokkinis, Jul 22 '16 at 09:54
Wait, why do you think it picks the wrong one? That implies you know something better compared to MySQL. MySQL always chooses an index that reduces the amount of `I/O` it has to perform. It's irrelevant if you have an index on a field. You can have an index on a boolean field and MySQL will almost never choose to use it since it produces more work. If you really want to satisfy your OCD and have it use the index you want - you can utilize `FORCE INDEX` while joining or filtering. However, that will not produce anything of use. Also, check your MySQL config. ~170k rows isn't really a lot. — Mjh, Jul 22 '16 at 10:06
At least for me, the link doesn't work (that's why vital information should not just be provided in external links), so I can't check which indexes you have and use, but `store_formats(Store Nbr)` and `eds_sales(Prime Item Nbr,Store Nbr)` seem to be a first good choice. `count(*)` doesn't "count everything", it counts the rows in the subquery, which can and should use indexes. Can you explain the last `IN` query (the part `store_formats.Store Name IN...`)? It looks on first sight as if you could replace your whole query by this subquery (in the `IN`) and then `join` with all `store_formats`. — Solarflare, Jul 22 '16 at 10:34
@Solarflare - yes, `SELECT COUNT(*)` **does count everything.** and OF COURSE it counts EVERYTHING from the subquery - if you really want to nitpick, then it's counting from *resultset*, not subquery because subquery is just an SQL expression that RDBMS parses in order to return the **resultset** - kind of sucks when I correct you on what you meant, doesn't it? The link works, try to access it over some free vpn service. I agree that the explain should have been posted as a part of the question. Amount of inspected rows is low, this is mostly related to innodb configuration. — Mjh, Jul 22 '16 at 10:39
@solarflare I have added the explain in the description. This query gets a list of stores and then compares them to another list of stores derived from another table. the top part is a count of this result. — Ioannis Kokkinis, Jul 22 '16 at 11:57
I believe whatever is wrong with my query or my server, (for example innodb options, mysqltuner etc, db schema) the first step would be to eliminate the joins with no indexes in my subqueries or wherever they appear, this is the main purpose of my question here. More optimisation will follow on my end obviously. The data is coming from huge excel files that I read with phpEXCEL , and I understand that the db schema can be optimized more. — Ioannis Kokkinis, Jul 22 '16 at 12:05

score 1 · Accepted Answer · answered Jul 22 '16 at 16:30

1

CAST('2016-07-14' AS DATETIME) -- the CAST is not needed; '2016-07-14' works fine. (Especially since you are comparing against a DATE.)

IN ( SELECT ... ) is inefficient. Change to a JOIN.

On eds_stock, instead of

INDEX(`Prime Item Nbr`)

have these two:

INDEX(`Prime Item Nbr`, `Date`)
INDEX(`Prime Item Nbr`, `Curr Traited Store/Item Comb.`, `Date`)

INT is always a 4-byte number, even if you say int(2). Consider switching to TINYINT UNSIGNED (and other sizes of INT).

float(4,2) -- Do not use (m,n); it causes an extra rounding and my cause undesired truncation. Either use DECIMAL(4,2) (for money), or plain FLOAT.

Bug?? Did you really want 8 days, not just a week in

AND `Date` BETWEEN CAST('2016-07-14' AS DATETIME) AND CAST('2016-07-21' AS DATETIME)

I like this pattern:

AND `Date` >= '2016-07-14'
AND `Date`  < '2016-07-14' + INTERVAL 1 WEEK

Instead of two selects

SELECT  count(*) AS rangedandselling
    FROM ( SELECT  DISTINCT `store_formats`.`Store Name` ...

One select will probably work (and be faster):

SELECT COUNT(DISTINCT `store_formats`.`Store Name`) AS rangedandselling ...

Once you have cleaned up most of that, we can get back to your question about 'wrong index', if there is still an issue. (Please start a new Question if you need further help.)

answered Jul 22 '16 at 16:30

Rick James

135,179
13
127
222

I have seen a 10-fold improvement using the above suggestions. The one that hit the spot is the creation of the new indexes. So basically the proper use of indexes is to create a combined index of all the items used in queries. So If I am querying for an id where date = something, I need an index of the id and the date. I have also modified all the structure using suggestions from PROCEDURE ANALYSE() . One question though, although there are 1.000.000 columns with 1,0 instead of boolean (or tinyint(0) it suggests ENUM('0','1') , which one is right? – Ioannis Kokkinis Jul 23 '16 at 10:07
I meant tinyint(1) – Ioannis Kokkinis Jul 23 '16 at 14:26
`TINYINT UNSIGNED` - yes. Or `ENUM('male', 'female')`, etc. "Combine all items" -- not quite; see [_Index Cookbook_](https://mariadb.com/kb/en/mariadb/building-the-best-index-for-a-given-select/) – Rick James Jul 23 '16 at 14:31
The rule is more like some "=" first, one range (eg DATE BETWEEN) last. – Rick James Jul 23 '16 at 14:32
ok just by switching the order I got even more performance and a fully indexed query now – Ioannis Kokkinis Jul 23 '16 at 15:15
1

Order in `WHERE .. AND` does not matter; order in `INDEX(a,b)` _does_ matter. – Rick James Jul 23 '16 at 16:33

mysql query using the wrong indexes

1 Answers1