2

I have this query:

SELECT * FROM dwDimDate d 
LEFT JOIN tickets t FORCE INDEX FOR JOIN (idx_tickets_id_and_date) ON 
DATE_FORMAT(t.ticket_date, '%Y%m%d') = d.date_key 
LEFT JOIN sales s ON s.ticket_id = t.ticket_id
WHERE d.date_key BETWEEN 20130101 AND 20131231
GROUP BY d.date_key 

and I'm looking for help optimizing it. I've been reading everything I can on understanding the explain plan, and optimizing based off of that, but I can't seem to prevent MySQL from using an ALL type lookup on the tickets table.

INDEXES:

enter image description here

EXPLAIN PLAN:

enter image description here

I've tried using FORCE INDEX FOR JOIN to try and get it to index off of the date, but it doesn't seem to take the hint.

The dwDimDate is a date dimension with days of the year, so in this scenario I would think it would be fast to limit to 365 days, and then find all the tickets within that date range. There should only be about 5K tickets within that date span.

Any help would be greatly appreciated. I don't know how to figure out what strategy to employ to remove the "ALL" lookup. I'd like to understand how to do this in the future, so if you can help "teach me to fish", that would be great.

EDIT The query currently takes 11 seconds to run, which is going to be a problem in production.

Community
  • 1
  • 1
hyphen
  • 2,368
  • 5
  • 28
  • 59

3 Answers3

3
ON DATE_FORMAT(t.ticket_date, '%Y%m%d') = d.date_key 

This will never use an index when you use a function on the t.ticket_date column like this.

FORCE INDEX doesn't magically make non-sargable expressions into sargable expressions. It just hints the optimizer to assume a table-scan is infinitely expensive. The optimizer is therefore going to say, "well, that sucks for you, because this join expression needs to do a table-scan."

One solution would be to store t.ticket_date and d.date_key in a common format. Use either a DATE column, or a 'YYYYmmdd' string, for both.

Second possible solution: make a virtual column based on t.ticket_date and index the virtual column.

ALTER TABLE tickets 
  ADD COLUMN ticket_date_yyyymmdd AS (DATE_FORMAT(ticket_date, '%Y%m%d'),
  ADD INDEX (ticket_date_yyyymmdd);
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • perfect, thank you! That seems kind of obvious in hindsight, but I learned something. I modified my date dimension table and added a column in the format of the tickets table and now I'm down to .156 seconds! – hyphen Nov 22 '17 at 21:47
2

The problem is you're trying to join on a function of a column, not the value of the column itself. Therefore, it can't use your index on ticket_date to perform the join.

Ideally, you should ensure that ticket_date is in a format compatible with date_key so you can just do a simple comparison or range query. If that is absolutely not an option for you, and you're using a relatively recent version of MySQL (5.7.8+) with InnoDB, you can create a virtual column, and effectively create a functional index on that.

Vinay Pai
  • 7,432
  • 1
  • 17
  • 28
0

This might be closer to a valid query, and should be somewhat faster, at least on MySQL 5.6 or newer:

SELECT  *
    FROM  dwDimDate AS d
    LEFT JOIN  
        ( SELECT  MIN(ticket_id) AS one_tic_id,
                  COUNT(*) AS num_tickets,
                  DATE(ticket_date) AS date_key
            FROM  tickets t
            LEFT JOIN  sales s
               ON s.ticket_id = t.ticket_id
        ) AS ts USING (date_key)
    WHERE  d.date_key >= '2013-01-01'
      AND  d.date_key <  '2013-01-01' + INTERVAL 1 MONTH
    GROUP BY  d.date_key;
Rick James
  • 135,179
  • 13
  • 127
  • 222