1

The mysql query I am working on is as follow:

select line_item_product_code, line_item_usage_start_date, sum(line_item_unblended_cost) as sum
from test_indexing
force index(date)
where line_item_product_code in('AmazonEC2', 'AmazonRDS')
      and product_region='us-east-1'
      and line_item_usage_start_date between date('2019-08-01')
      and date('2019-08-31 00:00:00')
group by line_item_product_code, line_item_usage_start_date
order by sum;

I have applied indexing on the column("line_item_usage_start_date") but on running the query the indexing does not work and on explain the type is "ALL" and key is not being used. Indexing is not working only when where clause takes an "OR" or "IN" operator. The data types of columns are: line_item_product_code : TEXT line_item_unblended_cost : DOUBLE product_region : TEXT line_item_usage_start_date : TIMESTAMP My main objective for this query is : Optimizing query for fast response in the dashboard, I have this table of 192 columns and 9m+ rows with a csv size of 13+ GB. I guess indexing will solve my problem dealing with this query. Is there a alternative of these operators or any other solution for this?

user2722968
  • 13,636
  • 2
  • 46
  • 67
poindexter
  • 21
  • 3
  • Well, you have a force index hint which tells mysql that to use only the indexes that you list (date). So it will try to use that just that one. I would say: Try either: 1 Remove the force index, see what happens and try to rewrite the query to let mysql pick the right indexes itself or 2 add additional indexes in the force index listing (comma separated) or 3 Try a multi-column index. here is the doc on index hints https://dev.mysql.com/doc/refman/5.7/en/index-hints.html – camba1 Sep 27 '19 at 17:10
  • On removal of force index the query does not pick any indexes and i want to rewrite my query so it does not require force index, addition of other indexes in force index also results in taking only date query and multi-column index also is not able to auto pickup by mysql without force index. Also on force indexing I am getting the same response time as without indexing. Is there another way to speed up response time on this query? – poindexter Sep 27 '19 at 18:33
  • Without looking at execution plan, I would try: 1 using <= date and >= date instead of between ,does it pick up the index then. 2. declare two timestamp variables and use them instead of converting dates in the statement and see what that does, 3 Check what happens when you pull one product code at a time. 4. Remove the order statement and see if that helps (ordering can always be done client side and it is usually not the best use of precious DB resources anyways). Also, there are 192 (!) columns in table, so that itself is a performance killer even if you only need 4. Can that be broken up? – camba1 Sep 27 '19 at 18:50
  • 1. It doesn't pick any index for greater or less than operator. 3. On using just 1 product code indexing does not require force index but I will be needing more product code for the query. 4. Removing the order by statement is a good idea thanks for that and I can't remove the columns as for dashboard I am using just this query but for other features I would be needing that . I didn't understand the second point you mentioned about declaring 2 timestamps – poindexter Sep 27 '19 at 19:00
  • Please provide `SHOW CREATE TABLE` and `EXPLAIN SELECT...` – Rick James Oct 08 '19 at 03:55
  • @camba1 - The Optimizer converts `BETWEEN` into a pair of tests, but still sees it as a single "range" test. – Rick James Oct 08 '19 at 04:16
  • Pulling the product_codes one at a time (using UNION) would require a lot of extra stuff to handle the `GROUP BY`, `SUM`, and `ORDER BY`. Yes, it is possible, but probably not worth it. – Rick James Oct 08 '19 at 04:17
  • 2 timstamps -- I effectively debunk that in my Answer. – Rick James Oct 08 '19 at 04:19

1 Answers1

0
x = 1  OR  x = 2

is turned into this by the Optimizer:

x IN (1,2)

The use of the DATE() function is unnecessary in date('2019-08-01'). The string is fine by itself. For this:

and line_item_usage_start_date between date('2019-08-01')
                                   AND date('2019-08-31 00:00:00')

I would write this 'range':

and line_item_usage_start_date >= '2019-08-01'
and line_item_usage_start_date  < '2019-08-01' + INTERVAL 1 MONTH

You have 3 conditions in the WHERE. Build an index with

  • All the = tests, then
  • Any IN tests, then
  • At most one "range"

Hence, this may be the optimal index:

INDEX(product_region,    -- first, because of '='
      line_item_product_code,
      line_item_usage_start_date)  -- last

The EXPLAIN will probably say Using temporary, Using filesort. These are caused by the GROUP BY and ORDER BY. Still, a different index, focusing on the GROUP BY, may eliminate one sort:

INDEX(line_item_product_code, line_item_usage_start_date) -- same order as the GROUP BY

As it turns out, my first index recommendation is definitely better -- because it can do both the = and the GROUP BY.

Oops, there is a problem:

line_item_product_code : TEXT

I doubt if a "product_code" needs TEXT. Won't something like VARCHAR(30) be plenty big? The point is, that a TEXT column cannot be used in an INDEX. So also change the datatype of that column.

More cookbook: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

I have this table of 192 columns

That is rather large.

Do not use FORCE INDEX -- It may help today, but then hurt tomorrow when the data distribution changes.

Rick James
  • 135,179
  • 13
  • 127
  • 222