-2

I have following query..

SELECT avg(h.price)
FROM `car_history_optimized` h
LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
WHERE h.date >='2015-01-01'
  AND h.date <='2015-04-01'
  AND h.dealer_id <> 2389
  AND vd.prefix IN
    (SELECT concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))
     FROM `car_history_optimized` h
     LEFT JOIN vin_data vd ON (concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)
     WHERE h.date >='2015-03-01'
       AND h.date <='2015-04-01'
       AND h.dealer_id =2389)

It finds the average market value of a car sold within last 3 months by everyone else other than (2389) but only those car which have the same Make, Model sold by (2389)

can above query be optimized ? it's taking 2 minutes to run for 11 million records..

Thanks

Mihai
  • 26,325
  • 7
  • 66
  • 81
imran
  • 169
  • 1
  • 1
  • 9
  • 2
    Put some effort into formatting your query. – Gordon Linoff Jan 05 '16 at 21:57
  • 1
    Create a new column from all that concat and join on it ,this will likely speed up your query a lot – Mihai Jan 05 '16 at 21:58
  • Maybe it's just me, but I feel like the joins on `vin_data` could be removed. You only seem to be interested in the result of the concat and you don't use any other values from `vin_data`. – Patrick Q Jan 05 '16 at 22:05

3 Answers3

1

How often will you use that particular "prefix"? If often, then I will direct you toward indexing a 'virtual' column.

Otherwise, you need

INDEX(date)             -- for the outer query
INDEX(dealer_id, date)  -- for what is now the subquery

Then do the EXISTS as suggested, or use a LEFT JOIN ... WHERE ... IS NULL.

Is date a DATE? or a DATETIME? You may be including an extra day. Suggest this pattern:

WHERE date >= '2015-01-01'
  AND date  < '2015-01-01' + INTERVAL 3 MONTH
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Date field is a "date" not datetime.. and i already have index for date but don't have index (dealer_id,date) – imran Jan 06 '16 at 07:59
0

If you want a simple solution, my initial thought is to figure out a way to not have function calls in your joins.

You negatively affect the chance that an index will be helpful.

(concat(substr(h.vin,1,8),'_',substr(h.vin,10,3))=vd.prefix)

Maybe a like statement would be a better idea, however, either approach in a join clause is to be avoided.

Bottom line is your table structure & relationships here leaves room for improvement... If you need the concat because you are avoiding joining intermediate tables, don't -- allow the indexes to be used and it should improve your query performance.

Also, make sure you have indexes.

Frank V
  • 25,141
  • 34
  • 106
  • 144
  • frank, i added new column in table in history and removed concat(substr)) functions from query and also indexed the new column as well, but no improvement..query is even slower – imran Jan 06 '16 at 08:00
0

I suggest 3 things

  1. add a column and index it (avoid the functions in the join)
  2. use an inner join
  3. use EXISTS (...) instead of IN (...)

To "optimize" that query you need to add a column to the table car_history_optimized which contains the result of concat(substr(vin,1,8),'_',substr(vin,10,3)) and this column should be indexed.

Also, use INNER JOIN. In the current query the left outer join is wasted because you require every row of that table to be IN (the subquery) so NULL from that table isn't permitted hence you have the same effect as an inner join.

Use EXISTS instead of IN

SELECT
      AVG(h.price)
FROM car_history_optimized h
      INNER JOIN vin_data vd ON h.new_column = vd.prefix
WHERE h.`date` >= '2015-01-01'
      AND h.`date` <= '2015-04-01'
      AND h.dealer_id <> 2389
      AND EXISTS (
            SELECT
                  NULL
            FROM car_history_optimized cho
            WHERE cho.`date` >= '2015-03-01'
                  AND cho.`date` <= '2015-04-01'
                  AND cho.dealer_id = 2389
                  AND vd.prefix = cho.new_column
      )
;

By the way:

  • I assume already have some indexes and those include date and dealer_id
  • in future avoid using "date" as a column name (it's a reserved word)
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • Well, thanks for the suggestion. i added new field in history table as you said, add index as well. but this query is got even slower. Actually, it never gave any result since last 5 mins, still executing.. – imran Jan 06 '16 at 07:57
  • The suggestion should improve the performance, but you do need that index on BOTH tables (vd.prefix & h.new_column). You should consider how BOTH the tables are indexed and run execution plans to observe if they are being used. – Paul Maxwell Jan 06 '16 at 10:10