2

The below query takes several minutes(never completes sometimes) to execute. I'm using MySQL database.

select 
        customer_name as cust,
        SUM(num_visits) AS visits
     from 
        visit_history
     where
            category = "middleMan"
            and eve_date >= '2014-07-01' and eve_date <= '2015-07-01
            and eve_type='XCG'
            and eve_master IN (select eve_name from master_type_ahj where category = "middleMan"  and eve_date >= '2014-07-01' and eve_date <= '2015-07-01')
     group by
        cust
    order by
        visits  desc 
    limit   
        50

The database table contains over million records. The data is partitioned. If I were to remove the subquery - and eve_master IN (select eve_name from master_type_ahj where category = "middleMan" and eve_date >= '2014-07-01' and eve_date <= '2015-07-01'), the query completes in a few min.

The columns that go into WHERE clause are indexed.

Is there anyway I could tune this? Any help here would help me tune a bunch of related queries too. Could I ask for help here please?

usert4jju7
  • 1,653
  • 3
  • 27
  • 59
  • Index the columns you search by, and consider using BETWEEN() for date comparisions, and/or unix timestamp conversions. – Alfie Apr 16 '16 at 03:26

3 Answers3

1

MySQL often handles EXISTS better than IN. So, this is your query:

select customer_name as cust, SUM(num_visits) AS visits
from visit_history
where category = 'middleMan' and
      eve_date >= '2014-07-01' and eve_date <= '2015-07-01 and
      eve_type = 'XCG' and
      exists (select 1
              from master_type_ahj m2
              where m2.eve_master = vh.eve_master
                    m2.category = 'middleMan'  and
                    m2.eve_date >= '2014-07-01' and eve_date <= '2015-07-01'
             )
group by cust
order by visits  desc 
limit 50;

I would recommend indexes on visit_history(category, eve_type, eve_date, eve_master). and master_type_ahj(eve_master, category, eve_date).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you Gordon. I guess you meant to say "MySQL often handles EXISTS better than IN. – usert4jju7 Apr 16 '16 at 03:30
  • Thank you very much Gordon. The query was way faster than that in my question. The derived tables concept posted below works super fast & hence, I accepted the answer. Thank you again for offering help – usert4jju7 Apr 16 '16 at 04:37
1

I would suggest using an actual join would be more efficient:

select 
      vh.customer_name as cust,
      SUM(vh.num_visits) AS visits
    from 
      visit_history as vh
      inner join 
      master_type_ahj as mt on vh.eve_master = mt.eve_name
    where
      vh.category = "middleMan"
      and vh.eve_date >= '2014-07-01' and vh.eve_date <= '2015-07-01
      and vh.eve_type='XCG'
      and mt.category = "middleMan"  
      and mt.eve_date >= '2014-07-01' and mt.eve_date <= '2015-07-01'

You may also learn more by using explain to see what mysql is actually doing to handle your query.

user212514
  • 3,110
  • 1
  • 15
  • 11
1

Here's another way using a derived table

select 
    customer_name as cust,
    sum(num_visits) as visits
from visit_history
join (
    select distinct eve_name from master_type_ahj 
    where category = "middleMan"  
    and eve_date >= '2014-07-01' 
    and eve_date <= '2015-07-01'
) t on t.eve_name = visit_history.eve_master
where category = "middleMan"
    and eve_date >= '2014-07-01' and eve_date <= '2015-07-01'
    and eve_type='XCG'
group by cust
order by visits desc 
limit 50
FuzzyTree
  • 32,014
  • 3
  • 54
  • 85
  • 1
    Thank you very much. Where the query never returned in about 30 min or more & still didn't complete , now 2 years worth of data comes back in about 4 sec :) I honestly can't thank you enough. – usert4jju7 Apr 16 '16 at 04:35
  • @usert4jju7 glad it helped – FuzzyTree Apr 16 '16 at 04:36
  • Also with respect to one of the comments above, is using `BETWEEN` efficient(faster) than using greater than & less than criteria? Would you be able to please clarify? – usert4jju7 Apr 16 '16 at 04:36
  • @usert4jju7 I don't believe using between is faster than using inequality operators. But the best way to check is to test it yourself – FuzzyTree Apr 16 '16 at 04:39
  • Sure. Thank you :) – usert4jju7 Apr 16 '16 at 05:16