0

I'm working on optimizing the below query (takes 1:15 minutes now), but I better explain first what I'm doing (long):

I'm selecting (Developer Express) calendar events in a period tt_actualstart - tt_actualfinish for an employee identified by tt_emp_id.
These events are characterized by tt_type values:

0: single event<br />
1: master recurring event<br />
other: exceptions on the recurring events.
  (These have a `tt_parentid` referencing `tt_calendar_id` of the master event)

The 4 'OR'-ed query parts are:

  1. Single and master events in the period for the employee
  2. All exceptions of which the master meets the requirements of 1
  3. Exceptions in the period with master events (anywhere) belonging to the employee
  4. The master events belonging to the exceptions found in 3.
    This query part is the one taking most time.

Schematically:

Legend:   |-------------------| = selection period
          CAPITALS = selected event
          lowercase = related event

1.                               MASTER
                          |-------------------|

2.        EXCEPTION              master
                          |-------------------|

3.        master                EXCEPTION
                          |-------------------|

4.        MASTER                exception                  
                          |-------------------|

There is one complication that influenced these queries: I cannot select on exceptions having the required employee ID. This is because the Developer Express code does not store appointment properties of exceptions when these are equal to the properties of the master parent. Selection on employee ID can therefore only be done on the master parent.

As you can see, the second OR-ed query uses the first one as subquery; likewise for 4 and 3.

select c.*
from tt_calendar c
where 
( 
   ( 
      (c.tt_type in (0,1)) and (c.tt_actualstart <= '2014-03-31') and (c.tt_actualfinish >= '2014-01-01') and (c.tt_emp_id= 20652)
   )

     or 

   (  
      (c.tt_type > 1) and (c.tt_parentid is not null) and  
      (c.tt_parentid in (select c2.tt_calendar_id from tt_calendar c2
                         where (c2.tt_calendar_id=c.tt_parentid) and (c2.tt_actualstart <= '2014-03-31') and (c2.tt_actualfinish >=  '2014-01-01') and (c2.tt_emp_id=20652)
                         )
      )
   )

     or

   (  
      (c.tt_type > 1) and (c.tt_parentid is not null) and  
      (c.tt_actualstart <= '2014-12-31') and (c.tt_actualfinish >= '2014-01-01') and
      (c.tt_parentid in (select c2.tt_calendar_id from tt_calendar c2 where c2.tt_emp_id=20652))
   )

     or

   (  
      (c.tt_type = 1) and
      (c.tt_calendar_id in 
        (select tt_parentid from tt_calendar c2 
         where (c2.tt_type > 1) and (c2.tt_parentid is not null) and  
               (c2.tt_actualstart <=  '2014-03-31') and (c2.tt_actualfinish >= '2014-01-01') and (c2.tt_emp_id=20652)))
   )
)

Adding indices on tt_actualstart and tt_actualfinish gave a 15% speed increase. The (FireBird) query plan shows that these indices are indeed being used.
There are no other indices except for the primary key tt_calendar_id.

One other thing I have tried is using an integer TAG field which gets set by 4 individual UPDATE queries, then selecting the TAGged records.
This actually increases execution time a little.

I'm not sure converting the IN to EXISTS is useful - does Firebird/SQL/Oracle (I need it for all three) already optimize that?

Any more idea on how to improve this query?

Added 17 feb 2014:

There is one outlier case in my test data that all answers so far miss, and that my (heavy) 4th query finds:

(not found) Master ID 27274 for recurring event that has only two occurrences: 10 Dec 2013 - 11 Dec 2013
with
(found) Exception 27275 where the occurrence was moved (from 11 Dec 2013) to 11 Jan 2014 (tt_parent_id=27274)
In this case the occurrence was moved into the selection period, but the master is not in there. I need to find the master too.

BTW The four result sets have some overlap, so in all cases is should be UNION instead of UNION ALL, but that is not clear from my question.

Community
  • 1
  • 1
Jan Doggen
  • 8,799
  • 13
  • 70
  • 144
  • Anytime you use the IN clause in Firebird it performs a table-scan. I would try the UNION ALL queries provided by others, sometimes its faster to let Firebird decide on a queries own PLAN than trying to provide OR clauses for your cases. – Israel Lopez Feb 15 '14 at 00:46

1 Answers1

0
select c.*
from tt_calendar c

where 
   ( 
      (c.tt_type in (0,1)) and (c.tt_actualstart <= '2014-12-31') and (c.tt_actualfinish >= '2014-01-01') and (c.tt_emp_id= 20652)
   )

UNION

select c.*
from tt_calendar c 
inner join tt_calendar c2 on c2.tt_calendar_id=c.tt_parentid
where
   (  
      (c.tt_type > 1) and (c.tt_parentid is not null) and  
       (c2.tt_actualstart <= '2014-12-31') and (c2.tt_actualfinish >= '2014-01-01') and (c2.tt_emp_id=20652)
   )

UNION

select c.*
from tt_calendar c 
inner join tt_calendar c2 on c2.tt_calendar_id=c.tt_parentid
where
   (  
      (c.tt_type > 1) and (c.tt_parentid is not null) and  
      (c.tt_actualstart <= '2014-12-31') and (c.tt_actualfinish >= '2014-01-01') and (c2.tt_emp_id=20652)
   )

UNION

select c.*
from tt_calendar c 
inner join tt_calendar c2 on c.tt_calendar_id=c2.tt_parentid
where

   (  
      (c.tt_type = 1) and (c.tt_emp_id=20652) and
      (c2.tt_type > 1) and (c2.tt_parentid is not null) and  
      (c2.tt_actualstart <=  '2014-12-31') and (c2.tt_actualfinish >= '2014-01-01')
   )
)
Jan Doggen
  • 8,799
  • 13
  • 70
  • 144
KumarHarsh
  • 5,046
  • 1
  • 18
  • 22
  • The result set misses exactly one of my outlier test records that my 4th query caught ;-) Updating my question. – Jan Doggen Feb 17 '14 at 09:17
  • Solved. I have taken the liberty of replacing your entire answer with my final query. Yours had some errors but it steered me in the right direction. Together with some experimenting on the indices (there is now *one* index on tt_parent_id) the query time is down to 4 seconds. – Jan Doggen Feb 17 '14 at 12:04
  • @JanDoggen,i already made note that you have to correct the syntax.also I believed that only one condition will be true.so i write "union all" which would have been faster than "union".So if only condition will hold true for any given parameter then using "union" is wrong.what happen if you use "union all" – KumarHarsh Feb 17 '14 at 12:14
  • I have to use UNION see the last remark in my question update. UNION ALL gave duplicate results – Jan Doggen Feb 17 '14 at 12:22