0

I have two tables with large amounts of data.

ACTION_SUMMARY and ACTION_DETAIL

Every user has one ACTION_SUMMARY row per day in my db, and zero, one or more ACTION_DETAIL rows per ACTION_SUMMARY.

I'd like to have a query that does returns users that have at least one detail record with an action type (ACTYP_ID) in some set of values.

Here is one example:

select
    AS.USER_ID
from
    ACTION_SUMMARY AS
    JOIN ACTION_DETAIL AD on AS.AS_ID = AD.AS_ID
where
    AS.DATE between '2015-01-01' and '2015-07-07'
    and AD.ACTYP_ID in (45, 25, 11)

AS.DATE has an index on it. However since there are over 2Million users in the DB, and each summary has on average 5-10 detail records, I'm running into a performance issue.

I was thinking of using EXISTS in this fashion:

select
 AS.USER_ID
from
    ACTION_SUMMARY AS
where
    AS.DATE between '2015-01-01' and '2015-07-07'
    and EXISTS (select 1 from ACTION_DETAIL AD where AD.AS_id = AS.AS_ID and AD.ACTYP_ID in (45, 25, 11))

Now, I have two questions:

1) is my query with EXISTS faster - as in will the sub query stop as soon as it finds something and move on?

2) How can I improve my query?

I have indexes on AS.DATE, AS.AS_ID, AD.AS_ID and AD.ACTYP_ID

Thanks

NEW2WEB
  • 503
  • 2
  • 8
  • 22
  • The first query return a lot more rows... 5-10 times the rows of the second one. based on your question details – Javaluca Apr 02 '15 at 21:06
  • Why are you generating an `Action_Summary` row per day (per user)? Why not just use `Action_Detail`? Oh, and [please use an exclusive upper-bound(`<`) with date/time/timestamps](https://sqlblog.org/2011/10/19/what-do-between-and-the-devil-have-in-common) (the blog covers SQL Server, but you can specify fractional seconds in DB2 too). – Clockwork-Muse Apr 04 '15 at 23:50

1 Answers1

0

exists shouldn't be slower than the join. But, if you really want to know, then try running both.

select AS.USER_ID
from ACTION_SUMMARY AS
where AS.DATE between '2015-01-01' and '2015-07-07' and
      EXISTS (select 1
              from ACTION_DETAIL AD
              where AD.AS_id = AS.AS_ID and AD.ACTYP_ID in (45, 25, 11)
             );

The best indexes for this query are: action_summary(date, as_id, user_id) and action_detail(as_id, actyp_id). Note that these are composite indexes with multiple columns.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • So does the EXIST operate like I think it does - meaning the subquery in the exists doesn't need to return all values - the first one it finds the EXISTS will return true? OR does the subquery full execute and run to completion before evaluating EXISTS? - Sorry for the bad use of proper terminology – NEW2WEB Apr 02 '15 at 21:08
  • @NEW2WEB . . . `exists` should stop at the first matching row. DB2 is a decent database, so I would expect it to be smart about exists. – Gordon Linoff Apr 02 '15 at 21:55