1

I have an Oracle SQL query that is using a Correlated Subquery:

Q1

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date IN (SELECT max(tc.effective_date)
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code)

I have changed the subquery to us an Empty OVER() caluse which has significantly improved the performance of this query:

Q2

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date IN (SELECT max(tc.effective_date) OVER () AS ed
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code)

The new query returns the same result set as the original query, so appears to be working..., but why is the explain plan so different, it appears to be still correlated, is it not evaluating for every row in the outer query anymore? why?

I want to understand what is going on in the second query.

I think I could re-write this query a 3rd way using row_number() OVER (partition by ...):

Q3

    SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate),
       ct.*
FROM table1 t1
         JOIN table2 t2 ON t2.code = t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
         JOIN (SELECT ct.*, row_number() OVER (PARTITION BY ct.code ORDER BY ct.effective_date ASC) AS rn
               FROM tablecore ct) ct
              ON t1.code = ct.code
                  AND rn = 1
                  AND ct.effective_date <= t3.processed_date
WHERE t2.effective_date in(ct.effective_date)
  AND t4.type IN ('value1', 'value2', 'value3')
  AND t3.processed_date >= '01 JUL 2019'
  AND t3.processed_date < '22 JUL 2019'
  AND t2.effective_date IN (ct.effective_date);

This version seems to work too, but is slower than the second version.

EDIT AS pointed out by @Christian Q3 will return incorrect results

JTK
  • 1,469
  • 2
  • 22
  • 39

2 Answers2

1

Here is another option for you to try:

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND EXISTS (SELECT 1
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code
                                HAVING t2.effective_date = max(tc.effective_date))

For proper performance analysis, it would be great to have the Sizes of your tables, and an explain Plan of the different queries (and a Check, if the Optimizer Statistics are accurate).

And FYI: Your last query with row_number() will not deliver the same result, as the windowing function will be evaluated before the join ct.effective_date <= t3.processed_date, and therefore you will miss some rows.

Christian
  • 36
  • 4
  • `And FYI: Your last query with row_number() will not deliver the same result, as the windowing function will be evaluated before the join ct.effective_date <= t3.processed_date, and therefore you will miss some rows.` I have tested the last query and I'm not loosing any rows, are you sure that it could cause issues? – JTK Jul 22 '19 at 13:02
  • For each ct.code, the largest `ct.effective_date` will have `rn = 1`. If this `effective_date` is larger than the `t3.processed_date`, it will not match both conditions. If you don't have such cases in your code, then you won't lose any rows (with your current data). But are you sure this never happens? Then you would need this condition `ct.effective_date <= t3.processed_date` – Christian Jul 23 '19 at 06:47
  • I understand now, thanks, `tableCore ct`contains static data which is versioned by `effective_date` so `ct.effective_date <= t3.processed_date` will always evaluate to true, but I think the query will possibly return the wrong version, for the reason you've pointed out. Could you confirm that my Q2 will work as expected though, and explain what the empty Over() clause is doing to make the query so much faster? the explain plan for my Q2 is nearly identical to the explain plan for the query you provided. – JTK Jul 23 '19 at 13:27
  • 1
    Hard to say without an Explain Plan :) My Guess from my tries is, that Oracle uses a Nested Loop (Do a Lookup on `tableCore`for each Row in the Main Query) in the first Query, while the empty OVER() forces Oracle to Read the hole `tableCore` in advance, which in this case is the faster Option. – Christian Jul 23 '19 at 13:55
  • I'll have to sanitize the explain plan before I post it, your assumption makes sense, I can't find any resource to suggest that the empty `OVER()` clause would behave this way, seems like a very useful optimization for a query like this, you would think I could find an example somewhere... thank you for your help on this :) – JTK Jul 23 '19 at 14:26
0

Why are you using IN when you can simply use = operator

SELECT t1.id,
       t3.code,
       t3.processed_date,
       (t1.total / t2.rate) 
FROM table1 t1
         JOIN table2 t2 ON t2.code= t1.code
         JOIN table3 t3 ON t3.id = t1.id
         JOIN table4 t4 ON t4.code = t1.code
    AND t4.type IN ('value1', 'value2', 'value3')
    AND t3.processed_date >= '01 JUL 2019'
    AND t3.processed_date < '22 JUL 2019'
    AND t2.effective_date = (SELECT max(tc.effective_date)
                                FROM tableCore tc
                                WHERE tc.effective_date <= t3.processed_date
                                  AND t1.code = tc.code)

The significant performance improvement would be because of the reduced I/O from Disk

Andrews B Anthony
  • 1,381
  • 9
  • 27
  • I didn't write the original query, possibly it had more conditions in it when it was originally written, I just didn't change it while I was experimenting with the analytic functions. – JTK Jul 22 '19 at 12:57