2

I have four - five tables which are really big in size and they are left outer joined using the below query. Is there any way that it can be rewritten so that the performance could be improved?

SELECT t1.id,
     MIN(t5.date) AS first_pri_date,
     MIN(t3.date) AS first_pub_date,
     MAX(t3.date) AS last_publ_date,
     MIN(t2.date) AS first_exp_date
FROM t1
    LEFT JOIN t2 ON (t1.id = t2.id)
    LEFT JOIN t3 ON (t3.id = t1.id)
    LEFT JOIN t4 ON (t1.id = t4.id)
    LEFT JOIN t5 ON (t5.p_id =t4.p_id)
GROUP BY t1.id
ORDER BY t1.id;

Record counts are:

  • t1: 6434323
  • t2: 6934562
  • t3: 9141420
  • t4: 11515192
  • t5: 3797768

There are indexes on most of the columns used for join. The most consuming part in the explain plan is the outer join with t4 which is happening in the end. I just wanted to know if there is any way to rewrite this to improve the performance.

Vojtěch Dohnal
  • 7,867
  • 3
  • 43
  • 105
Mamtha
  • 29
  • 1
  • 3
  • 2
    Please use the "edit" link under your question and use the code formatting capabilities of the question editor to properly format your query as a piece of code (it is the toolbutton with two facing braces - `{ }`). Please post the execution plan in your query as well. – peter.hrasko.sk Oct 31 '14 at 09:43
  • If this is the real query, you don't reference T4 or T5 in the select clause, so why are you joining them? – Sparky Oct 31 '14 at 10:15
  • 1
    Where does the `p.date` in your question come from? There's no table named `p` listed in the `FROM` clause. – Brian Camire Oct 31 '14 at 11:30
  • Sorry, it was typo. It is t5.date. t5.date is used in the select clause and there t5 is outer joined with t1 through t4. There is no column ti directly join – Mamtha Nov 02 '14 at 18:09

2 Answers2

1

Assuming that id is primary key in t1, your query might (or might not, depends on the setup of your Oracle's PGA) run better when written follows:

SELECT --+ leading(t1) use_hash(t2x,t3x,t45x) full(t1) no_push_pred(t2x) no_push_pred(t3x) no_push_pred(t45x) all_rows
    t1.id,
    t45x.first_pri_date,
    t3.first_pub_date,
    t3.last_publ_date,
    t2.first_exp_date
FROM t1
    LEFT JOIN (
        SELECT t2.id,
            MIN(t2.date) AS first_exp_date
        FROM t2
        GROUP BY t2.id
    ) t2x
        ON t2x.id = t1.id
    LEFT JOIN (
        SELECT t3.id,
            MIN(t3.date) AS first_pub_date,
            MAX(t3.date) AS last_publ_date
        FROM t3
        GROUP BY t3.id
    ) t3x
        ON t3x.id = t1.id
    LEFT JOIN (
        SELECT --+ leading(t5) use_hash(t4)
            t4.id,
            MIN(t5.date) AS first_pri_date
        FROM t4
            JOIN t5 ON t5.p_id = t4.p_id
        GROUP BY t4.id
    ) t45x
        ON t45x.id = t1.id
ORDER BY t1.id;

This rewrite does not impose any need for creating additional, yet otherwise useless indexes.

peter.hrasko.sk
  • 4,043
  • 2
  • 19
  • 34
1

I would say that your problem is that you are doing many LEFT JOINs and the final resultset gets too big after applying all those JOINs. Also indexes cannot be used this way to calculate MIN or MAX in the fastest possible way. With good use of indexes you should be able to calculate MIN or MAX very quickly.

I would write the query rather like this:

SELECT t1.id,     
(SELECT MIN(t5.date) FROM t5 JOIN t4 ON t5.p_id = t4.p_id WHERE t4.id = t1.id) AS first_pri_date,
(SELECT MIN(date) FROM t3 WHERE t3.id = t1.id) AS first_pub_date,
(SELECT MAX(date) FROM t3 WHERE t3.id = t1.id)  AS last_publ_date,
(SELECT MIN(date) FROM t2 WHERE t2.id = t1.id) AS first_exp_date
FROM t1
ORDER BY t1.id;

For better performace create indexes on (id, date) or (p_id, date). So your indexes would be like this:

CREATE INDEX ix2 ON T2 (id,date);
CREATE INDEX ix3 ON T3 (id,date);
CREATE INDEX ix5 ON T5 (p_id,date);
CREATE INDEX ix4 ON T4 (id);

But there still remains a problem with the join between t4 and t5. In case there is 1:1 relation between t1 and t4, it could be even better to write something like this on the second line:

(SELECT MIN(t5.date) FROM t5 WHERE t5.p_id = (SELECT p_id FROM t4 WHERE t4.id=t1.id)) AS first_pri_date,

If it is 1:N and also if CROSS APPLY and OUTER APPLY work on your Oracle version, you can rewrite the second line like this:

 (SELECT MIN(t5min.PartialMinimum) 
 FROM t4 
 CROSS APPLY 
 (
    SELECT PartialMinimum = MIN(t5.date)
    FROM t5
    WHERE t5.p_id = t4.p_id
 ) AS t5min
 WHERE t4.id = t1.id) 
 AS first_pri_date

All this is aimed at the best possible use of indexes during calculation of MIN or MAX. So the whole SELECT could be rewritten like this:

SELECT t1.id,     
 (SELECT MIN(t5min.PartialMinimum) 
 FROM t4 
 CROSS APPLY 
 (
    SELECT TOP 1 PartialMinimum = date
    FROM t5
    WHERE t5.p_id = t4.p_id
    ORDER BY 1 ASC
 ) AS t5min
 WHERE t4.id = t1.id)  AS first_pri_date,
(SELECT TOP 1 date FROM t2 WHERE t2.id = t1.id ORDER BY 1 ASC)  AS first_exp_date,
(SELECT TOP 1 date FROM t3 WHERE t3.id = t1.id ORDER BY 1 ASC)  AS first_pub_date,
(SELECT TOP 1 date FROM t3 WHERE t3.id = t1.id ORDER BY 1 DESC)  AS last_publ_date
FROM t1 
ORDER BY 1;

This is as I believe most optimal way how to get MIN or MAX from historical data table.

The point is, that using MIN with a lot of non indexed values makes server load all the data into the memory and then calculate MIN or MAX from the non-indexed data, which takes long time because it has high demands on I/O operations. Bad usage of indexes when using MIN or MAX can lead to the situation, where you have all your historical table data cached in memory without needing it for anything else except MIN or MAX calculation.

Without the CROSS APPLY part of the query the server would need to load to memory all individual dates from t5 and calculate MAX from the whole loaded resultset.

Mark that MIN function on properly indexed table behaves like TOP 1 ORDER BY, which is very fast. In this way you can get your results instantly.

CROSS APPLY is available in Oracle 12C, otherwise you can use pipelined functions.

Check this SQL Fiddle, especially the differences in execution plans.

Vojtěch Dohnal
  • 7,867
  • 3
  • 43
  • 105