0

I have been trying to optimize performance to the following query. I request all experts in this field to give me a hand and suggestions.

I have app. 70k records and my requirement says to remove duplicates. I need to improve the performance of the below query.

select *
  from x.vw_records
 where id not in
       (select distinct id
          from x.vw_datarecords
         where effective_date >= trunc(sysdate - 30)
           and book in (select book_shortname from x.vw_datarecords))
union
select distinct id
  from x.vw_historyrecords
 where effective_date >= trunc(sysdate - 30)
   and book in (select book_shortname from x.vw_datarecords)
union
select distinct id
  from x.vw_transactiondata
 where effective_date >= trunc(sysdate - 30)
   and book in (select book_shortname from x.vw_datarecords);
union
  select distinct id
    from x.vw_cashdata
   where effective_date >= trunc(sysdate - 30)
     and book in (select book_shortname from x.vw_datarecords)

Currently It takes ten minutes to count no. of rows using count(*). Suggest me any ideas to tune performance of this query.

Thanks in Advance.

Brian DeMilia
  • 13,103
  • 1
  • 23
  • 33
Cmen535
  • 25
  • 1
  • 3
  • 15
  • Can you read this query? I can't. – usr Jul 04 '14 at 14:51
  • I think you are missing a left parentheses just before the first union and that the left parentheses at the end of your query shouldn't be there, as an fyi. Looks invalid – Brian DeMilia Jul 04 '14 at 15:17
  • 1
    I would take a look at whether you actually need to use DISTINCT all over the place, as that is likely impacting the performance quite a bit. Also use UNION ALL rather than UNION if the result from 1 block won't overlap with the 2nd naturally, because you're making it do extra work to determine if there is any overlap. – Brian DeMilia Jul 04 '14 at 15:18
  • @BrianDeMilia Sorry for the errors, The query is getting executed.All that I need is performance improvement for the query.Please suggest me if you have any ideas. – Cmen535 Jul 04 '14 at 16:03
  • I just edited it for formatting purposes. It would help to have the DDL for each table or at least a list of columns and indexes. If you can avoid using DISTINCT and use UNION ALL rather than UNION by using the other columns in one way or another to avoid the duplicates that would likely improve performance. DISTINCT is generally bad for performance. – Brian DeMilia Jul 04 '14 at 16:07
  • distinct keyword is used to get rid of duplicate's from one of the each table(ex tables: cashdata,trasactiondata,historydata,datarecords) – Cmen535 Jul 04 '14 at 16:08
  • What are the columns on each table? I know they're getting rid of the 'duplicates' but on the table there are likely other columns. Even if they're not relevant to your current purpose those other columns are the reason the duplicates exist and you may be able to use them to eliminate the duplicates using additional criteria and getting rid of the DISTINCTs – Brian DeMilia Jul 04 '14 at 16:10
  • Yes! u r right.Distinct is bad for performance. My scenario is: Let's say a customer named sam purchases 10 cars from my shop.I need to get his recent transaction details instead of getting all 10 again in the UI.So I used distinct where ever needed for different tables and clubbed all the sub-queries using union. – Cmen535 Jul 04 '14 at 16:12
  • So in that case the table probably has some kind of PURCHASE_DATE field and you can use a subquery to get the very last purchase (max(purchase_date)) rather than all rows for all of his purchases. That would allow you to avoid using distinct and probably use a field that is even indexed. In order to help you write the query though we need to know all of the columns on all of the tables and their data types. – Brian DeMilia Jul 04 '14 at 16:15
  • Right! in my case purchase date is effective date. – Cmen535 Jul 04 '14 at 16:17
  • I need ID, effective date, book from x.vw_datarecords and id from transactiondata,historydata,cashdata. – Cmen535 Jul 04 '14 at 16:18
  • What are all of the columns on all of the tables? (not just the ones you want selected in the output) – Brian DeMilia Jul 04 '14 at 16:21
  • `IN(...)` automatically supresses duplicates (and NULLs. You don't need `distinct` Secondly: rewrite the query using `EXISTS(...)` instead of `IN(...)` (and possibly combine some of the terms) – joop Jul 04 '14 at 16:22
  • @joop EXISTS() is giving me an error. – Cmen535 Jul 04 '14 at 16:24
  • @BrianDeMilia I have almost 50 columns in each table and there are so many dependencies for each table. – Cmen535 Jul 04 '14 at 16:26
  • If your vw_... tables are views, we also need the views definition. – Sebas Jul 05 '14 at 00:49

3 Answers3

3

I've always found better performance swapping out a NOT IN (query) with a left join + where iS NULL

example instead of:

select *
from x.vw_records
where id not in (
    select distinct id
    from x.vw_datarecords
    where effective_date >= trunc(sysdate - 30)
        and book in (
            select book_shortname from x.vw_datarecords
        )

use:

select *
from x.vw_records vr
left join vw_datarecords vdr on vr.id = vdr.id
    and effective_date >= trunc(sysdate - 30)
        and book in (
            select book_shortname from x.vw_datarecords
        )
where vdr.id IS NULL

additionally, you can sometimes get noticeably better performance by doing a group by rather than distinct.

Kritner
  • 13,557
  • 10
  • 46
  • 72
1

I suspect you need indexes. What indexes do you have on the tables involved in your query?

& Time to learn how to use an "explain plan" which is an essential tool for query optimization. It isn't that hard to get one. They may be a bit harder to understand however. Please include the explain plan output with your question.

    EXPLAIN PLAN FOR
      <<Your SQL_Statement here>>
    ;

    SET LINESIZE 130
    SET PAGESIZE 0
    SELECT * FROM table(DBMS_XPLAN.DISPLAY);

There is absolutely zero benefit from using "select distinct" when you are using "union", do not do both, just do one.

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • [Using EXPLAIN PLAN][1] [DBMS_XPLAN][2] [Oracle Optimizer - Explain the Explain Plan][3] [1]: http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_xplan.htm [2]: http://docs.oracle.com/cd/B10500_01/server.920/a96533/ex_plan.htm [3]: http://blogs.oracle.com/optimizer/entry/explain_the_explain_plan_white – Paul Maxwell Jul 05 '14 at 00:39
  • Let's say id=name, a company may have two employee's with same names and in different fields.So In that case you need to use distinct and union. Clarify me If I am wrong. – Cmen535 Jul 07 '14 at 12:28
0

If you could try to use exists/not exists clause in place of in/not in (http://www.techonthenet.com/sql/exists.php). That generally runs much faster.

Shepherdess
  • 651
  • 6
  • 7
  • using Exists keyword for in is giving me an error. And performance wise both are almost same. The fetching time for both(NOT EXISTS in the place of NOT IN) is almost 8 seconds for 50 rows. – Cmen535 Jul 04 '14 at 16:05