
I have this next query for which I'm trying to improve performance:

select atx.journal_id
    ,ab.c_date
from acct_batch ab 
    join acct_tx atx on ab.acct_id = atx.acct_id 
      and ab.batch_id = atx.batch_id
    join journal j on j.journal_id = atx.journal_id
      and j.journal_type_id = 6
    join acct a on a.acct_id = atx.acct_id 
      and a.acct_type_id = 32
    join payments p on p.payment_id = j.payment_id
    join routing r on r.route_id = p.route_id 
      and r.acq_code = 'RZ_NS'
    join acq_acct aa on aa.acq_code = r.acq_code
      and aa.acq_acct_code = r.acq_acct_code
      and aa.slc = 'MXM'
where ab.c_date between to_date('01-JUL-2015', 'DD-MON-YYYY') and last_day(sysdate);

I've run and reviewed the explain plan and the total cost is 7388. Of this, the most expensive part is the join with the journal table, which has a cost of 6319.

[Partial explain plan screenshot; the full sample plan is included below]

The table has approximately 1.6 million rows across 87 partitions, of which only two contain data: partition 6 with 1.4 million rows and partition 12 with the remaining ~200k rows.
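As a side note (my addition, not from the original question), the partition key and the per-partition row distribution can be confirmed from the Oracle data dictionary; the row counts come from optimizer statistics, so they assume the stats are reasonably current:

    -- Which column(s) the JOURNAL table is partitioned on
    SELECT column_name, column_position
    FROM   user_part_key_columns
    WHERE  name = 'JOURNAL';

    -- Row counts per partition, as recorded in the optimizer statistics
    SELECT partition_name, num_rows
    FROM   user_tab_partitions
    WHERE  table_name = 'JOURNAL'
    ORDER  BY partition_position;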

The first thing I tried was to rewrite the query, hoping to avoid the full scan when matching journal_type_id = 6, but I guess my understanding was incorrect, because the cost remained 7388.

select atx.journal_id
    ,ab.c_date
from acct_batch ab 
    join acct_tx atx on ab.acct_id = atx.acct_id 
      and ab.batch_id = atx.batch_id
    join (select 
              journal_id
              , payment_id 
          from journal 
          where journal_type_id = 6) j on j.journal_id = atx.journal_id
    join acct a on a.acct_id = atx.acct_id 
      and a.acct_type_id = 32
    join payments p on p.payment_id = j.payment_id
    join routing r on r.route_id = p.route_id 
      and r.acq_code = 'RZ_NS'
    join acq_acct aa on aa.acq_code = r.acq_code
      and aa.acq_acct_code = r.acq_acct_code
      and aa.slc = 'MXM'
where ab.c_date between to_date('01-JUL-2015', 'DD-MON-YYYY') and last_day(sysdate);
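A likely explanation for the identical cost (my addition, not from the original question): the optimizer normally merges a simple inline view like this back into the main query, so the rewritten statement compiles to the same plan. The plans can be compared side by side with `EXPLAIN PLAN` and `DBMS_XPLAN`, for example:

    -- Generate a plan for the statement under test...
    EXPLAIN PLAN FOR
      SELECT journal_id, payment_id
      FROM   journal
      WHERE  journal_type_id = 6;

    -- ...then display it
    SELECT * FROM TABLE(dbms_xplan.display);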

I looked through a lot of resources, and one of the things that prompted me to rewrite the query was this video.

I'm still actively looking for ways of improving performance, but I thought I'd shoot a question here to maybe get some hints.

According to the video, the first thing to do is identify the "driving table" (the one that determines which rows are selected, based on its key). So I'm currently looking for a way to rewrite the query so it uses this driving table and its indexes as much as possible.
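As an experiment along those lines (a sketch only; whether `acct_batch` really is the best driving table is an assumption to verify against the data), the join order can be forced with the `LEADING` hint:

    -- Force the optimizer to start with acct_batch, then acct_tx, then journal
    -- (hypothetical order; test each variant against the real data):
    SELECT /*+ LEADING(ab atx j) */
           atx.journal_id, ab.c_date
    FROM   acct_batch ab
           JOIN acct_tx atx ON ab.acct_id  = atx.acct_id
                           AND ab.batch_id = atx.batch_id
           JOIN journal j   ON j.journal_id = atx.journal_id
                           AND j.journal_type_id = 6
           -- ... remaining joins unchanged ...
    WHERE  ab.c_date BETWEEN to_date('01-JUL-2015', 'DD-MON-YYYY')
                         AND last_day(sysdate);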

I don't know if I'm on the right track yet, but please stop me if you think I should proceed otherwise. Also, please note that I'm a total beginner at performance tuning; this is actually my first attempt.

Any help is appreciated.

Update:

Some of the indexes which contain the columns used in the queries are:

╔════════════╦═══════════════╦════════════╦═══════════╦═════════════╦═══════════════════════════════════╗
║   Table    ║   IndexName   ║ Uniqueness ║ IndexType ║ Partitioned ║              Columns              ║
╠════════════╬═══════════════╬════════════╬═══════════╬═════════════╬═══════════════════════════════════╣
║ Acct_Batch ║ Acct_Batch_PK ║ UNIQUE     ║ NORMAL    ║ NO          ║ Acct_ID, Batch_ID                 ║
║ Acct_TX    ║ Acct_TX_IDX   ║ NONUNIQUE  ║ NORMAL    ║ YES         ║ Acct_ID, Batch_ID                 ║
║ Acct_TX    ║ Acct_TX_BIDX  ║ NONUNIQUE  ║ NORMAL    ║ YES         ║ Journal_ID, Acct_ID               ║
║ Journal    ║ Journal_PK    ║ UNIQUE     ║ NORMAL    ║ YES         ║ Journal_ID                        ║
║ Journal    ║ JType_BIDX    ║ NONUNIQUE  ║ NORMAL    ║ YES         ║ Journal_Type_ID, Book_Date        ║
║ Journal    ║ JType_BIDX_2  ║ NONUNIQUE  ║ NORMAL    ║ YES         ║ MCODE, Journal_Type_ID, Book_Date ║
║ Journal    ║ JPay_BIDX     ║ NONUNIQUE  ║ NORMAL    ║ YES         ║ Payment_ID, Journal_ID            ║
╚════════════╩═══════════════╩════════════╩═══════════╩═════════════╩═══════════════════════════════════╝

Let me know if you need to see more indexes or details regarding the other tables.
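If it helps, a listing like the table above can be produced for any set of tables straight from the dictionary views (my addition, not from the original question):

    SELECT i.table_name, i.index_name, i.uniqueness, i.partitioned,
           c.column_name, c.column_position
    FROM   user_indexes i
           JOIN user_ind_columns c ON c.index_name = i.index_name
    WHERE  i.table_name IN ('JOURNAL', 'ACCT_TX', 'ACCT_BATCH')
    ORDER  BY i.table_name, i.index_name, c.column_position;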

Sample explain plan:

-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |                   |     1 |   160 |  7388   (1)| 00:01:29 |       |       |
|*  1 |  FILTER                                   |                   |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                            |                   |       |       |            |          |       |       |
|   3 |    NESTED LOOPS                           |                   |     1 |   160 |  7388   (1)| 00:01:29 |       |       |
|*  4 |     HASH JOIN                             |                   |     4 |   604 |  7380   (1)| 00:01:29 |       |       |
|   5 |      NESTED LOOPS                         |                   |       |       |            |          |       |       |
|   6 |       NESTED LOOPS                        |                   |   107 | 14338 |  7372   (1)| 00:01:29 |       |       |
|*  7 |        HASH JOIN                          |                   |    27 |  3186 |  7298   (1)| 00:01:28 |       |       |
|   8 |         NESTED LOOPS                      |                   |       |       |            |          |       |       |
|   9 |          NESTED LOOPS                     |                   |   102 | 10302 |   978   (0)| 00:00:12 |       |       |
|  10 |           NESTED LOOPS                    |                   |    11 |   638 |    37   (0)| 00:00:01 |       |       |
|* 11 |            TABLE ACCESS BY INDEX ROWID    | ACQ_ACCT          |    11 |   253 |     4   (0)| 00:00:01 |       |       |
|* 12 |             INDEX RANGE SCAN              | AA_PK             |    16 |       |     2   (0)| 00:00:01 |       |       |
|  13 |            TABLE ACCESS BY INDEX ROWID    | ROUTES            |     1 |    35 |     3   (0)| 00:00:01 |       |       |
|* 14 |             INDEX RANGE SCAN              | R_A_BIDX          |     1 |       |     2   (0)| 00:00:01 |       |       |
|  15 |           PARTITION RANGE ALL             |                   |    95 |       |    84   (0)| 00:00:02 |     1 |    84 |
|* 16 |            INDEX RANGE SCAN               | P_R_ID_BIDX       |    95 |       |    84   (0)| 00:00:02 |     1 |    84 |
|  17 |          TABLE ACCESS BY LOCAL INDEX ROWID| PAYMENTS          |     9 |   387 |   100   (0)| 00:00:02 |     1 |     1 |
|  18 |         PARTITION RANGE ALL               |                   |   107K|  1782K|  6319   (1)| 00:01:16 |     1 |    87 |
|* 19 |          TABLE ACCESS FULL                | JOURNAL           |   107K|  1782K|  6319   (1)| 00:01:16 |     1 |    87 |
|  20 |        PARTITION RANGE ITERATOR           |                   |     4 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 21 |         INDEX RANGE SCAN                  | ATX_A_IDX         |     4 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|  22 |       TABLE ACCESS BY LOCAL INDEX ROWID   | ACCT_TX           |     4 |    64 |     3   (0)| 00:00:01 |     1 |     1 |
|* 23 |      INDEX RANGE SCAN                     | AB_B_A_IDX        |  5006 | 85102 |     8   (0)| 00:00:01 |       |       |
|* 24 |     INDEX UNIQUE SCAN                     | ACC_PK            |     1 |       |     1   (0)| 00:00:01 |       |       |
|* 25 |    TABLE ACCESS BY INDEX ROWID            | ACCT              |     1 |     9 |     2   (0)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------------------------
Radu Gheorghiu
    It would help if you add your tables indexes. – Jorge Campos Jul 27 '15 at 11:51
  • @JorgeCampos I'm looking into finding out what indexes are used by the columns in the `JOIN`s, but I'm not finding any information regarding what type of index it is (only if it's a composite index and if it's unique or not). I'm a total beginner with Oracle as well, I have a slightly more experienced background with SQL Server, so.. I'm trying to find any information about index types: clustered, non-clustered, heap etc. Let me know if this is useful, or if it even exists in Oracle.. – Radu Gheorghiu Jul 27 '15 at 12:07
  • Which column is key in partition? And attach full plan for first query. – Arkadiusz Łukasiewicz Jul 27 '15 at 12:24
  • @ArkadiuszŁukasiewicz How can I see this? I can't seem to find it in SQL Developer.. – Radu Gheorghiu Jul 27 '15 at 12:38
  • select * from USER_PART_KEY_COLUMNS. – Arkadiusz Łukasiewicz Jul 27 '15 at 12:48
  • @ArkadiuszŁukasiewicz I have added an explain plan for the first query – Radu Gheorghiu Jul 27 '15 at 12:55

2 Answers


First, check that your statistics are up to date: the optimizer depends heavily on them! Second, you should say something about the number of rows this query returns: depending on how many rows each condition selects, a full scan can be better than an index search.
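A minimal sketch of refreshing the statistics for one of the tables involved (the `JOURNAL` name comes from the question; this assumes sufficient privileges):

    BEGIN
      dbms_stats.gather_table_stats(
        ownname => USER,
        tabname => 'JOURNAL',
        cascade => TRUE   -- also gather stats on the table's indexes
      );
    END;
    /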

cristian v
  • I did just read something about the optimizer deciding that sometimes a FULL scan is better for reading large chunks of data, and as you can see from my explain plan, the join/condition on the JOURNAL table returns 107k rows. So I'm not sure what can be done about this. – Radu Gheorghiu Jul 27 '15 at 13:57
  • The number of rows returned by this query is 20. – Radu Gheorghiu Jul 27 '15 at 14:10
  • I remember that when the results are more than 5% of a table, a table scan is usually more efficient. You could force it to use an index, but the performance would be worse. – cristian v Jul 27 '15 at 23:45
  • Ok, so what would be a good approach? Look for what JOIN or conditions filter out the most records and make those operations early? Or... something else, maybe? I'm not shooting for the best optimization solution first, I'm looking for **improvement** first and then I'll see if I can iterate on that. – Radu Gheorghiu Jul 28 '15 at 07:11
  • Also, regarding your estimates on what is the minimum percentage of data, here is [**an article which points to something else**](https://richardfoote.wordpress.com/2008/05/12/index-scan-or-full-table-scan-the-magic-number-magic-dance/). – Radu Gheorghiu Jul 28 '15 at 07:19
  • Sorry, I answered you from my phone and I hadn't realized that the problem is the "journal_type_id = 6" condition: if it doesn't filter down to a small number of records (which, as your article says, is not a fixed percentage), an index is completely pointless and the optimizer will go for a FTS. – cristian v Jul 28 '15 at 15:42
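The selectivity of the `journal_type_id = 6` predicate is easy to sanity-check directly (a quick query of my own, not from the thread); comparing the two counts shows what fraction of the table the predicate keeps:

    SELECT COUNT(*)                                        AS total_rows,
           COUNT(CASE WHEN journal_type_id = 6 THEN 1 END) AS matching_rows
    FROM   journal;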

So, after reviewing the query more closely, I noticed that the last joined table contributes nothing to the output: no columns from it appear in the SELECT list.

join acq_acct aa on aa.acq_code = r.acq_code
  and aa.acq_acct_code = r.acq_acct_code
  and aa.slc = 'MXM'

Hence, I moved this join into an EXISTS clause and re-ran the query. My modified query looks like this:

select atx.journal_id
    ,ab.c_date
from acct_batch ab 
    join acct_tx atx on ab.acct_id = atx.acct_id 
      and ab.batch_id = atx.batch_id
    join journal j on j.journal_id = atx.journal_id
      and j.journal_type_id = 6
    join acct a on a.acct_id = atx.acct_id 
      and a.acct_type_id = 32
    join payments p on p.payment_id = j.payment_id
    join routing r on r.route_id = p.route_id 
      and r.acq_code = 'RZ_NS'
where ab.c_date between to_date('01-JUL-2015', 'DD-MON-YYYY') and last_day(sysdate)
    and exists (select 1
                from acq_acct aa
                where aa.acq_code = r.acq_code
                    and aa.acq_acct_code = r.acq_acct_code
                    and aa.slc = 'MXM');

This brought the query cost down from 7388 to 292, which is a whopping difference.

Hopefully I've understood this correctly and my explanation makes sense.

If anyone thinks my conclusions or "logical reasoning" were off, please leave a comment (for now, the explanation above makes sense to me).

Radu Gheorghiu
  • Did it remove the FTS? Has the execution time improved? Honestly, it looks strange to me, because journal and acct_tx are the biggest tables and I don't see any option other than a hash join between them. I would have tried adding an index on journal(journal_id, journal_type_id) to bypass the table access. – cristian v Jul 28 '15 at 15:52
  • @cristianv That was my first thought too, since I assumed an index would improve performance. But this is a production table with a lot of other indexes already on it, so adding another index was the last resort. – Radu Gheorghiu Jul 28 '15 at 16:07
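For reference, the index suggested in the comment above could be created along these lines (hypothetical index name; `LOCAL` assumes the index should be equipartitioned with the table):

    CREATE INDEX journal_id_type_idx
      ON journal (journal_id, journal_type_id)
      LOCAL;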