Performance optimization with EXISTS

Question

I'm working on a DWH supplying script, that deal with ~40M lines on an Oracle Exadata server.

I've got the following tables :

CREATE TABLE P.ARCHIVED_ID
(
    ID_ARCHIVED  VARCHAR2(31 BYTE) NOT NULL, -- CONSTRAINT UNIQUE / PRIMARY KEY
    IS_DELETE    CHAR(1 BYTE)      DEFAULT null
);

CREATE TABLE P.DWH_tableX
(
    ID      VARCHAR2(31 BYTE) NOT NULL, -- CONSTRAINT UNIQUE / PRIMARY KEY
    FIELD1  VARCHAR2(31 BYTE),
    FIELD2  VARCHAR2(31 BYTE),
    FIELD3  CHAR(2 BYTE),
    FIELD4  CHAR(1 BYTE)
);

As you can see, I've got a IS_DELETE flag that can be set to Y (and is NULL by default).

This is the standard query I need to use all around :

SELECT 
    ID,
    FIELD1,
    FIELD2,
    FIELD3,
    FIELD4
FROM P.DWH_tableX A
WHERE EXISTS (
    SELECT 1 
    FROM P.ARCHIVED_ID B 
    WHERE 
        A.ID = B.ID_ARCHIVED 
        AND IS_DELETE = 'Y'
    );

Question, is there a better way to optimize this than the following index ?

CREATE BITMAP INDEX P.I_IS_DELETE 
    ON P.ARCHIVED_ID(IS_DELETE)
LOGGING
TABLESPACE TBS_P_01
NOPARALLEL;

A few more points :

I'll never access P.ARCHIVED_ID without WHERE IS_DELETE = 'Y' (or only once when I update this field at the beginning of the script)
~~by default, Oracle don't register NULL in index, so I use NULL instead of "N", allowing me to make a partial ="Y" bitmap index, keeping it as small and useful as I can.~~ FALSE
I use EXISTS, instead of IN, as Oracle recommend (ability to use index),
The part with IS_DELETE = 'Y' is for now 0%, but as it's a DWH, it's expected to only increase (as new lines are create everyday in the data source, and older are physically removed, setting them at IS_DELETE = 'Y' in the DWH)
the WHERE EXISTS is used around 20 time in my script, as it's how we deal to froze the archived values (move them and back on a temporary table, with insert append hint)

Try compare explain plan for your query with join , have thoughts that `join` will be more optimized then correlated `exist`. — Seyran, Apr 30 '17 at 08:27

APC · Accepted Answer · 2017-04-30T20:43:38.617

With any performance tuning exercise the devil is in the details. The following is some guesswork based on rules of thumb. You must run some performance benchmarks for yourself, using your actual tables and actual data.

"is there a better way to optimize this than the following index"

Almost certainly, yes.

The benefit of bitmap indexes lies in having several of them. That way,when we issue a query filtering on those columns the optimizer can choose to execute a Star Transformation to find the rows in the intersection of the bitmaps. Even then, bitmap indexes on bivalent columns are as useful as columns with several different values.

One bitmap index on its own, particularly one with only two values, isn't much use. Given the monstrous overheads of maintain bitmap indexes, and their concurrency issues, you should probably consider other options.

"~40M lines on an Oracle Exadata server"

Oracle have engineered the Exadata appliance for crunching through large volumes of data. This means looking for paths which support hash joins, Bloom filters and similar operations. With Exadata a common tuning technique is to drop an existing index rather than creating a new one. While cheaper than bitmaps, B-Tree indexes still incur costs resource (CPU, storage, memory) so it's worth considering whether using Exadata's brute force offers a lower cost overall. That's what we pay the big bucks for.

However, even Exadata's raw power is a limited resource. So if you're going to run this query a lot (or rather the EXISTS sub-query) you will likely get a benefit from clustering the excluded rows. From your question it seems IS_DELETE is an updated attribute you can't use physical organisation at the table level (CTAS, attribute clustering). So a B-tree index on ARCHIVED_ID(IS_DELETE, ID_ARCHIVED) is the primary candidate.

With compound indexes it's usually best to start with the least selective column, and that is true here. You're only interested in the rows where IS_DELETE = 'Y', so leading with that column will reduce the number of blocks the sub-query needs to visit. Leading with ID_ARCHIVED would mean the sub-query has to scan the whole index. Even with Exadata, we should always seek to minimize the work undertaken to get a set of records.

But please benchmark this, or any other index.

" For the bitmap index, I was thinking it only track Not Null value (so Y), I'm mistaken ?"

'Fraid so. Bitmap indexes have an entry for each indexed field. So, unlike single-column B-tree indexes, they do index null entries.

"I use EXISTS, instead of IN, as Oracle recommend (ability to use index),"

Hmmm, not sure what Oracle recommendation you think you're citing there. Which is better really depends on the data. But basically if the sub-query is huge and the main query is relatively small then EXISTS is more efficient. But if the data volumes are reversed, the main query is huge and the sub-query is small IN is the better choice. Given your current starting position is the sub-query returns no rows (because nothing is deleted) and presumably relatively few records will be deleted in the near future it seems more likely that IN is the construct you should be using.

But again: benchmark each approach and see what works better with your data.

I've added a bit of info on my question. For the bitmap index, I was thinking it only track Not Null value (so `Y`), I'm mistaken ? / for the raw exadata perf, I know, but as it's a recurrent query pattern and it's again one of the biggest table, I was hopping to improve it a bit by myself. I'll try some explain. — Blag, Apr 30 '17 at 18:23
ok, I've done some test : the B-Tree index is in fact "a bit slower" than the bitmap I'm using, but as the cost change from `4` to `6` (against `700` without index), I'll just stay with my bitmap and don't bother tuning this query, thanks for your time anyway ;) — Blag, May 05 '17 at 15:14

score 0 · Answer 2 · answered Apr 30 '17 at 00:11

0

For this query:

SELECT ID, FIELD1, FIELD2, FIELD3, FIELD4
FROM P.DWH_tableX A
WHERE EXISTS (
    SELECT 1 
    FROM P.ARCHIVED_ID B 
    WHERE 
        A.ID = B.ID_ARCHIVED AND
        B.IS_DELETE = 'Y'
    );

My first thought of an index would be ARCHIVED_ID(ID_ARCHIVED, IS_DELETE). It would be interesting to know if the bitmapped index performed better.

answered Apr 30 '17 at 00:11

Gordon Linoff

1,242,037
58
646
786

1

That's the whole question in fact. For now the bitmap (=`Y`) is empty, so it'll work faster than anything I think (oracle take a look at the index, see nothing match the `Y`, don't bother reading/joining DWH_* and output 0 line). In some years, The `Y` case will be the biggest part of the table, so what ? ; BTW, `ARCHIVED_ID(ID_ARCHIVED, IS_DELETE)` and not `ARCHIVED_ID(IS_DELETE, ID_ARCHIVED)` ? I would have put the flag first to restrain the output. – Blag Apr 30 '17 at 00:30
@Blag . . . Try rewriting the query to use `IN` instead of `EXISTS`. That might make the bitmapped index more useful. – Gordon Linoff Apr 30 '17 at 23:14

score 0 · Answer 3 · answered Apr 30 '17 at 08:04

my thoughts : 1) do an explain plan for anything you want to test 2) as the IS_DELETE is NULLABLE not sure the index will be used, if not use a Function Based Index and see with the explain plan if so

I would test a FBI , compound index like

create index P.I_IS_DELETE on ARCHIVED_ID( decode(IS_DELETE,null,'N','Y'),ARCHIVED_ID));

gather statistics

exec dbms_stats.gather_table_stats( user, 'ARCHIVED_ID', cascade=>TRUE );

alter the select accordingly

SELECT ID, FIELD1, FIELD2, FIELD3, FIELD4 FROM P.DWH_tableX A WHERE EXISTS ( SELECT 1 FROM P.ARCHIVED_ID B WHERE A.ID = B.ID_ARCHIVED AND decode(IS_DELETE,null,'N','Y') = 'Y' );

and see how it goes ...

Performance optimization with EXISTS

3 Answers3