This is a performance question simplified to join of two indexes. Take the following setup:
CREATE TABLE ZZ_BASE AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_B_I ON ZZ_BASE(ID ASC);
CREATE TABLE ZZ_CHILD AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_C_I ON ZZ_CHILD(ID ASC);
-- As @Flado pointed out, the following is required so index scanning can be done
ALTER TABLE ZZ_BASE MODIFY (ID CONSTRAINT NN_B NOT NULL);
ALTER TABLE ZZ_CHILD MODIFY (ID CONSTRAINT NN_C NOT NULL); -- given the join below not mandatory.
Now I want to LEFT OUTER JOIN these two tables and only output the already indexed ID field.
SELECT ZZ_BASE.ID
FROM ZZ_BASE
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 9765K| | 4894 (2)| 00:00:30 |
|* 1 | HASH JOIN OUTER | | 1000K| 9765K| 16M| 4894 (2)| 00:00:30 |
| 2 | INDEX FAST FULL SCAN| ZZ_B_I | 1000K| 4882K| | 948 (3)| 00:00:06 |
| 3 | INDEX FAST FULL SCAN| ZZ_C_I | 1000K| 4882K| | 948 (3)| 00:00:06 |
----------------------------------------------------------------------------------------
As you can see no table access is necessary, only index access. But according to common sense, HASH-joining is not the most optimal way to join these two indexes. If these two tables where much larger, a very large hash table would have to be made.
A much more efficient way would be to SORT-MERGE the two indexes.
SELECT /*+ USE_MERGE(ZZ_BASE ZZ_CHILD) */ ZZ_BASE.ID
FROM ZZ_BASE
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1000K| 9765K| | 6931 (3)| 00:00:42 |
| 1 | MERGE JOIN OUTER | | 1000K| 9765K| | 6931 (3)| 00:00:42 |
| 2 | INDEX FULL SCAN | ZZ_B_I | 1000K| 4882K| | 2258 (2)| 00:00:14 |
|* 3 | SORT JOIN | | 1000K| 4882K| 22M| 4673 (4)| 00:00:29 |
| 4 | INDEX FAST FULL SCAN| ZZ_C_I | 1000K| 4882K| | 948 (3)| 00:00:06 |
-----------------------------------------------------------------------------------------
But it appears that the second index gets sorted, even if it already is ("If an index exists, then the database can avoid sorting the first data set. However, the database always sorts the second data set, regardless of indexes"1)
Basically, what I want is a query that uses a SORT-MERGE join and instantly starts outputting the records, i.e.:
- no HASH join because it first has to make a hash table (IO overhead if stored on disk) and thus doesn't output instantly.
- no NESTED LOOP join which, although it would output instantly, has log(N) complexity on index pokes and large IO overhead on non-sequential index reads in case the index is large.