14

This is a performance question simplified to join of two indexes. Take the following setup:

CREATE TABLE ZZ_BASE AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_B_I ON ZZ_BASE(ID ASC); 
CREATE TABLE ZZ_CHILD AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_C_I ON ZZ_CHILD(ID ASC);

-- As @Flado pointed out, the following is required so index scanning can be done
ALTER TABLE ZZ_BASE MODIFY (ID CONSTRAINT NN_B NOT NULL); 
ALTER TABLE ZZ_CHILD MODIFY (ID CONSTRAINT NN_C NOT NULL); -- given the join below not mandatory.

Now I want to LEFT OUTER JOIN these two tables and only output the already indexed ID field.

SELECT  ZZ_BASE.ID 
FROM ZZ_BASE 
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
----------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |  1000K|  9765K|       |  4894   (2)| 00:00:30 |
|*  1 |  HASH JOIN OUTER      |        |  1000K|  9765K|    16M|  4894   (2)| 00:00:30 |
|   2 |   INDEX FAST FULL SCAN| ZZ_B_I |  1000K|  4882K|       |   948   (3)| 00:00:06 |
|   3 |   INDEX FAST FULL SCAN| ZZ_C_I |  1000K|  4882K|       |   948   (3)| 00:00:06 |
----------------------------------------------------------------------------------------

As you can see no table access is necessary, only index access. But according to common sense, HASH-joining is not the most optimal way to join these two indexes. If these two tables where much larger, a very large hash table would have to be made.

A much more efficient way would be to SORT-MERGE the two indexes.

SELECT /*+ USE_MERGE(ZZ_BASE ZZ_CHILD) */ ZZ_BASE.ID 
FROM ZZ_BASE 
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
-----------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |  1000K|  9765K|       |  6931   (3)| 00:00:42 |
|   1 |  MERGE JOIN OUTER      |        |  1000K|  9765K|       |  6931   (3)| 00:00:42 |
|   2 |   INDEX FULL SCAN      | ZZ_B_I |  1000K|  4882K|       |  2258   (2)| 00:00:14 |
|*  3 |   SORT JOIN            |        |  1000K|  4882K|    22M|  4673   (4)| 00:00:29 |
|   4 |    INDEX FAST FULL SCAN| ZZ_C_I |  1000K|  4882K|       |   948   (3)| 00:00:06 |
-----------------------------------------------------------------------------------------

But it appears that the second index gets sorted, even if it already is ("If an index exists, then the database can avoid sorting the first data set. However, the database always sorts the second data set, regardless of indexes"1)

Basically, what I want is a query that uses a SORT-MERGE join and instantly starts outputting the records, i.e.:

  • no HASH join because it first has to make a hash table (IO overhead if stored on disk) and thus doesn't output instantly.
  • no NESTED LOOP join which, although it would output instantly, has log(N) complexity on index pokes and large IO overhead on non-sequential index reads in case the index is large.
Davor Josipovic
  • 5,296
  • 1
  • 39
  • 57
  • Oracle sometimes performs a pass-thru sort (which does nothing and is non blocking) which is labled something like NO_SORT, so I suppose it might be applied in this case (or at least it should be). – gen-y-s Dec 18 '15 at 15:22
  • Also the comment you quoted (source?) applies to table scan, not neccessarily to index scan – gen-y-s Dec 18 '15 at 15:24
  • 2
    _large IO overhead on non-sequential index reads in case the index is large._ Reading an index in index order, index full scan, is also non-sequential IO at the disk level. Index fast full scan can do sequential IO, but gets the data in a non sorted order. The index is not layed out in order on the disk. – Shannon Severance Dec 18 '15 at 15:57
  • @Shannon Severance got the point. It is cheaper for optimizer FAST FULL SCAN your index and then sort it, than use SEQUENTIAL SCAN on this index. – Slava Lenskyy Dec 18 '15 at 16:15
  • @ShannonSeverance, Skoffer That is indeed a good point that I didn't know. But still it is not true that a INDEX FAST FULL SCAN + SORT (cost=4673) is more efficient than a INDEX FULL SCAN (cost=2258). Also, because the tabel is relatively small, 22M temp space is insignificant. But in production systems, it is not uncommon for such sorts to require several GB of temp space. And in those scenarios it seems to me that 2X INDEX FULL SCAN is far more efficient that SORTING or making HASH tables. – Davor Josipovic Dec 18 '15 at 16:39
  • Oracle takes size into account when making the query plan. What happens at MB may or may not happen at GB. – Shannon Severance Dec 18 '15 at 16:44
  • I am not quite sure what you are actually asking. Do you want to force Oracle to do use SORT MERGE rather than HASH JOIN ? You can use USE_MERGE hint - but I see in the question that you know how to use it. – krokodilko Dec 18 '15 at 16:44
  • @Skoffer, if it were cheaper to fast full scan then sort, then Oracle should have chosen that for both inputs to the sort merge. – Shannon Severance Dec 18 '15 at 17:04
  • 1
    _Basically, what I want is a query that uses a SORT-MERGE join and instantly starts outputting the records ..._ Based on the documentation you cited, you'll have to take that up with Oracle so that they change SORT-MERGE to use sorted sources on both sources when available. – Shannon Severance Dec 18 '15 at 18:56
  • @Shannon Severance that s not necessary true for all indexes. And I didn't not mean that it's true in every situation. In this particular situation optimizer decided that FULL SCAN and SORT is better. To see if optimizer is mistaken we'd better see actual numbers. Use gather_plan_statistics hint and then dbms_xplan.display with format "ALL". – Slava Lenskyy Dec 19 '15 at 06:21
  • @ShannonSeverance _Oracle takes size into account when making the query plan. What happens at MB may or may not happen at GB._ True, but the reason I did the test above is because on the production systems I am working on, similar parallel queries take several minutes to complete, clog memory and IO, while they _could_ be done instantly if the MERGE was done on 2x INDEX FULL SCANs. Now I am searching for a way to force this somehow (for comparison), but forcing it seems impossible by design (?). – Davor Josipovic Dec 19 '15 at 14:18

1 Answers1

1

INDEX_ASC (or just INDEX) is the hint you might want to try in order to compare performance with real data.

I am a little surprised that you get any kind of index scan for the outer row source, since B*Tree indexes cannot find NULL keys and ZZ_BASE does not have a NOT NULL constraint. Adding that and hinting a bit more will get you the full scan in index order of the ZZ_C_I index. That does not save you the SORT JOIN step, unfortunately, but at least it should be much faster - O(n) - since the data is already sorted.

alter table zz_base modify (id not null);
SELECT 
  /*+ leading(zz_base) USE_MERGE(ZZ_CHILD) 
      index_asc(zz_base (id)) index(zz_child (id)) */ ZZ_BASE.ID 
FROM ZZ_BASE left outer join ZZ_CHILD on zz_base.id=zz_child.id;

This query uses the following execution plan:

------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1000K|  9765K|       |  8241   (3)| 00:00:50 |
|   1 |  MERGE JOIN OUTER |        |  1000K|  9765K|       |  8241   (3)| 00:00:50 |
|   2 |   INDEX FULL SCAN | ZZ_B_I |  1000K|  4882K|       |  2258   (2)| 00:00:14 |
|*  3 |   SORT JOIN       |        |  1000K|  4882K|    22M|  5983   (3)| 00:00:36 |
|   4 |    INDEX FULL SCAN| ZZ_C_I |  1000K|  4882K|       |  2258   (2)| 00:00:14 |
------------------------------------------------------------------------------------
Davor Josipovic
  • 5,296
  • 1
  • 39
  • 57
Flado
  • 36
  • 3
  • you are right with saying that `NOT NULL` constraint is required here. I just redid the test, and I only get the second execution plan if I add this constraint. Don't know how I missed that. And you are correct. The hints indeed force INDEX FULL SCAN on the ZZ_CHILD table. Nice one! And welcome to stackoverflow! I'll add the execution plan to your answer. – Davor Josipovic Jul 15 '16 at 11:03