6

This query below:

Query 1:

SELECT * FROM DUAL 

is equivalent to and produces the same result as:

Query 2:

SELECT * FROM DUAL 
UNION
SELECT * FROM DUAL

This is obvious BEFORE running the two queries just by looking at them.

However, it seems Oracle doesn't understand this very simple fact and generates two different plans:

Plan 1:

Execution Plan
----------------------------------------------------------
Plan hash value: 272002086

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     2 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Plan 2:

Execution Plan
----------------------------------------------------------
Plan hash value: 646475286

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     2 |     4 |     6  (67)| 00:00:01 |
|   1 |  SORT UNIQUE        |      |     2 |     4 |     6  (67)| 00:00:01 |
|   2 |   UNION-ALL         |      |       |       |            |          |
|   3 |    TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| DUAL |     1 |     2 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Why is that? Isn't a simple comparison of the two UNION-ed blocks less costly than doing both of the UNION-ALL and SORT UNIQUE operations? Or is there a way, a hint, to force Oracle to generate the same plan in both cases?

Thanks!

UPDATE

Tnoy's answer forced me to do some more experiments. Here is the result: When a query is UNION-ed with itself, the bigger query is not necessarily equivalent to the original query.

For example, I have created a very simple test table having only one column and loaded with two identical rows.

Now, my first query on this table which is this:

SELECT * FROM TEST 

returns this result:

  A
-----
  2
  2

while my UNION-ed query of:

SELECT * FROM TEST
UNION
SELECT * FROM TEST

returns this result:

  A
-----
  2

which means the Oracle optimizer is doing the right thing.

Thanks Tony!

RGO
  • 4,586
  • 3
  • 26
  • 40
  • 3
    Probably nobody spent the effort to put a "someone is unioning the same exact thing twice" optimization into the query planner because *why would someone write that query*? :) – hobbs Oct 15 '12 at 13:41
  • This can happen, for example, if you build a query at run-time dynamically and do not remove the duplicates. – RGO Oct 15 '12 at 13:44
  • 1
    Because computers are not inference machines. They don't intrinsically know anything. They do what they're told very precisely. And why would you expect the developer to optimise this case? Nothing is free, being able to optimise this means something else slips, the code base grows, the parsing algorithm grows, the headaches grow, and for what real-life benefit? – MatBailie Oct 15 '12 at 13:52

2 Answers2

3

I'm not an Oracle developer but my guess is the DB engine has to "see" all the rows (the result of a UNION ALL query) before it can work out how many duplicates there are by performing a unique sort.

At the end of your question you say

...is there a way, a hint, to force Oracle to generate the same plan in both cases?

I don't think so, since you are trying to perform two different queries. The first query is for "all rows in the table", your second query is asking for "all unique rows in the table",

Even if you don't have any duplicates the database doesn't know that and must perform a sort.

Tony
  • 9,672
  • 3
  • 47
  • 75
  • Please note the query is doing UNION not UNION ALL which would give different result. But essentially what is actually happening, is that it sees all the rows and unions them together. My question is: 1) why? 2) is there any way to change this behavior? – RGO Oct 15 '12 at 13:51
  • Your query is performing a `UNION` but what I was referring to was the execution plan. It separates out the `UNION` in to a `UNION ALL` and `SORT UNIQUE`. As to your queastions: 1) It's doing this because that is how the database performs a `UNION`, how else do you suggest the plan should be executed? 2) Why do you want to change it, what's wrong with the execution plan you are seeing? If you want to try something different compare the execution plan when using `DISTINCT` and see if that is any different. – Tony Oct 15 '12 at 13:57
  • The optimizer performs a lot of checks before generating a plan. It pushes predicates, it rewrites queries by comparing texts, blah blah. It could logically be done. But I agree with Dems that maybe it is not worth doing such optimization. – RGO Oct 15 '12 at 14:02
  • I'm glad I could help, and +1 on your question for writing up your findings. – Tony Oct 16 '12 at 12:04
2

What is obvious to a person looking at the queries is not necessarily obvious to the optimizer.

The UNION statement does a UNION ALL followed by a DISTINCT. Although the database optimizer could search for this particular case (a query that does a union between exactly the same tables), the developers/managers have to make decisions on priorities.

Identify the optimization in this case is presumably very low priority. In your particular example, the execution plans may be different but both are basically instantaneous (unless you are going to be doing this in a loop for millions of examples).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786