62

I'm doing a UNION of two queries on an Oracle database. Both of them have a WHERE clause. Is there a difference in the performance if I do the WHERE after UNIONing the queries compared to performing the UNION after WHERE clause?

For example:

SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colA, colB FROM tableB WHERE colA > 1

compared to:

SELECT * 
  FROM (SELECT colA, colB FROM tableA
        UNION
        SELECT colA, colB FROM tableB) 
 WHERE colA > 1

I believe in the second case, it performs a full table scan on both the tables affecting the performance. Is that correct?

abhi
  • 1,760
  • 1
  • 24
  • 40
MNIK
  • 1,581
  • 3
  • 16
  • 22
  • 7
    Get the explain plan and prove your belief. And after that run a test and time them in your environment and see which one wins. – Kuberchaun Mar 25 '11 at 20:06
  • For such a simple query, there's likely no difference because Oracle would likely push the predicate (WHERE clause) into the derived table/inline view. – OMG Ponies Mar 26 '11 at 00:08
  • It may be valuable to know which version of Oracle you are running. – EvilTeach Mar 27 '11 at 23:38

9 Answers9

22

In my experience, Oracle is very good at pushing simple predicates around. The following test was made on Oracle 11.2. I'm fairly certain it produces the same execution plan on all releases of 10g as well.

(Please people, feel free to leave a comment if you run an earlier version and tried the following)

create table table1(a number, b number);
create table table2(a number, b number);

explain plan for
select *
  from (select a,b from table1
        union 
        select a,b from table2
       )
 where a > 1;

select * 
  from table(dbms_xplan.display(format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
---------------------------------------
| Id  | Operation            | Name   |
---------------------------------------
|   0 | SELECT STATEMENT     |        |
|   1 |  VIEW                |        |
|   2 |   SORT UNIQUE        |        |
|   3 |    UNION-ALL         |        |
|*  4 |     TABLE ACCESS FULL| TABLE1 |
|*  5 |     TABLE ACCESS FULL| TABLE2 |
---------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------    
   4 - filter("A">1)
   5 - filter("A">1)

As you can see at steps (4,5), the predicate is pushed down and applied before the sort (union).

I couldn't get the optimizer to push down an entire sub query such as

 where a = (select max(a) from empty_table)

or a join. With proper PK/FK constraints in place it might be possible, but clearly there are limitations :)

Ronnis
  • 12,593
  • 2
  • 32
  • 52
13

NOTE: While my advice was true many years ago, Oracle's optimizer has improved so that the location of the where definitely no longer matters here. However preferring UNION ALL vs UNION will always be true, and portable SQL should avoid depending on optimizations that may not be in all databases.

Short answer, you want the WHERE before the UNION and you want to use UNION ALL if at all possible. If you are using UNION ALL then check the EXPLAIN output, Oracle might be smart enough to optimize the WHERE condition if it is left after.

The reason is the following. The definition of a UNION says that if there are duplicates in the two data sets, they have to be removed. Therefore there is an implicit GROUP BY in that operation, which tends to be slow. Worse yet, Oracle's optimizer (at least as of 3 years ago, and I don't think it has changed) doesn't try to push conditions through a GROUP BY (implicit or explicit). Therefore Oracle has to construct larger data sets than necessary, group them, and only then gets to filter. Thus prefiltering wherever possible is officially a Good Idea. (This is, incidentally, why it is important to put conditions in the WHERE whenever possible instead of leaving them in a HAVING clause.)

Furthermore if you happen to know that there won't be duplicates between the two data sets, then use UNION ALL. That is like UNION in that it concatenates datasets, but it doesn't try to deduplicate data. This saves an expensive grouping operation. In my experience it is quite common to be able to take advantage of this operation.

Since UNION ALL does not have an implicit GROUP BY in it, it is possible that Oracle's optimizer knows how to push conditions through it. I don't have Oracle sitting around to test, so you will need to test that yourself.

btilly
  • 43,296
  • 3
  • 59
  • 88
  • This is incorrect, at least for databases created in the past 10 years. – Jon Heller Jun 07 '16 at 17:54
  • 4
    @JonHeller You were replying in 2016 to a post written in 2011 that was clearly marked as based on my experience several years before that. A lot could have changed, and a succinct statement of what is much more helpful than a blanket, "that's wrong". – btilly Mar 20 '17 at 16:41
  • 1
    The example in [11.2 Performance Tuning Guide](http://docs.oracle.com/cd/E29505_01/server.1111/e16638/optimops.htm#i55050) show this scenario and how it works even with the `WHERE` after the `UNION`. I don't see the same in the 10g manual, so either it was new in 11g or not documented in 10g. It might be unfair to judge answers based on new information but the answer is no longer helpful to people. I know an undeserved downvote sucks but it's less worse than giving misleading advice to tens of thousands of people. – Jon Heller Mar 20 '17 at 20:49
  • @JonHeller I know for a fact that it wasn't there in 8, and I'm pretty sure it wasn't there in 10. However if you're going to different databases, it isn't behavior to rely on. And preferring UNION ALL over UNION will be forever true based on the SQL Standard. But I'll update the answer. – btilly Mar 20 '17 at 22:33
10

Just a caution

If you tried

SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB WHERE colA > 1

compared to:

SELECT * 
  FROM (SELECT colA, colB FROM tableA
        UNION
        SELECT colX, colA FROM tableB) 
 WHERE colA > 1

Then in the second query, the colA in the where clause will actually have the colX from tableB, making it a very different query. If columns are being aliased in this way, it can get confusing.

Gary Myers
  • 34,963
  • 3
  • 49
  • 74
7

You need to look at the explain plans, but unless there is an INDEX or PARTITION on COL_A, you are looking at a FULL TABLE SCAN on both tables.

With that in mind, your first example is throwing out some of the data as it does the FULL TABLE SCAN. That result is being sorted by the UNION, then duplicate data is dropped. This gives you your result set.

In the second example, you are pulling the full contents of both tables. That result is likely to be larger. So the UNION is sorting more data, then dropping the duplicate stuff. Then the filter is being applied to give you the result set you are after.

As a general rule, the earlier you filter away data, the smaller the data set, and the faster you will get your results. As always, your milage may vary.

EvilTeach
  • 28,120
  • 21
  • 85
  • 141
3
SELECT * FROM (SELECT colA, colB FROM tableA UNION SELECT colA, colB FROM tableB) as tableC WHERE tableC.colA > 1

If we're using a union that contains the same field name in 2 tables, then we need to give a name to the sub query as tableC(in above query). Finally, the WHERE condition should be WHERE tableC.colA > 1

devlin carnate
  • 8,309
  • 7
  • 48
  • 82
2

I would make sure you have an index on ColA, and then run both of them and time them. That would give you the best answer.

rayman86
  • 1,385
  • 10
  • 9
  • 2
    I didn't downvote, but it might be a reaction to the pervasive "add an index to solve any performance problem" mentality. – Jeffrey Kemp Mar 31 '11 at 01:11
1

i think it will depend on many things - run EXPLAIN PLAN on each one to see what your optimizer selects. Otherwise - as @rayman suggests - run them both and time them.

Randy
  • 16,480
  • 1
  • 37
  • 55
-4
SELECT colA, colB FROM tableA  WHERE colA > 1
UNION
SELECT colX, colA FROM tableB
rink.attendant.6
  • 44,500
  • 61
  • 101
  • 156
Michael Andrews
  • 194
  • 6
  • 16
-5
SELECT * 
FROM (SELECT * FROM can
    UNION
    SELECT * FROM employee) as e
WHERE e.id = 1;
dur
  • 15,689
  • 25
  • 79
  • 125
  • While the SQL doesn't provide a solution to the original question, you should always consider providing supporting information to an answer. – Bonez024 May 10 '19 at 16:15