0

We know that in SQL the order the code is executed is:

  1. FROM
  2. WHERE
  3. GROUP
  4. SELECT

Now, in SQL I can write (Example1):

SELECT colA, colB
FROM tableA
UNION
SELECT colA, colB
FROM tableB
WHERE colA > 1

and I can also write (Example2):

SELECT * 
FROM (
  SELECT colA, colB
  FROM tableA
  UNION
  SELECT colX, colA
  FROM tableB
)
WHERE colA > 1

My question is about Example1: the WHERE condition is on tableB or on the UNION of tableA and tableB? Unlike Example2, where it's very clear, in Example1 it's not (?).

I didn't find any documentation in any DB that explains this about the UNION.

Alessio Cantarella
  • 5,077
  • 3
  • 27
  • 34
HaloKu
  • 405
  • 2
  • 7
  • 17
  • what is clear in example2? – mangusta Dec 23 '19 at 08:39
  • @mangusta The WHERE is on the result of the union due to the brackets – HaloKu Dec 23 '19 at 08:40
  • I doubt whether the example 1 will run or not without giving aliases. It will say something like colA in the query is ambiguous. – dxjuv Dec 23 '19 at 08:42
  • how do you know it is on the result of union? the presence of brackets does not necessarily guarantee that – mangusta Dec 23 '19 at 08:43
  • @mangusta WHERE clause is done after FROM – HaloKu Dec 23 '19 at 08:45
  • no, it depends on the execution plan. have a look here: https://stackoverflow.com/questions/5437507/union-with-where-clause/5439460. in this example, the filters are applied prior to union, although union is in brackets – mangusta Dec 23 '19 at 09:19

3 Answers3

1

The WHERE clause as written in the first statement belongs to the second part of the UNION. You can test this by using different column names in the two tables:

create table t1 (c1 int, c2 int);
create table t2 (c3 int, c4 int);

Then the following query:

select c1, c2 
from t1
union all
select c3, c4
from t2
where c1 > 1

will result in an error claiming that the column c1 does not exist (or can't be used in that context - depending on the DBMS product).

The same rule applies to a GROUP BY or HAVING, but not to a possible ORDER BY - that always sorts the result of the union.

Postgres example

Oracle example

SQL Server example

1

To re-cap the diagrams below, keeping it simple:

A query_block has no set operator (e.g. UNION) and no ORDER BY. The WHERE clause applies at this level:

[WITH ...] SELECT ... FROM ... [WHERE ...] [GROUP BY ... [HAVING ...]]

A subquery is one or more query_blocks separated by set operators, with optional ORDER BY. The ordering applies after the set operators:

query_block [UNION query_block]* [ORDER BY ...]

So to answer your question: In Example1, the WHERE condition is on tableB.


The diagrams in the Oracle 12c Language Reference shows it pretty good:

select::=
select

subquery::=
subquery

query_block::=
query_block

Andreas
  • 154,647
  • 11
  • 152
  • 247
0

Hi @HaloKu Have you tested it ? I believe that a small test would give an answer to your question...

Please check this example.

You can see that for the Example 1, the where clause will influence only the tableB. The example is for Oracle Database. Please can you give us the information what database do you use?

Here is the TEST for MySQL: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=62675070b512ba4441603bdef2b9b771 It acts the same way as Oracle does...

Also, please check this question here: UNION with WHERE clause

VBoka
  • 8,995
  • 3
  • 16
  • 24
  • This is exactly my question. Is this behavior DB specific? Should I expect diffrent behavior in Presto or MySQL? Is it part of Native SQL defintion or not? – HaloKu Dec 23 '19 at 08:59
  • What part exactly. You only asked "The WHERE condition is on tableB or on the union of tableA and tableB?". And I answered that already... I can update my answer for MySQL also but what is there that is giving you a problem? – VBoka Dec 23 '19 at 09:02