1

I have these two tables:

TABLE A:
ID  COUNTRY  CAPITAL    CONTINENT
1   Slovakia Bratislava Europe
2   Senegal  Dakar      Africa
3   Brazil   Brasilia   South America
4   Wales    Cardiff    Europe
5   Egypt    Cairo      Africa

TABLE B:
ID  COUNTRY   CAPITAL      CONTINENT
5   Egypt     Cairo        Africa
6   Argentina Buenos Aires South America
7   Hungary   Budapest     Europe
2   Senegal   Dacar        Africa

When I do UNION, I get the expected result:

CREATE TABLE COMB_UNION AS 

    SELECT * FROM A
    UNION 
    SELECT * FROM B;

1   Slovakia    Bratislava  Europe
2   Senegal Dacar   Africa
2   Senegal Dakar   Africa
3   Brazil  Brasilia    South America
4   Wales   Cardiff Europe
5   Egypt   Cairo   Africa
6   Argentina   Buenos Aires    South America
7   Hungary Budapest    Europe

However, I get missing values when using FULL OUTER JOIN and I don't understand why. It should produce the same result as UNION, right? As far as I understand it, it should produce all the records from both tables and any matching records. Which obviously isn't the case

CREATE TABLE OUTER_JOIN AS 

        SELECT 
            A.ID, A.COUNTRY, A.CAPITAL, A.CONTINENT 
            FROM A FULL OUTER JOIN B 
            ON A.ID = B.ID;

1   Slovakia    Bratislava  Europe
2   Senegal Dakar   Africa
3   Brazil  Brasilia    South America
4   Wales   Cardiff Europe
5   Egypt   Cairo   Africa
.           
.       

What am I missing here? I am doing it in PROC SQL if that makes any difference.

Any help appreciated :)

puk789
  • 322
  • 2
  • 8
  • 28

4 Answers4

0
select      coalesce(A.ID       ,B.ID)         as ID
           ,coalesce(A.COUNTRY  ,B.COUNTRY)    as COUNTRY
           ,coalesce(A.CAPITAL  ,B.CAPITAL)    as CAPITAL
           ,coalesce(A.CONTINENT,B.CONTINENT)  as CONTINENT

FROM        A FULL OUTER JOIN B 
            ON A.ID = B.ID
;

Run this and see what you've missed:

select      *

FROM        A FULL OUTER JOIN B 
            ON A.ID = B.ID
;
David דודו Markovitz
  • 42,900
  • 6
  • 64
  • 88
0

This is too long for a comment.

Why would you think that a full outer join would produce the same result as a union? That really isn't correct thinking. In your case, you could write the query so it comes close to being true by doing:

CREATE TABLE OUTER_JOIN AS 
    SELECT COALESCE(A.ID, B.ID) as ID,
           COALESCE(A.COUNTRY, B.COUNTRY) as COUNTRY,
           COALESCE(A.CAPITAL, B.CAPITAL) as CAPITAL,
           COALESCE(A.CONTINENT, B.CONTINENT) as CONTINENT
     FROM A FULL OUTER JOIN
          B 
          ON A.ID = B.ID;

This is the same as the UNION if the following are true:

  • The id column is unique in each table.
  • The data columns have the same values when the ids match.

This is not true in your data. You can get an equivalent result if you match on all the columns:

CREATE TABLE OUTER_JOIN AS 
    SELECT COALESCE(A.ID, B.ID) as ID,
           COALESCE(A.COUNTRY, B.COUNTRY) as COUNTRY,
           COALESCE(A.CAPITAL, B.CAPITAL) as CAPITAL,
           COALESCE(A.CONTINENT, B.CONTINENT) as CONTINENT
     FROM A FULL OUTER JOIN
          B 
          ON A.ID = B.ID AND A.COUNTRY = B.COUNTRY AND
             A.CAPITAL = B.CAPITAL AND A.CONTINENT = B.CONTINENT;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Can you expand on why it isn't correct thinking? FULL OUTER JOIN returns records from both (left and right) tables and records that matched. UNION returns distinct rows if they are in the left table or in the right table or both. Can't really see a significant difference here? – puk789 Nov 29 '16 at 18:29
  • 2
    Because are different, `FULL JOIN` is `LEFT JOIN` + `RIGHT JOIN` you can see one example for full join [**here**](http://www.w3schools.com/Sql/sql_join_full.asp) You can see how appear `NULL` values when doesnt find a match. `UNION` doesnt appear nulls, and doesnt appear more columns. – Juan Carlos Oropeza Nov 29 '16 at 18:42
  • 1
    "Why would you think that...?" Well, because outer join **IS** a union! "Relationally speaking, it's a kind of shotgun marriage: It forces tables into a kind of union — yes, I do mean union, not join — even when the tables in question fail to conform to the usual requirements for union. It does this, in effect, by padding one or both of the tables with nulls before doing the union, thereby making them conform to those usual requirements after all." [C J Date](https://www.safaribooksonline.com/library/view/sql-and-relational/9781449319724/ch04s06.html) – onedaywhen Nov 30 '16 at 09:27
  • You would only equate `full outer join` with `union` if you equated `inner join` with `intersect`. That would be a misunderstanding of relational terminology. – Gordon Linoff Nov 30 '16 at 12:13
  • @puk789 See my answer's TL;DR. – philipxy Feb 08 '17 at 08:00
0

Think of union as an append. Gather all records from first table then gather all records from second table. The columns in both tables need to match position and type but not name. Finally remove records with exact duplicate data in all selected columns.

Union All will not perform the duplicate removal.

These are good for building a total list from partial lists. Like cities in US might be one table and cities in Canada might be in another table. The names might be different (state vs. province, zip vs. postal code) but based on position in the select list can end up in the same column. (province will show up in the state column...)

Joins on the other hand are designed to expand data vertically, although as Gordon points out you can use a full outer join to mimic a union. Although it would be equivalent to union all and not remove duplicates as a union would unless you add distinct in an outer, wrapper, query.

This would allow you add data that is related but stored in a different table. For examples add order information to a customer and possibly add order details to the order. This will create duplicate data when a record in one table has many matches in the other table.

This is just a quick rundown, there are tons of articles and tutorials on the unions and joins.

Joe C
  • 3,925
  • 2
  • 11
  • 31
0

TL;DR Full outer join is the union of left outer join & right outer join, and left & right outer joins are inner join unioned with unmatched left & right (respectively) rows extended by nulls. So outer joins are equivalent to unions of stuff, but, unlike union, the stuff isn't its arguments, it's stuff derived from its arguments. If the arguments have the same columns, there aren't any nulls added, but we still have that full outer join is the union of left outer join & right outer join, and left & right outer joins are inner join unioned with unmatched left & right (respectively) rows; none of those three is union.


Using relational terminology, if we have a value NULL in every attribute domain, NATURAL FULL OUTER JOIN of two NULL-free inputs returns the UNION of three tables: the NATURAL JOIN of the inputs, the unmatched rows of the left input extended by NULLs and the unmatched rows of the right input extended by NULLs. So (under those assumptions) UNION can be replaced by NATURAL FULL OUTER JOIN.

In SQL, the situation is complicated by duplicate rows, ordered columns, NULLs being treated as special values by operators, NULLs being used differently in determining when subrows are distinct for JOIN vs UNION, and the fact that FROM involves a non-natural cross product. Also most DBMSs do not offer FULL or CORRESPONDING.

SQL FULL OUTER JOIN returns a bag containing the rows in the INNER JOIN of the inputs plus the unmatched rows of each input extended by NULLs.

Assuming neither input has duplicate rows or NULLs, SQL UNION CORRESPONDING is given by NATURAL FULL OUTER JOIN. Ie UNION is given by

SELECT COALESCE(left.L1, right.R1) AS L1, ...
FROM left FULL OUTER JOIN right
ON left.L1 = right.R1 AND ...

for all columns, as paired by the UNION.

Once you allow inputs with duplicates and/or NULLs, expressing UNION in terms of OUTER JOIN and other operators becomes complex. Eg: Suppose we have the same row with NULL in both inputs. UNION will treat these as not distinct, so one such row is in the output; but FULL OUTER JOIN will output two such rows.

SQLFiddle

Use UNION when you want rows satisfying the membership criterion (predicate) of one input OR of the other. Use OUTER JOIN when you want rows satisfying the membership criteria of one AND of the other OR of just one with other columns NULL. There's just no reason to simulate the simple semantics/behaviour of UNION in terms of the complex semantics/behaviour of OUTER JOIN.

philipxy
  • 14,867
  • 6
  • 39
  • 83