6

Could someone verify my understanding of proc sql union operations? My interpretation of the differences between outer union and union is the following:

  1. Union deletes duplicate rows, while outer union does not

  2. Union will overlay columns, while outer union, by default, will not.

So, would there be any difference between union all corresponding and outer union corresponding? It seems like "ALL" would remove the first difference, and "CORRESPONDING" would remove the second difference, but I'm concerned there could be an additional difference between the two I'm not seeing.

Joe
  • 62,789
  • 6
  • 49
  • 67
Nate
  • 420
  • 1
  • 5
  • 17
  • 2
    Adding [tag:sql] since this is not SAS-specific but similar in most SQL flavors. – Joe Feb 04 '14 at 02:48
  • anyone came across relational algebra operator/symbol to represent outer union? for example full outer join is represented with `⟗`. Is something similar for outer union already there? – Mahesha999 Aug 20 '16 at 14:07

3 Answers3

5

It turns out there is, actually, a difference: how columns which only exist in one dataset are handled. Outer Union Corresponding will display columns that appear only in one dataset, not overlaid by position. Union All Corresponding will not display any columns that appear in only one dataset.

Nate
  • 420
  • 1
  • 5
  • 17
4

My understanding is that OUTER UNION and UNION ALL are effectively if not actually identical. CORR is needed for either one to guarantee the columns line up; with OUTER UNION the columns will not stack even if they are identical, while with UNION ALL the columns always stack even if they are not identical (must be same data type or it will error), and pay no attention at all to column name. In both cases adding CORR causes them to stack.

Here are some examples:

Not stacking:

proc sql;
select height, weight from sashelp.class
union all 
select weight,height from sashelp.class;
select height, weight from sashelp.class
outer union
select height, weight from sashelp.class;
quit;

Stacking:

proc sql;
select height, weight from sashelp.class
union all corr
select weight,height from sashelp.class;
select height, weight from sashelp.class
outer union corr
select height, weight from sashelp.class;
quit;

This SAS doc page does a good job of showing the differences.

Joe
  • 62,789
  • 6
  • 49
  • 67
  • Thanks for the response and confirmation. As far as corresponding, I believe the difference is that for union all without corresponding, the first column in the first dataset will be stacked with the first column in the second dataset (regardless of name). With outer union corresponding, the columns will not be stacked. Corresponding would remove that difference. – Nate Feb 04 '14 at 04:56
  • Actually, I think I see what you're saying now- and while that's not what I thought was the case, it does seem to indeed be. – Joe Feb 04 '14 at 05:04
  • 1
    Hey Joe, turns out there is a difference, see my answer below. Figured you might be interested. – Nate Apr 17 '14 at 17:40
1

Try these code and you will see their difference clearly:

The test data:

data data1;
input id $ expense;
cards;
001 9
001 8
003 7
;
run;


data data2;
input id $ cost;
cards;
001 30
002 80
;
run;

This is result by union all:

proc sql;
select
Id,
sum(Expense) label='COST'
from WORK.data1
group by 1
union all
select
Id,
sum(Cost)
from WORK.data2
group by 1
order by 1,2
;
run;

This is result by outer union corr:

proc sql;
select
Id,
sum(Expense) as Cost
from WORK.data1
group by Id
outer union corr
select
Id,
sum(Cost)
from WORK.data2
group by Id
order by 1,2
;
run;
Coco
  • 11
  • 2