2

I have two large tables (~1GB each) with many different columns on which I want to perform a union all in sas.

Currently, I use the following method with proc sql and union all.

SELECT A, B, '' as C from Table_1 UNION ALL SELECT '' as A, B, C from Table_2

However, this is not preferable as I have dozens of rows in both tables and I am constantly adding to them. Therefore, I am looking for a way to automatically create the blank columns without having to explicitly write them out.

I also tried the following query:
select * from (select * from Table_1), (select * from Table_2)

However, this seems very computationally intensive and takes forever to run.

Are there any better ways to do this? I am also open to using data set instead of proc sql;

Brian
  • 26,662
  • 52
  • 135
  • 170
  • I'm not quite following your goal here (although I only know a minimal amount of SAS.) There's no way a cross join is going to be right though. – shawnt00 Feb 09 '16 at 06:29
  • Definitely not a cross join. I want to basically union the tables vertically and just add blank entries for the fields that don't line up. – Brian Feb 09 '16 at 06:48
  • That query wouldn't quite work on some systems as written (needs aliases). But nevertheless that comma appears to be a cross join and explains why it's slow. Perhaps `full outer join...on 0=1`? – shawnt00 Feb 09 '16 at 06:57

2 Answers2

8

A simple data step should do a thing:

data result_tab;
set Table_1 Table_2;
run;

This will rewrite both tables. Records from Table_2 will be added at the end of the result_tab. Set statement in data step will declare variables from both input tables.

Bagin
  • 516
  • 3
  • 8
0

Unfortunately, PROC SQL does require all dataset to have the same variables when using UNION. If you can use DATA SET then PROC SORT NODUPKEY that would be simplest (maybe not most efficient). To use PROC SQL, uou need to assign NULL values to the missing variables. For example:

data dset1;
input var1 var2;
datalines;
1 2
2 2
3 2
;
run;

data dset2;
input var1 var3;
datalines;
4 1
5 1
6 1
;
run;

PROC SQL;
    CREATE TABLE dset3 AS
    SELECT var1, var2, . AS var3 FROM dset1 
    UNION
    SELECT var1, . AS var2, var3 FROM dset2
QUIT;

PROC PRINT DATA=dset3; RUN;
Dave O
  • 19
  • 4