SAS: Merging two tables with identical columns while dropping null values

Question

I'm not sure if the title does this question justice, but here it goes:

I have three datasets Forecasts1, Forecasts2 and Forecasts3. They are all time series data composed of a date variable and variables r1 through r241.

For a given r variable (lets just use r1-r3, and only Forecasts 1 and 2 for now) each dataset has only one row where the value isn't null, and it is a different row in each dataset.

Forecast 1 looks like this:

Forecast 2 looks like this:

I need to be able to combine them such that r1-r3 contain all the non-null values, without creating duplicate date rows to hold the null values.

So ideally the finished produce would look like this:

I've tried various types of merges and sets, but I keep getting duplicate date rows. How would I go about doing this properly for all 241 (or more) variables? (specifically in SAS or Proc SQL?)

LINKS TO GOOGLE DOCS CONTAINING DATA:

Forecasts1: https://docs.google.com/spreadsheets/d/1iUEwPltU6V6ijgnkALFiIdrwrolDFt8xaITZaFC4WN8/edit?usp=sharing

Forecasts2: https://docs.google.com/spreadsheets/d/1lQGKYJlz6AAR-DWtoWnl8SwzCNAmSpj7yxRqRgnybr8/edit?usp=sharing

Can you post sample data? Photographs don't count since I cannot paste them into an editor. — Tom, Apr 02 '17 at 14:50
I added two google drive links containing the spreadsheets. Will that be sufficient? — Josh Kraushaar, Apr 02 '17 at 15:01

score 4 · Accepted Answer · answered Apr 02 '17 at 15:03

Did you try the UPDATE statement?

data forecast1 ;
  input date r1-r3 ;
cards;
1 1 . .
2 . 2 .
3 . . 3
4 . . . 
;
data forecast2 ;
  input date r1-r3 ;
cards;
2 2 . .
3 . 3 .
4 . . 4 
5 . . .
;

data want ;
  update forecast1 forecast2 ;
  by date ;
run;

proc print; run;

Results

date    r1    r2    r3
  1      1     .     .
  2      2     2     .
  3      .     3     3
  4      .     .     4
  5      .     .     .

score 1 · Answer 2 · answered Apr 02 '17 at 14:42

1

I tend to approach these types of problems using proc sql. Assuming one row per date in the data sets, you can use full outer join:

proc sql;
    select coalesce(f1.date, f2.date) as date,
           coalesce(f1.r1, f2.r1) as r1,
           coalesce(f1.r2, f2.r2) as r2,
           coalesce(f1.r3, f2.r3) as r3
    from forecast1 f1 full outer join
         forecast2 f2
         on f1.date = f2.date

answered Apr 02 '17 at 14:42

Gordon Linoff

1,242,037
58
646
786

How does this scale when you have 241+ variables though? – Josh Kraushaar Apr 02 '17 at 14:45
1

@JoshKraushaar . . . I would generate the code in a spreadsheet. However, SAS offers other alternatives for iterating through columns in the data step. However, this points out a folly of putting such data all in one row. I would prefer to have one row per data item, rather than multiple columns. – Gordon Linoff Apr 02 '17 at 15:26

score 1 · Answer 3 · answered Apr 02 '17 at 15:12

Consider a union query with aggregation. The only drawback is writing out the aggregates for all 241 columns in outer query.

proc sql;
   SELECT sub.date, Max(sub.r1) AS R1, Max(sub.r2) AS R2, Max(sub.r3) AS R3, ...
   FROM
     (SELECT *
      FROM Forecasts1 f1
      UNION ALL
      SELECT *
      FROM Forecasts2 f2) As sub
    GROUP BY sub.date
quit;

score 1 · Answer 4 · answered Apr 03 '17 at 00:26

1

A different solution would be to append all and delete rows where all are missing.

data want; 
  set forecast1-forecast3 indsname=fc;
  model = fc; *store name of forecast model;
  if nmiss(of r1-r3) = 3 then delete;
run;

answered Apr 03 '17 at 00:26

Reeza

20,510
4
21
38

oh thats extremely efficient, I think thats probably the ideal solution for this. Thank you! – Josh Kraushaar Apr 03 '17 at 01:53

SAS: Merging two tables with identical columns while dropping null values

4 Answers4