2

I'm trying to make a data step that creates a column in my table that has the sum of ten, fifteen, twenty and fortyfive lagged variables. What I have below works, but it is not practicle to write this code for the twenty and fortyfive summed lags. I'm new to SAS and can't find a good way to write the code. Any help would be greatly appreciated.

Here's what I have:

data averages;
set work.cuts;
sum_lag_ten = (lag10(col) + lag9(col) + lag8(col) + lag7(col) + lag6(col) + lag5(col) + lag4(col) + lag3(col) + lag2(col) + lag1(col));
run;
Jarom
  • 1,067
  • 1
  • 14
  • 36

2 Answers2

4

Proc EXPAND allows the easy calculation for moving statistics. Technically it requires a time component, but if you don't have one you can make one up, just make sure it's consecutive. A row number would work.

Given this, I'm not sure it's less code, but it's easier to read and type. And if you're calculating for multiple variables it's much more scalable.

Transformout specifies the transformation, In this case a moving sum with a window of 10 periods. Trimleft/right can be used to ensure that only records with a full 10 days are included. You may need to tweak these depending on what exactly you want. The third example under PROC EXPAND has examples.

 Data have;
  Set have;
  RowNum = _n_;
  Run;

Proc EXPAND data=have out=want;
ID rownum;
Convert col=col_lag10 / transformout=(MOVSUM 10 trimleft 9);
Run;

Documentation(SAS/STAT 14.1)

http://support.sas.com/documentation/cdl/en/etsug/68148/HTML/default/viewer.htm#etsug_expand_examples04.htm

Reeza
  • 20,510
  • 4
  • 21
  • 38
  • thanks for the code, it works perfeclty. You tipped me off to learning about proc expand when I started researching SAS/ETS. I learned a ton from your help. Great link as well. Thank you! – Jarom Dec 09 '16 at 15:59
3

If you must do this in the datastep (and if you do things like this regularly, SAS/ETS has better tools for sure), I would do it like this.

data want;
  set sashelp.steel;
  array lags[20];
  retain lags1-lags20;

  *move everything up one;
  do _i = dim(lags) to 2 by -1;
    lags[_i] = lags[_i-1];
  end;

  *assign the current record value;
  lags[1] = steel;

  *now calculate sums;
  *if you want only earlier records and NOT this record, then use lags2-lags11, or do the sum before the move everything up one step;
  lag_sum_10 = sum(of lags1-lags10);  
  lag_sum_15 = sum(of lags1-lags15);  *etc.;
run;

Note - this is not the best solution (I think a hash table is better), but this is better for a more intermediate level programmer as it uses data step variables.

I don't use a temporary array because you need to use variable shortcuts to do the sum; with temporary array you don't get that, unfortunately (so no way to sum just 1-10, you have to sum [*] only).

Joe
  • 62,789
  • 6
  • 49
  • 67