0

So I have a data named table1 as follows:

Obs  ID  M_201812  M_201901      M_201902    M_201903

1    X1     1         .             .           . 
2    X2     1         1             .           . 
3    X3     .         1             1           . 
4    X4     .         1             .           . 
5    X5     .         1             .           . 
6    X6     1         .             .           . 
7    X7     1         1             .           . 
8    X8     1         1             .           . 
9    X9     .         .             1           . 
10   X10    1         1             .           . 

Each column here is a month, which is dynamically generated based on some previously run macro. The months will be dynamic and will vary. What I need to do is calculate sums of last 3 months, last 6 months and last 12 months. The approach I had in my mind was as follows: A) Store the column names in a macro variable:

proc sql noprint;
    select distinct name
    into :cols2 separated by ','  
    from dictionary.columns
    where upcase(memname) = 'Table1' and name not in ('ID');
    ;
quit;
%put &cols2.

The output was as follows:

M_201812,M_201901,M_201902,M_201903

B) Create sums thereafter based on the number of items in the variable:

data table1;
set table1;

if count("&cols2",",") <=3 then do;
3m_total=sum(of &cols2);
6m_total=sum(of &cols2);
12m_total=sum(of &cols2);
end;
else if 3< count("&cols2",",") <=6 then do;
3m_total=sum(%scan(%superQ(cols2),-1,%str(,)),%scan(%superQ(cols2),-2,%str(,)),%scan(%superQ(cols2),-3,%str(,)));
6m_total=sum(of &cols2);
12m_total=sum(of &cols2);
end;
else if 6< count("&cols2",",") <=12 then do;
3m_total=sum(%scan(%superQ(cols2),-1,%str(,)),%scan(%superQ(cols2),-2,%str(,)),%scan(%superQ(cols2),-3,%str(,)));
6m_total=sum(%scan(%superQ(cols2),-1,%str(,)),%scan(%superQ(cols2),-2,%str(,)),%scan(%superQ(cols2),-3,%str(,)),%scan(%superQ(cols2),-4,%str(,)),%scan(%superQ(cols2),-5,%str(,)),%scan(%superQ(cols2),-6,%str(,)));
12m_total=sum(of &cols2);
else do;
    3m_total=sum(%scan(%superQ(cols2),-1,%str(,)),%scan(%superQ(cols2),-2,%str(,)),%scan(%superQ(cols2),-3,%str(,)));
    6m_total=sum(%scan(%superQ(cols2),-1,%str(,)),%scan(%superQ(cols2),-2,%str(,)),%scan(%superQ(cols2),-3,%str(,)),%scan(%superQ(cols2),-4,%str(,)),%scan(%superQ(cols2),-5,%str(,)),%scan(%superQ(cols2),-6,%str(,)));
    12m_total=sum(%scan(%superQ(cols2),-1,%str(,)),%scan(%superQ(cols2),-2,%str(,)),%scan(%superQ(cols2),-3,%str(,)),%scan(%superQ(cols2),-4,%str(,)),%scan(%superQ(cols2),-5,%str(,)),%scan(%superQ(cols2),-6,%str(,)),
    %scan(%superQ(cols2),-7,%str(,)),%scan(%superQ(cols2),-8,%str(,)),%scan(%superQ(cols2),-9,%str(,)),%scan(%superQ(cols2),-10,%str(,)),%scan(%superQ(cols2),-11,%str(,)),%scan(%superQ(cols2),-12,%str(,)));
    end;
    run;

Basically we get 12 months sum only if there are 12 monthly columns available. If only 3 months are available, then 3months sum=6months sum=12months sum. After running the code, I get the following error:

ERROR 159-185: Null parameters for SUM are invalid.

This happens at the last else do statement. I can't for the life of me figure out why won't sas be able to read a simple if-then-do-else statement. Is there an error in the if conditions or in calling the macro variable? Any help here would be appreciated. Thanks a lot.

IndigoChild
  • 842
  • 3
  • 11
  • 29
  • 1
    Is there a reason you are starting with that structure? Can't you just start with vertical structure where the date is the value of variable instead of the name of the variable? – Tom Jun 27 '19 at 16:48
  • 1
    Echoing @Tom, this is a report structure, not a good structure for maintaining or calculating data. It can be done, but is a pain. – Reeza Jun 27 '19 at 17:16
  • 1
    Echoing @Reeza, specifically, you should keep your data long with *MM/YYY* column and not wide which helps so much with practically any analysis. See `proc transpose`. – Parfait Jun 27 '19 at 17:27

2 Answers2

1

Hard to tell without seeing what code your macro logic is actually generating. But most likely it is because you are generating two commas with nothing between them like this:

475   data test;
476     y=sum(1,,2,3);
                -
                159
ERROR 159-185: Null parameters for SUM are invalid.

477   run;

You need to either not generate that extra comma or put something between the commas. So either a period for a missing value or a 0.

In general SAS is easier if you don't use commas. So get a space delimited list of variable names.

proc contents data=Table1 noprint
  out=_contents (where=(name ne 'ID'))
; 
run;

proc sql noprint;
select name
     , case when (varnum>9) then name else ' ' end 
     , case when (varnum>6) then name else ' ' end 
     , case when (varnum>3) then name else ' ' end
  into :cols_12 separated by ' '
     , :cols_3 separated by ' '
     , :cols_6 separated by ' '
     , :cols_9 separated by ' '
  from _contents
  order by name
;
quit;

Then use that list of names using the of keyword so you don't need commas between them.

data want;
  set table1;
  total_3m=sum(0,of &cols_3);
  total_6m=sum(0,of &cols_6);
  total_9m=sum(0,of &cols_9);
  total=sum(0,of &cols_12);
run;

Also remember to use valid names for your variables. Variable names cannot start with numbers.

Tom
  • 47,574
  • 2
  • 16
  • 29
  • Thanks Tom for your suggestion. I ran your code. But there were a few instances, where I noticed that some of the ids had incorrectly populated 3m, 6m, 12m total. So for ex: for ID X8 in the given table, his 3m total should be 1(from 201901-201903), and his 6months total should be 2(from whatever available to 201903). But it so happens that I get 1 as both 3m and 6m total. Do let me know if there's a fix for this. Regards. – IndigoChild Jun 27 '19 at 17:19
  • The logic for how to pick which variables will not work if the variables are not defined in chronological order. Look at the values of the macro variables and the _CONTENT dataset to see what is happening. Perhaps add `keep=M_20:` datastep option to the input to `proc contents` to only list the variables that match your pattern. But for a more robust solution you really should be keeping the dates in data and not in variable names, then the logic is much easier. – Tom Jun 27 '19 at 17:25
  • Thanks Tom, I'll try editing it. 1 last question. In a basic datastep, if I run a scan function which is beyond the range of the length of the string, is there a way for me to kill the process there? Say if I have a word "coding" and I run Scan("coding",10) - it will be blank. But I don't want it to execute anything at all. Possible? – IndigoChild Jun 27 '19 at 18:55
  • 1
    Questions about SCAN() and %SCAN() are in some ways totally different. In normal SAS code you can test if SCAN() returns only spaces and take some action. But in your example you are using %SCAN() to generate SAS code that will then be compiled. If you want to conditionally generate different code based on return from %SCAN() then you need macro logic, which usually means an actual macro. Although using %sysfunc(ifc(....)) can be used in some cases in open code. – Tom Jun 27 '19 at 19:09
0

Consider reshaping your data from wide to long with proc transpose and then run multiple correlated aggregate SQL subqueries for 3/6/12-month running sums. This will achieve as you mentioned for desired result:

What I need to do is calculate sums of last 3 months, last 6 months and last 12 months.

Data

data Month_Data;
    infile datalines delimiter=',' DSD; 
    length ID $ 3;
    input ID $ M_201812 M_201901 M_201902 M_201903;
    datalines;
X1, 1, ., ., . 
X2, 1, 1, ., . 
X3, ., 1, 1, . 
X4, ., 1, ., . 
X5, ., 1, ., . 
X6, 1, ., ., . 
X7, 1, 1, ., . 
X8, 1, 1, ., . 
X9, ., ., 1, . 
X10, 1, 1, ., . 
;

Reshape

proc sort data=Month_Data;
    by ID;
run;

proc transpose data=Month_Data
    out=Month_Data_Long;
    by ID;
run;

data Month_Data_Long;
    set Month_Data_Long (rename=(_NAME_ = Month_Year col1=value));
    length MMYY $ 9;
    format Month_Date date9.;

    label Month_Year = "Month Year Original";
    MMYY =  tranwrd(Month_Year, "M_", "") || "01";
    Month_Date = input(MMYY,  yymmdd9.);
run;

Aggregation

proc sql;
    create table Run_Sums as
    select m.ID, 
           m.Month_Date,
           m.Value,
           (select sum(m.Value)
            from Month_Data_Long sub
            where sub.ID = m.ID
             and sub.Month_Date >= intnx('month', m.Month_Date, -3)
             and sub.Month_Date <= m.Month_Date
             and sub.Value ^= .
            ) AS ThreeMonthsSum,

           (select sum(m.Value)
            from Month_Data_Long sub
            where sub.ID = m.ID
             and sub.Month_Date >= intnx('month', m.Month_Date, -6)
             and sub.Month_Date <= m.Month_Date
             and sub.Value ^= .
            ) AS SixMonthsSum,

           (select sum(m.Value)
            from Month_Data_Long sub
            where sub.ID = m.ID
             and sub.Month_Date >= intnx('month', m.Month_Date, -12)
             and sub.Month_Date <= m.Month_Date
             and sub.Value ^= .
            ) AS TwelveMonthsSum

    from Month_Data_Long m;
quit;

Output (no difference in sums due to OP's posted data)

SAS Output

Parfait
  • 104,375
  • 17
  • 94
  • 125