2

I’m running a macro with a call execute in a data step. What I want from my data step is the following:

Take a table, add a new column for every existing column (via macro) and finally add a new column that is the sum of two others. I guess it is also possible without a macro, but I want it exactly this way cause I’m new to SAS and want to understand the logic of macros and call execute.

So let’s say I have the following table:

data values;
input a1 a2 b1 b2;
datalines;
1 0 3 10
0 5 6 11
7 8 9 0
;
run;

and this macro:

%macro loop1(myDataset);  
proc contents data=&myDataset. out=Col_Names (keep=Name) noprint;
run;
proc sql noprint; 
select count(Name) into :length from Col_Names;
quit;                                                 
     %do j = 1 %to &length; 
            data &myDataset.;
            set &myDataset.; 
            n&j=0;
            run;                       
     %end;
%mend;

then the following data step creates different output the first three times I run it: (After every run I re-run the original data step with the datalines of course)

data values;
set values;
if _n_=1 then call execute('%loop1(values);');
test=sum(a1,a2);
run;

The first run results in an error:

WARNING: Apparent symbolic reference LENGTH not resolved. ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: &length ERROR: The %TO value of the %DO J loop is invalid. ERROR: The macro LOOP1 will stop executing.

The second run results in exactly what I want:

The columns a1, a2, b2, b2, test, n1, n2, n3, n4

And from the third run on, the output stays:

The columns a1, a2, b2, b2, test, n1, n2, n3, n4, n5

With the undesired n5 in it.

What should I change to always get the output from the second run?

Zap
  • 147
  • 2
  • 9

3 Answers3

2

When calling macros using call execute it is recommended to wrap them in %nrstr() as per the following usage note:

http://support.sas.com/kb/23/134.html

This prevents premature macro execution - or at least, forces it to wait for any dependent macro variables to be ready, namely the length variable in your INTO: clause.

data values;
  set values;
  if _n_=1 then call execute('%nrstr(%loop1(values);)');
  test=sum(a1,a2);
run;

In order to get the results you desire, you also need to exclude the 'test' variable in your SQL procedure, as follows:

proc sql noprint; 
select count(Name) into :length from Col_Names
  where upcase(name) ne 'TEST';
Allan Bowe
  • 12,306
  • 19
  • 75
  • 124
  • Thanks! But then the result is directly the third one and I desire the second one. – Zap Jan 05 '17 at 10:28
  • 1
    But how come 'test' is even in the proc sql if in the data step it is created after my macro? In this case the macro works now but I have to adapt it every time I have new columns. There is no other way? – Zap Jan 05 '17 at 11:41
  • 1
    call execute does not immediately launch your macro. It (sort of)schedules it for after your data step. _Look to your log._ It sais you first create ´values´ with 4 variables ´a1´ , ´a2´ , ´a3´ , ´a4´ , then it creates it with 5 variables (adding ´test´) and only then it mentions ´%loop1´. _Try This:_ add `drop n1 ,n2 n3 n4;` at the end of your data step. You will see they are in the final result, so they are created later, while running ´%loop1´ after your datastep. – Dirk Horsten Jan 05 '17 at 11:50
  • As @DirkHorsten mentions, the macro is executed AFTER the data step. Another approach would be to modify the `proc sql` to only KEEP the variables you need, eg `where substr(upcase(name),1,1) in ('A','B')` – Allan Bowe Jan 05 '17 at 11:54
  • of course this whole thing is very contrived. I'd recommend avoiding `call execute` wherever possible, also try and avoid overwriting datasets - create new ones each time so you can track the differences! – Allan Bowe Jan 05 '17 at 11:59
  • See my answer in 5 minutes – Dirk Horsten Jan 05 '17 at 12:04
1

Why it fails:

Your code does not at all add variables ´n1´ and so forth to the ´values´ dataset at the moment you think it does. It schedules your macro for after your last data step, and that is not what you want.

Try for instance adding an assignment to your last data step

data values;
    set values;
    *&new_columns;
    if _n_=1 then call execute('%loop1(values);');

    test=sum(a1,a2);
    n2=9874;
run;

and you will see it has no effect because your value for ´n2´ is overwritten when your macro runs;

What you can do:

With select <something> into :<variable> separated by <separator>, you can create a macro variable that contains the assignments.

proc contents data=values noprint out=Col_Names(keep=varnum);
run;
proc sql noprint;
    select 'n'|| strip(put(varnum, 8.)) ||'=0'
    into :new_columns separated by ';'
    from col_names;
quit;

You can use This variable in your data step:

data values;
    set values;
    &new_columns;

    test=sum(a1,a2);
    n2=9874; ** Now this has effect **;
run;
Community
  • 1
  • 1
Dirk Horsten
  • 3,753
  • 4
  • 20
  • 37
  • 1
    For sure it's better to avoid macro, but the OP did say he wanted it "exactly this way" – Allan Bowe Jan 05 '17 at 12:08
  • It can never work "exactly this way", as the `call execute` only runs when the data step is compiled, and at that moment you can not add variables to the data step anymore. (More technically spoken, the program vector is created at that time.) – Dirk Horsten Jan 05 '17 at 12:28
0

The reason it fails is that SAS is running the macro when your data step runs and pushing the code that the macro generates onto the stack. It then runs the actual generated PROC and DATA steps after the data step finishes.

So when your macro runs it generates the PROC SQL step, but the step does not execute yet since your data step is still running. The macro then runs the %DO loop and will either generate an error since the macro variable LENGTH doesn't exist or use the value for the macro variable that existed before your data step started.

To prevent this wrap the macro call in %NRSTR() so that the macro call itself is pushed onto the stack to run after the data step stops.

call execute('%nrstr(%loop1)(values);');

Your actual example does not need CALL EXECUTE at all. Just call the macro after the data step runs.

data values;
  set values;
  test=sum(a1,a2);
run;
%loop1(values);

Or if you don't want to make a flag variable for the new TEST variable then run the macro BEFORE the data step.

Tom
  • 47,574
  • 2
  • 16
  • 29