4

Is it possible to read (and execute) SAS code that is stored in a string in a SAS Dataset.

For example, the dataset 'CODE' contains a string variable, which contains

"IF TOTAL_SALES GE 20000 AND TYPE IN ('A', 'B', 'C') THEN VAR1 = 'Y' ;"

Can I do something like?

data sales ;
set sales ;
/* run the if statement above */
run ;

Using SAS 9.2

Joe
  • 62,789
  • 6
  • 49
  • 67
Tim
  • 41
  • 1
  • 2
  • Can you give some additional detail about the problem? Why is the code stored within a variable? Did you put it there yourself or was it provided by an external source. There are much nicer ways to solve these kinds of problems if you have control over how the code got there to start with. – Robert Penridge Jul 30 '13 at 16:17
  • I used SAS code to generate the code. Basically, an input file has a detail of rules (e.g. CODE1EQA, SUMINSUREDGT20000, COVEREQC) etc. and I'm trying to convert this into SAS code (i.e. CODE1='A' AND SUMINSURED GT 20000 AND COVER='C') and run this. – Tim Aug 01 '13 at 14:19

3 Answers3

3

There are as many ways to do this as there are ways to construct macro variables. The three most useful typically:

proc sql select into

This allows you to create many lines of code instantly, and is probably the most common tool used for this purpose. It can either directly create lines of code, or more usefully create macro calls. For example, let's say you wanted to run:

data want;
set have;
x = sum(a,b,c);
run;

But x,a,b,c are all defined in another dataset. Further, you have 3 variables like this.

data callset;
input var1 $ var2 $ var3 $ var4 $;
datalines;
x a b c
y d e f
z b e c
;;;;
run;

You could construct it this way:

proc sql;
select cats(x,"=sum(",a,",",b,",",c,");") into :calllist
 separated by ' '
 from callset;
quit;

data want;
set have;
&calllist.
run;

However, it might be easier to construct a macro:

%macro sum(var1,var2,var3,var4);
&var1. = sum(&var2.,&var3.,&var4.);
%mend sum;

Then your PROC SQL is a bit easier (not really in this case, but often this helps readability in more complex code):

proc sql;
select cats('%sum(',catx(',',x,a,b,c),')') into :calllist
 separated by ' '
 from callset;
quit;

Then use it the same way.

The limitations here: You cannot modify the string while you construct it except for what PROC SQL allows you to do (which is powerful, but not datastep code which if you needed to use things like first.var you would have to do prior to the proc sql in a separate step). You have a limitation of around 20k characters in total in the macro variable.

The separated by is important, by the way; without it you can only create a single line of code (only the last one will be put into the macro variable). Even if you don't really want it separated by anything, you still need to separate by ' ' in order to generate the list.

%include file

The include file method is a hybrid of the proc sql method and call execute. It is constructed in a data step, and has no length limitations beyond your OS's file size limits. It is however a bit messier (in that a temporary file is created) and has the normal limitations of include files, such as not containing datalines.

You construct it this way (using the previous datasets):

filename toincl temp; *create temporary fileref;
data _null_;
set callset;
file toincl;
callstr = cats('%sum(',catx(',',x,a,b,c),')');
put callstr $;
run;

data want;
set have;
%include toincl;
run;

It gets around the PROC SQL length limitation, but has the normal limitations of an include file (see the documentation for more information).

call execute

This is used to execute a line of code interactively immediately following the data step. It is convenient as it allows you to construct code on the fly somewhat more flexibly than the other methods, but it has significant timing limitations.

data _null_;
set callset; *this is not the main data set, but the control file with SAS code;
call execute('data want; set have;');
callstr=cats('%sum(',catx(',',x,a,b,c),')');
call execute(callstr);
call execute('run;');
run;

The primary limitation that people usually have trouble with is macro variable timing. When a macro variable is defined in a CALL EXECUTE step, it is not available for use during the same CALL EXECUTE step. So if your code contains code to create and then use a macro variable, it will not behave correctly; you need to use one of the other methods. If you use this method, I highly recommend reading a few papers on CALL EXECUTE such as this one or this one first.

Joe
  • 62,789
  • 6
  • 49
  • 67
2

2 ways to do it:

  1. Use call symput in a data step to put it into a macro variable. You then put the macro variable where you want the code to execute. (take into account the caveats of when SAS interprets and executes what)
  2. Use call execute in a data step. Note that all the code entered into a call execute is accumulated during the data step and then interpreted and executed immediately after the data step.

For call execute, see: http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/viewer.htm#a000543697.htm

mvherweg
  • 1,272
  • 9
  • 11
  • I don't recommend `CALL SYMPUT`; it's unlikely that is better than other potential solutions (or nearly as good), and it's very complicated to get right for newer users. – Joe Jul 30 '13 at 14:26
  • @Joe I agree that it is tricky for newer users. But it offers an advantage over call execute (no need to create the full proc/data step + does not necessarily have to be used immediately after datap step). Personally (tastes differ of course), i find it more convenient than proc sql with select into because you can build up your string more easily in multiple statements before using call symput. If you don't mind creating the full proc/data step in the data step, the include has the advantage of having easy (and permanent) access to the resulting code. So i agree that include mostly 'wins'. – mvherweg Jul 30 '13 at 15:21
  • How often do you really want to execute just _one_ line of code (or macro call or whatever), though? Most of the time you're either running a bunch of calls, one per line, in which case select into or include file is better than a long concatenated retained string, or if you do just need one it is usually easier to call it from a proc sql or call execute. I wouldn't even mention call symput to a novice user or even an intermediate user, as the possibility of issues/confusion is much worse than the utility. – Joe Jul 30 '13 at 15:25
0

You want to use the call symput function prior to create a macro variable containing the SAS code before you read in the sales data. Here is some example code:

* _null_ means there is no output dataset;
data _null_;
    set code;

    * assuming char variable containing the SAS code is called 'code_string';
    * if only certain rows contain the correct code_string values;
    * then you could use an if statement to assign &my_code as desired;
    call symput('my_code', code_string);
run;

data sales;
    set sales;

    * now access the value of the macro variable created with call symputx above;
    &my_code
run;

If the code dataset that has the char variable containing the SAS code string is only a single row (or can be restricted to a single row with a where statement), then you can use proc sql to achieve the same result as the _null_ data step above:

proc sql;
    select code_string into :my_code
    from code
    /* where [put your boolean expression here] */
;
quit;
Daniel Waltrip
  • 2,530
  • 4
  • 25
  • 30
  • Like above, I don't recommend `CALL SYMPUT`; this is a very messy way to convert code to a macro variable (relies on only one row just like your SQL solution). – Joe Jul 30 '13 at 14:27
  • If you read my answer, you see I mention that depending on the nature of the code dataset (which he didn't elaborate on), he could very easily use an "if" or "where" statement to only assign the macro variable when the proper conditions are met. There is nothing about his question that implies he needs to combine values from multiple rows to form the macro variable. It is also somewhat ironic how you bash my solution then also use the `proc sql select into :macro_var` technique in your answer. – Daniel Waltrip Jul 30 '13 at 23:40
  • I only object to `call symput`, which I find is hard to understand for newer programmers and almost always inferior to other solutions. Your `proc sql` solution is perfectly fine, although you don't explain the more commonly useful `separated by` option. You're certainly right that the question does not specify multiple lines of code, but it certainly seems that the more common use case is multiple lines. – Joe Jul 31 '13 at 02:18
  • Also, please don't take my objection as bashing, or in any way intended to imply it was a bad answer; certainly an expert user will find the occasion to use `call symput`. I've simply seen far too many confused users on `SAS-L` to consider it a good suggestion for a newer user. – Joe Jul 31 '13 at 02:23
  • 1
    Fair enough. I suppose I was just feeling a little defensive after someone downvoted me. In my mind, the only real barrier is understanding the macro language. Once that is grasped, the difference between `proc sql select into` and `call symput` is negligible for single value assignment. For someone unfamiliar with `proc sql`, `call symput` might be easier to pick up, but you do lose the easy transition to introducing the concatenation abilities of `select into`. But I digress. Thanks for explaining yourself, cheers! – Daniel Waltrip Jul 31 '13 at 06:00
  • Hi guys, thanks very much for the answers. To clarify, I have multiple lines of code, which I want to apply to each observation in a dataset. Imagine I have a dataset 'CODE' which has 10 lines of code (10 if statements, one on each line). I have dataset 'SALES' with 3000 observations. For each observation, I want to apply these 10 if statements (30000 if statements being run in total). – Tim Aug 01 '13 at 14:28