recode and add prefix to sas variables

Question

Lets's say I have a bunch of variables named the same way and I'd like to recode them and add a prefix to each (the variables are all numeric).

In Stata I would do something like (let's say the variables start with eq)

foreach var of varlist eq* {
    recode var (1/4=1) (else=0), pre(r_)
}

How can I do this in SAS? I'd like to use the %DO macros, but I'm not familiar with them (I want to avoid SQL). I'd appreciate if you could include comments explaining each step!

Can you show before and after data with variable names? Are you changing the variable names? Or making NEW variables? Are you changing the variable values? If so how? For example what does (1/4=1) mean ? It looks like an obviously false boolean expression. — Tom, Dec 16 '16 at 17:27
I'd like to create new variables, using the original variable names but with a prefix (r_). The expression (1/4=1) means values {1,2,3,4} should be recoded into 1. The variables are scalar [1,2,3,4,5], and I'd like to recode 4 and 5 into 1 and 1,2,3 into 0. — vashts85, Dec 16 '16 at 17:39
Why are you recoding the variables? Why not just attach a format to the variables? — Tom, Dec 20 '16 at 18:04
I don't know what that means, but I want to calculate means over them, and it's often simpler for me to just generate recoded variables (this is what I do in R, Stata and SPSS). — vashts85, Dec 20 '16 at 18:06

Tom · Accepted Answer · 2016-12-16T18:21:46.200

2

SAS syntax for this would be easier if your variables are named using numeric suffix. That is, if you had ten variables with names of eq1, eq2, .... , eq10, then you could just use variable lists to define both sets of variables.

There are a number of ways to translate your recode logic. If we assume you have clean variables then we can just use a boolean expression to generate a 0/1 result. So if 4 and 5 map to 1 and the rest map to 0 you could use x in (4,5) or x > 3 as the boolean expresson.

data want;
  set have;
  array old eq1-eq10 ;
  array new r_eq1-r_eq10 ;
  do i=1 to dim(old);
    new(i) = old(i) in (4,5);
  end;
run;

If you have missing values or other complications you might want to use IF/THEN logic or a SELECT statement or you could define a format you could use to convert the values.

If your list of names is more random then you might need to use some code generation, such as macro code, to generate the new variable names.

Here is one method that use the eq: variable list syntax in SAS that is similar to the syntax of your variable selection before. Use PROC TRANSPOSE on an empty (obs=0) version of your source dataset to get a dataset with the variable names that match your name pattern.

proc transpose data=have(obs=0) out=names;
  var eq: ;
run;

Then generate two macro variables with the list of old and new names.

proc sql noprint ;
  select _name_
       , cats('r_',_name_)
    into :old_list separated by ' '
       , :new_list separated by ' '
  from names
  ;
quit;

You can then use the two macro variables in your ARRAY statements.

  array old &old_list ;
  array new &new_list ;

edited Dec 16 '16 at 18:21

answered Dec 16 '16 at 18:13

Tom

47,574
2
16
29

What is the & for used there in the last code chunk? – vashts85 Dec 20 '16 at 17:32
That is a how you reference the macro variables generated by the PROC SQL query. The `&` is the trigger to the macro processor to treat what follows as the name of a macro variable. The macro variable's value will be substituted for the reference and the resulting code is passed to SAS as if you had typed it into your source program. – Tom Dec 20 '16 at 17:37
Thank you, I used the last 3 code chunks to do the operations I needed and add stuff within the array. It seems crazy that it takes this much code in SAS to clone a bunch of variables and add a prefix to them -- that is what I wanted to do, so perhaps my explanation was not clear? – vashts85 Dec 20 '16 at 17:43
Can you explain the SQL chunk a bit? I'm not familiar with that proc in SAS. – vashts85 Dec 20 '16 at 17:43
PROC SQL just allows you to use SQL code. The INTO clause is what allows it to generate macro variable values. You could have also just used a data step to write the lines of code to a file and used %INCLUDE. Most likely in SAS what you want to do is just leave your variables alone and just apply a FORMAT to them to to get your recoding done on the fly at the point of analysis. – Tom Dec 20 '16 at 18:02
Could you write a solution doing this? I'd rather just simple SAS commands that I comprehend (I don't know any SQL). It shouldn't be so hard or take so much code to clone and prefix a bunch of variables.... – vashts85 Dec 20 '16 at 18:04

Sean · Answer 2 · 2016-12-16T18:04:09.240

You can do this with rename and a dash indicating which variables you want to rename. Note the following only renames the col variables, and not the other one:

data have;                                                                                                                                 
    col1=1;                                                                                                                               
    col2=2;                                                                                                                               
    col3=3;                                                                                                                               
    col5=5; 
    other=99; 
    col12=12; 
run;  


%macro recoder(dsn = , varname = , prefix = );

/*select all variables that include the string "varname"*/
/*(you can change this if you want to be more specific on the conditions that need to be met to be renamed)*/
proc sql noprint;
    select distinct name into: varnames
    separated by " "
    from dictionary.columns where memname = upcase("&dsn.") and index(name, "&varname.") > 0;
quit;

data want;
    set have;

    /*loop through that list of variables to recode*/
    %do i = 1 %to %sysfunc(countw(&varnames.)); 
    %let this_varname = %scan(&varnames., &i.);

        /*create a new variable with desired prefix based on value of old variable*/
        if &this_varname. in (1 2 3) then &prefix.&this_varname. = 0;
            else if &this_varname. in (4 5) then &prefix.&this_varname. = 1;

    %end;
run;

%mend recoder;

%recoder(dsn = have, varname = col, prefix = r_);

While useful, this just renames variables. And the number of variables might change per dataset, so I'd like to be able to use some sort of wildcard. — vashts85, Dec 16 '16 at 17:41
Sorry, what you meant by "recode" originally wasn't clear. See the update! — Sean, Dec 16 '16 at 17:58
It's two steps, both of which are required to do what you're asking for, so no, you can't meaningfully shorten it. — Sean, Dec 16 '16 at 18:10

score 1 · Answer 3 · answered Dec 16 '16 at 18:29

1

PROC TRANSPOSE will give you good flexibility with regards to the way your variables are named.

proc transpose data=have(obs=0) out=vars;
   var col1-numeric-col12;
   copy col1;
   run;
proc transpose data=vars out=revars(drop=_:) prefix=RE_;
   id _name_;
   run;
data recode;
   set have;
   if 0 then set revars;
   array c[*] col1-numeric-col12;
   array r[*] re_:;
   call missing(of r[*]);
   do _n_ = 1 to dim(c);
      if      c[_n_] in(1 2 3) then r[_n_] = 0;
      else if c[_n_] in(4 5)   then r[_n_] = 1;
      else                          r[_n_] = c[_n_];
      end;
   run;
proc print;
   run;

answered Dec 16 '16 at 18:29

data _null_

8,534
12
14

What does the `copy col1;` statement do? Is it needed in this case? – Tom Dec 17 '16 at 00:08
@Tom COL1 is the variable that is transposed in the second transpose. I probably should have used the VAR statement to make it obvious, but the default is to transpose all \_NUMERIC\_ variables. Interestingly I had to use the ID statement although \_NAME\_ is the default ID variable but when you use PREFIX= ID is not implied. – data _null_ Dec 17 '16 at 02:14
So you don't need it. Without it the first transpose makes a dataset with just 1 column instead of 2 and the second transpose makes a dataset with 0 observations instead of 1. It does not look like the ID statement is needed either. At least with 9.4 rel 3. – Tom Dec 17 '16 at 02:57
@tom You seem to be partially correct. I did not need a variable to transpose but in order to get the new variables names as I wanted RE_oldname, I do need ID \_NAME\_; Other wise they will be named RE_1-RE_6 SYSVLONG4 9.04.01M3P06242015 – data _null_ Dec 18 '16 at 16:56

score 1 · Answer 4 · answered Dec 16 '16 at 21:44

It would be nearly trivial to write a macro to parse almost that exact syntax.

I wouldn't necessarily use this - I like both the transpose and the array methods better, both are more 'SASsy' (think 'pythonic' but for SAS) - but this is more or less exactly what you're doing above.

First set up a dataset:

data class;
  set sashelp.class;
  age_ly = age-1;
  age_ny = age+1;
run;

Then the macro:

%macro do_count(data=, out=, prefix=, condition=, recode=, else=, var_start=);
%local dsid varcount varname rc;          *declare local for safety;

%let dsid = %sysfunc(open(&data.,i));       *open the dataset;


%let varcount = %sysfunc(attrn(&dsid,nvars)); *get the count of variables to access;

  data &out.;                                 *now start the main data step;
    set &data.;                               *set the original data set;
    %do i = 1 %to &varcount;                  *iterate over the variables;
      %let varname= %sysfunc(varname(&dsid.,&i.));   *determine the variable name;
      %if %upcase(%substr(&varname.,1,%length(&var_start.))) = %upcase(&var_start.) %then %do;                   *if it matches your pattern then recode it;
        &prefix.&varname. = ifn(&varname. &condition., &recode., &else.);   *this uses IFN - only recodes numerics.  More complicated code would work if this could be character.;
      %end;
    %end;
    %let rc = %sysfunc(close(&dsid));         *clean up after yourself;
  run;

%mend do_count;

   %do_count(data=class, out=class_r, var_start=age, condition= > 14, recode=1, else=0, prefix=p_);

score 0 · Answer 5 · answered Dec 20 '16 at 23:25

The expression (1/4=1) means values {1,2,3,4} should be recoded into 1.

Perhaps you do not need to make new variables at all? If have variables with values 1,2,3,4,5 and you want to treat them as if they have only two groups you could do it with a format.

First define your grouping using a format.

proc format ;
  value newgrp 1-4='Group 1' 5='Group 2' ;
run;

Then you can just use a FORMAT statement in your analysis step to have SAS treat your five level variable as it if had only two levels.

proc freq ;
  tables eq: ;
  format eq: NEWGRP. ;
run;

recode and add prefix to sas variables

5 Answers5