
I am performing a logistic regression with a large number of explanatory variables (400 in this example). I can easily reference all 400 variables in the model statement using the code below, but is there also an easy way to generate all two-way interaction terms (i.e. every pair of variables)?

proc logistic data = d1;
    model y = var1-var400 / rsquare;
run;

I've seen code like this:

proc logistic data = d1;
    model y = var1 | var2 | var3... @2 / rsquare;
run;

but this is not realistic for 400 variables.

Any suggestions that provide a better method than doing this the hard way and creating a new dataset that contains all of the interaction terms?

krohn
  • Is the first option not realistic due to the amount of typing required, or due to performance considerations? – user667489 Dec 18 '18 at 14:53

1 Answer


You can easily generate a macro variable containing a list of variable names from a dataset using proc sql, with the names separated by '|', e.g.:

proc sql noprint;
  select name into :var_list 
    separated by '|'
  from dictionary.columns 
  where libname = 'SASHELP' 
    and memname = 'CLASS'
  ;
quit;

%put &var_list;

You can then reference &var_list in the model statement instead of typing out var1 | ... | var400. Is this a reasonable option for you?
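Applied to the data in the question, the idea might look roughly like the sketch below. It assumes the dataset is WORK.D1 and that the predictors are the only columns whose names start with "var"; both are assumptions, so adjust the where clause to match your data. Values in dictionary.columns for libname and memname are stored in upper case.

```sas
/* Sketch only: assumes the data set is WORK.D1 and that every
   column whose name starts with "var" is a predictor. */
proc sql noprint;
  select name into :var_list
    separated by ' | '
  from dictionary.columns
  where libname = 'WORK'
    and memname = 'D1'
    and upcase(name) like 'VAR%'
  ;
quit;

/* Check the generated list in the log before fitting */
%put &var_list;

/* @2 limits the bar expansion to main effects and two-way interactions */
proc logistic data = d1;
  model y = &var_list @2 / rsquare;
run;
```

Note that 400 variables yield 79,800 pairwise interactions on top of the 400 main effects, so the fit may be slow or fail to converge even though the model statement itself is short.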

user667489