I want to find the best model specification for a logit regression whose dependent variable follows a multinomial distribution. Y has three outcomes, and I want to build a forecasting model from two variables: a lagged and differenced spot rate time series, and a time series of the estimated realized volatility.

My initial thought was to write a loop that fits each specification and outputs its AIC value, so that I can go back afterwards and pick the best model.

This works, but there's a hitch. I want to look at the spot rate in the following way (example): Spot_t - Spot_(t-n), where n could be 21. This opens up a whole lot of specifications. In my trial regression I included 12 variables of each kind, with the k-th variable lagged by 21 * k days. This gave a good model, but I think I need a better iterative process.
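To make the lag construction concrete, here is a minimal sketch of how such columns could be generated. It assumes the raw series are stored in CurrencyData under the hypothetical names Spot and Vol, and it writes to a hypothetical output dataset CurrencyLags; only the Spotlag_ and vollag_ prefixes come from the code further down. Whether the volatility columns should be plain lags or differences is also an assumption.

```sas
/* Sketch only: Spot and Vol are assumed column names, CurrencyLags an assumed output. */
%macro make_lags(nlags=12, step=21);
   %local k n;
   data CurrencyLags;
      set CurrencyData;
      %do k = 1 %to &nlags;
         %let n = %eval(&k * &step);
         /* Change in the spot rate over the last n observations: Spot_t - Spot_(t-n) */
         Spotlag_&k = Spot - lag&n.(Spot);
         /* Realized volatility as observed n observations ago */
         vollag_&k  = lag&n.(Vol);
      %end;
   run;
%mend make_lags;

%make_lags(nlags=12, step=21)
```

The macro loop is needed because the LAGn function takes its lag length as part of the function name, so it cannot be driven by an ordinary data step variable.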

If I limit my model to 12 variables/lags of each variable, we are talking about 24 nested loops. These loops repeat many of the same specifications, which is time-consuming and wasteful. Maybe there is a way around this.
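A rough count shows how much of that work is repeated. The sketch below assumes z candidate columns per slot and 24 slots, as in the macro further down, and assumes that slot order and repeated columns do not change which predictors end up in the model (and that z is no larger than 12, so every non-empty subset of candidates fits into the 12 slots).

```sas
/* Sketch only: counts loop iterations versus distinct predictor sets. */
data _null_;
   z     = 2;    /* candidate columns per slot, as in the test setting */
   slots = 24;   /* 12 spot slots plus 12 volatility slots */
   fits     = z**slots;        /* PROC LOGISTIC calls made by the nested loops */
   distinct = (2**z - 1)**2;   /* non-empty subsets of spot columns times the same for vol */
   put fits= comma20. distinct= comma20.;
run;
```

With z = 2 this is 16,777,216 fits for only 9 distinct sets of predictors, so looping over subsets of the candidate columns, rather than over slots, would avoid almost all of the repetition.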

I am not used to coding in SAS, but I have decent experience in VBA.

My code is pasted below, and if you have any idea how to do this differently I would really appreciate it! Maybe it's possible with arrays or something like that, but since I am not used to SAS programming, maybe you could shed some light on how to do all this :)

%macro Selectvariables;
   %let y = 0;
   %let z = 2;
   %do a = 1 %to &z;
      %do b = 1 %to &z;
          %do c = 1 %to &z;
             %do d = 1 %to &z;
                %do e = 1 %to &z;
                   %do f = 1 %to &z;
                      %do g = 1 %to &z;
                         %do h = 1 %to &z;
                            %do i = 1 %to &z;
                               %do j = 1 %to &z;
                                  %do k = 1 %to &z;
                                     %do l = 1 %to &z;
                                        %do m = 1 %to &z;
                                           %do n = 1 %to &z;
                                              %do o = 1 %to &z;
                                                 %do p = 1 %to &z;
                                                    %do q = 1 %to &z;
                                                       %do r = 1 %to &z;
                                                          %do s = 1 %to &z;
                                                             %do t = 1 %to &z;
                                                                %do u = 1 %to &z;
                                                                   %do v = 1 %to &z;
                                                                      %do w = 1 %to &z;
                                                                         %do x = 1 %to &z;
                                                                            %let First_Spot_var = Spotlag_&a;
                                                                            %let Second_Spot_var = Spotlag_&b;
                                                                            %let Third_Spot_var = Spotlag_&c;
                                                                            %let Fourth_Spot_var = Spotlag_&d;
                                                                            %let Fifth_Spot_var = Spotlag_&e;
                                                                            %let Sixth_Spot_var = Spotlag_&f;
                                                                            %let Seventh_Spot_var = Spotlag_&g;
                                                                            %let Eighth_Spot_var = Spotlag_&h;
                                                                            %let Nine_Spot_var = Spotlag_&i;
                                                                            %let Tenth_Spot_var = Spotlag_&j;
                                                                            %let Eleventh_Spot_var = Spotlag_&k;
                                                                            %let Twelveth_Spot_var = Spotlag_&l;
                                                                            %let First_vol_var = vollag_&m;
                                                                            %let Second_vol_var = vollag_&n;
                                                                            %let Third_vol_var = vollag_&o;
                                                                            %let Fourth_vol_var = vollag_&p;
                                                                            %let Fifth_vol_var = vollag_&q;
                                                                            %let Sixth_vol_var = vollag_&r;
                                                                            %let Seventh_vol_var = vollag_&s;
                                                                            %let Eighth_vol_var = vollag_&t;
                                                                            %let Nine_vol_var = vollag_&u;
                                                                            %let Tenth_vol_var = vollag_&v;
                                                                            %let Eleventh_vol_var = vollag_&w;
                                                                            %let Twelveth_vol_var = vollag_&x;
                                                                            %let Name = Model_&y;

                                                                            proc Logistic data=CurrencyData;
                                                                               &Name.: model Y1_Optimal_Strategy_3M = &First_Spot_var &Second_Spot_var &Third_Spot_var &Fourth_Spot_var &Fifth_Spot_var &Sixth_Spot_var &Seventh_Spot_var &Eighth_Spot_var &Nine_Spot_var &Tenth_Spot_var &Eleventh_Spot_var &Twelveth_Spot_var &First_vol_var &Second_vol_var &Third_vol_var &Fourth_vol_var &Fifth_vol_var &Sixth_vol_var &Seventh_vol_var &Eighth_vol_var &Nine_vol_var &Tenth_vol_var &Eleventh_vol_var &Twelveth_vol_var;
                                                                               ods output FitStatistics=AIC_&Name(where=(criterion="AIC"));
                                                                            run;
                                                                            %let y = %Eval(&y+1);
                                                                         %end;
                                                                      %end;
                                                                   %end;
                                                                %end;
                                                             %end;
                                                          %end;
                                                       %end;
                                                    %end;
                                                 %end;
                                              %end;
                                           %end;
                                        %end;
                                     %end;
                                  %end;
                               %end;
                            %end;
                         %end;
                      %end;
                   %end;
                %end;
             %end;
          %end;
       %end;
    %end;

    data AllAIC;
       set AIC_: INDSNAME=modelVars;
       dsname = scan(modelVars, 2);
    run;
    proc sort data=AllAIC out=allAIC_Sorted;
       by InterceptAndCovariates;
    run;
    proc Print; run;
%mend;

Sorry for the crazy wide code. I hope you can help me. Maybe I am overcomplicating the issue. :)

Thanks a lot. Best regards, Christian

EDIT: I have set z = 2 just for testing purposes. Ideally this would be considerably higher.


1 Answer

I'm not sure there is a BEST way to do this. This is a problem statisticians have come up against for a long time.

You should look through the automated variable selection algorithms available in PROC LOGISTIC.

https://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_logistic_syntax22.htm
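As a rough illustration of what that could look like with the variables from the question, here is a minimal sketch. It assumes the candidate columns Spotlag_1 to Spotlag_12 and vollag_1 to vollag_12 already exist in CurrencyData, and the entry and stay significance levels are placeholder values, not recommendations.

```sas
/* Sketch only: one PROC LOGISTIC call instead of 24 nested loops. */
proc logistic data=CurrencyData;
   /* Numbered range lists offer all 24 candidate columns at once.        */
   /* For a nominal three-level outcome, link=glogit may be the intended  */
   /* link; it is left out here to match the question's MODEL statement.  */
   model Y1_Optimal_Strategy_3M = Spotlag_1-Spotlag_12 vollag_1-vollag_12
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
```

Note that SELECTION=STEPWISE adds and removes effects based on significance levels rather than AIC, so the chosen model may differ from the one an exhaustive AIC comparison would pick.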

If you have it installed and have a multi-core machine with enough RAM, PROC HPLOGISTIC will probably do the selection faster.

https://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_hplogistic_toc.htm
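A comparable sketch for PROC HPLOGISTIC, again assuming the same candidate columns exist in CurrencyData and that the procedure is licensed and installed. In this procedure, selection is requested through a separate SELECTION statement, and the defaults are used here; check the linked documentation for the criteria supported in your release.

```sas
/* Sketch only: stepwise selection with the procedure's default settings. */
proc hplogistic data=CurrencyData;
   model Y1_Optimal_Strategy_3M = Spotlag_1-Spotlag_12 vollag_1-vollag_12;
   selection method=stepwise;
run;
```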

I recommend looking at Cross Validated (StackExchange for Statistics) to research the pros and cons of each selection method.

https://stats.stackexchange.com/

DomPazz
  • So you're suggesting a STEPWISE regression? Maybe then I could find a way to integrate 12 variables, but with different lags. I have a loop already assigning 800 new columns with lags of the variables. – Christian Apr 12 '16 at 15:58