I've been searching for a question related but I have had no luck as of yet. I am looking to transform a long list of independent variables for a regression analysis. A dummy dataset would look like this:
DATA TEST (DROP = i);
DO i = 1 to 4000;
VAR = i + 100000;
output;
end;
run;
PROC TRANSPOSE
DATA = TEST
OUT = TEST_T
(DROP = _NAME_)
PREFIX = X_;
ID VAR;
VAR VAR;
RUN;
DATA TEST_ARRAY;
SET TEST_T;
ARRAY X[*] X_:;
DO J = 1 TO 40;
DO I = 1 TO DIM(X);
X[I] = RANUNI(0)*I;
OUTPUT;
END;
END;
RUN;
In this case the variable names X_i are increasing monotonically, in reality, my variables are actually X_number where the number is a six digit unique identifier. I've been trying to log transform and square all these variables such that I have a new X matrix with the following columns
X_133456 X_SQ_133456 LOG_X_133456
I tried looping a list through all variables like this
PROC CONTENTS
DATA = TEST_ARRAY
OUT = CONTENTS;
RUN;
PROC SQL NOPRINT;
SELECT NAME INTO: REG_FACTORS
SEPARATED BY " "
FROM CONTENTS;
QUIT;
DATA WANT;
SET TEST_ARRAY;
%LET index = 1;
%DO %UNTIL (%SCAN(®_factors.,&index.," ")=);
%LET factors = %SCAN(®_factors.,&index.," ");
LOG_X_&FACTORS. = LOG(X_&FACTORS.);
X_SQ_&FACTORS. = (X_&FACTORS.) ** 2;
%LET index = %EVAL(&Index + 1);
%END;
RUN;
but this blows up my server and I need to find a more efficient way of doing this, thanks in advance
EDIT: for the contributor - I managed to solve at 13:04
%LET input_factors = X_:;
PROC SQL;
SELECT
NAME
, TRANWRD(NAME,%SCAN(&input_factors.,1,'_'),'SQ')
, TRANWRD(NAME,%SCAN(&input_factors.,1,'_'),'LOG')
INTO
:factor_list separated by " "
, :sq_factor_list separated by " "
, :log_factor_list separated by " "
FROM
contents
WHERE
VARNUM < 5
WHERE
NAME LIKE "%SCAN(&input_factors.,1,'_')_"
ORDER BY
INPUT(SCAN(NAME,-1,'_'),8.)
;
QUIT;
%PUT &factor_list.;
%PUT &sq_factor_list.;
%PUT &log_factor_list.;