I was trying to use proc ds2 to try and get some performance increases over the normal data step by using the multithreaded capability.
fred.testdata is a SPDE dataset containing 5 million observations. My code is below:
proc ds2;
thread home_claims_thread / overwrite = yes;
/*declare char(10) producttype;
declare char(12) wrknat_clmtype;
declare char(7) claimtypedet;
declare char(1) event_flag;*/
/*declare date week_ending having format date9.;*/
method run();
/*declare char(7) _week_ending;*/
set fred.testdata;
if claim = 'X' then claimtypedet= 'ABC';
else if claim = 'Y' then claimtypedet= 'DEF';
/*_week_ending = COMPRESS(exposmth,'M');
week_ending = to_date(substr(_week_ending,1,4) || '-' || substr(_week_ending,5,2) || '-01');*/
end;
endthread;
data home_claims / overwrite = yes;
declare thread home_claims_thread t;
method run();
set from t threads=8;
end;
enddata;
run;
quit;
I didn't include all IF statements and only included a few otherwise it would have taken up a few pages (you should get the idea hopefully). As the code currently is it works quite a fair bit faster than the normal data step however significant performance issues arise when any of the following happens:
- I uncomment any of the declare statements
- I include any numeric variables in fred.testdata (even without performing any calculations on the numeric variables)
My questions are:
- Is there any way to introduce numeric variables into fred.testdata without getting significant slowdowns which make DS2 way slower than the normal data step? (for this small table of 5 million rows including numeric column/s the real time is about 1 min 30 for ds2 and 20 seconds for normal data step). The actual full table is closer to 600 million rows. For example I would like to be able to do that week_ending conversion without it introducing a 5x performance penalty in run times. Run times for ds2 WITHOUT declare statements and numeric variables takes about 7 seconds
- Is there any way to compress the table in ds2 without having to do an additional data step to compress it?
Thank you