0

I'm trying to calculate Sums and Counts using proc sql...Query used to work but started throwing errors as data got huge...

proc sql feedback;
create table saslib.pre as select 
'01MAY2015'd as RUN_DT format=DATE9., 
'01APR2015'D as  MONTHYR format=MONYY7.
, ID, IND, L_ID, CD
, count(distinct case when FIRST_TRAN_DT > . then 
    PROF_ID else . end) as REG_USERS format=COMMA32.
, sum(ACTIVE_30_DAY) as ACTIVE_30_DAY_USERS format=COMMA32.
, sum(ACTIVE_60_DAY) as ACTIVE_60_DAY_USERS format=COMMA32.
, sum(ACTIVE_90_DAY) as ACTIVE_90_DAY_USERS format=COMMA32.
, sum(NEW) as NEW_USERS format=COMMA32.
, sum(YTDNEW) as YTDNEW_USERS format=COMMA32.
, sum(LYNEW) as LYNEW_USERS format=COMMA32.
, sum(ACTIVE_30_60_DAY) as ACTIVE_30_60_DAY_USERS format=COMMA32.
, sum(ACTIVE_60_90_DAY) as ACTIVE_60_90_DAY_USERS format=COMMA32.
, sum(TRAN_CNT) as TRAN_CNT format=COMMA32.
, sum(TRAN_AMT) as TRAN_AMT format=DOLLAR32.
, count(distinct case when FIRST_TRAN_DT_SP > . then PROF_ID else . end) as REG_USERS_SP format=COMMA32.
, sum(ACTIVE_30_DAY_SP) as ACTIVE_30_DAY_USERS_SP format=COMMA32.
, sum(ACTIVE_60_DAY_SP) as ACTIVE_60_DAY_USERS_SP format=COMMA32.
, sum(ACTIVE_90_DAY_SP) as ACTIVE_90_DAY_USERS_SP format=COMMA32.
, sum(NEW_SP) as NEW_USERS_SP format=COMMA32.
, sum(YTDNEW_SP) as YTDNEW_USERS_SP format=COMMA32., sum(LYNEW_SP) as LYNEW_USERS_SP format=COMMA32.
, sum(ACTIVE_30_60_DAY_SP) as ACTIVE_30_60_DAY_USERS_SP format=COMMA32.
, sum(ACTIVE_60_90_DAY_SP) as ACTIVE_60_90_DAY_USERS_SP format=COMMA32.
, sum(TRAN_CNT_SP) as TRAN_CNT_SP format=COMMA32.
, sum(repair_micro) as repair_micro
, sum(repair_amt) as repair_amt
, sum(enrl_cnt) as ENRL_CNT, sum(unenrl_cnt) as UNENRL_CNT
, sum(GROSS_ENRL_FLG) as GROSS_ENRL_CNT
, sum(NET_ENRL_FLG) as NET_ENRL_CNT
, sum(NEW_ENRL_FLG) as NEW_ENRL_CNT 
from snap.snapshot where 
PROF_STATUS = 'A' group by ID, IND, L_ID, CD ;

ERROR: :Unable to insert into temporary index while processing summary functions. NOTE: Compressing data set SASLIB.PRE increased size by 50.00 percent. Compressed is 3 pages; un-compressed would require 2 pages.

I'm not sure how Proc sql works in the background, so how can I fix this issue?

Thanks, Sam.

SAS_learner
  • 521
  • 1
  • 13
  • 30
  • I can't help you with this `proc sql` issue but may I propose using `proc means` or `proc summary` instead? They are generally more efficient than `proc sql` in SAS. – DaBigNikoladze May 06 '15 at 15:22
  • 3
    That sounds like a disk space issue to me. – Joe May 06 '15 at 18:03
  • @DaBigNikoladze `proc means` and `proc summary` are not more efficient than SQL (in SAS). I think you'll find they perform about the same for the majority of common uses. I'd actually say that SQL has the edge unless you remember to explicitly state the `/groupinternal` option on your class statement. Don't get me wrong, I'm a big fan of all 3 proc's, but there's any efficiency differences between them would be pretty situational. When choosing between these 3, I'd make the decision on which proc to use based on readability and maintainability. – Robert Penridge May 07 '15 at 15:22
  • Agree with Joe. Check the location of saslib and make sure that the volume it is on has sufficient free space (and you have access to write to it). To find the location of `SASLIB.PRE` you can run the following code to print the location to your log: `%put %sysfunc(pathname(saslib));`. – Robert Penridge May 07 '15 at 15:23
  • @Joe or Robert Penridge : I thought it is because of space issue too but ***temporary index*** in the error is confusing me if the space issue is on saslib library or some temporary space that Proc SQL use to calculate values before putting the data on to saslib. – SAS_learner May 08 '15 at 00:28
  • @Robert Penridge and Joe : I thought it is because of space issue too but ***temporary index*** in the error is confusing me if the space issue is on saslib library or some temporary space that Proc SQL use to calculate values before putting the data on to saslib – SAS_learner May 08 '15 at 00:29
  • @RobertPenridge thanks for the information. My statement was based on personal experience but the generalization can be wrong. In your experience `SQL` can be more efficient than `summary` even with already sorted data using `by` instead of `class`? Is the `groupinternal` class options so important? [going OT] – DaBigNikoladze May 08 '15 at 09:03
  • @DaBigNikoladze Yes, even with sorted data SQL/means/summary should all perform about the same. When you think about it, each of the procs simply iterates through a dataset and increments counters so there shouldn't be much difference. The `groupinternal` switch is a little dependant on the properties of your dataset and your desired results. If your columns have formats, then it will make a large difference (as summary/means will apply formats prior to grouping when using the `class` statement). The `by` statement does not have the `groupinternal` option so I guess it doesn't format first – Robert Penridge May 08 '15 at 14:05
  • @Sampath_T Perhaps it's failing on writing temporary files to your work lilbrary. You can do a `%put %sysfunc(pathname(work));` to get the location of your work library. Check that location for disk space. Also, try simplifying the query and seeing if it still runs. – Robert Penridge May 08 '15 at 14:08
  • @Sampath_T Also, you have a mistake in either your `select` clause or your `group by` clause. Your group by column names do not match what you are selecting - specifically the `L_ID` column in the select is not contained in the `group by`. Try fixing this and see if that works. – Robert Penridge May 08 '15 at 14:11
  • @RobertPenridge :That was a typo when I'm pasting the code here... I will check disk space on work library and delete some unwanted files.Let's hope that works...Is there any way to change work library for only this program? Thanks. – SAS_learner May 08 '15 at 14:34
  • @Sampath_T AFAIK you can't change it mid-sas session, but you can specify a different path when your SAS job launches by using the `-WORK "your_new_path_here"` parameter from the command line, or editing your SASv9.cfg file. I recommend the command line option. – Robert Penridge May 08 '15 at 14:54
  • @RobertPenridge : I'm planning to use KEEP option on From table so that it reads less columns first. About that work location, thing is this program is automated in ControlM. So, I'm wondering if there a way to specify work location with in the code.Please let me know if there is any option like that.Thanks for your quick responses. – SAS_learner May 08 '15 at 15:22
  • Lets discuss this in chat... http://chat.stackoverflow.com/rooms/info/77332/disussion-between-robert-penridge-and-sampath-t?tab=general – Robert Penridge May 08 '15 at 15:43

0 Answers0