0

I am a beginner with SAS and trying to create a table with code below. Although the code has been running for 3 hours now. The dataset is quite huge (150000 rows). Although, when I insert a different date it runs in 45 mins. The date I have inserted is valid under date_key. Any suggestions on why this may be/what I can do? Thanks in advance

proc sql;
create table xyz as
select monotonic() as rownum ,*
from x.facility_yz   
where (Fac_Name = 'xyz' and (Ratingx = 'xyz' or Ratingx is null) )
and Date_key = '20000101'
;
quit;

Tried running it again but same problem

Jenny
  • 1
  • 1
  • Why are you multiplying NAME by FAC? Are those really numeric variables? And the result can never by a character string anyway. But mainly why are you using PROC SQL instead of a data step? The MONOTONIC() function is not documented and not supported. Do you really need ALL of the original variables copied? Reducing the amount of data copied will make it run a lot faster. – Tom Nov 15 '22 at 17:58
  • Hi Tom, my bad - made a typo there, it's fac_name. The person who initially wrote this code did using Proc SQL, it seemed to work fine for previous datasets (which were less than 50000 rows) although with this one it's taking ages. I tried using data step and unfortunately still taking 3+ hours. Would need all the variables for further joins/amendment to final dataset... – Jenny Nov 15 '22 at 23:45
  • Run PROC CONTENTS on the input dataset. Perhaps the new dataset has much larger record size. Is it coming from an external databases? I have seen many more datasets defined in such databases where they do not bother to put an upper limit on the length of variable length character strings. SAS might well be converting those to crazy long variables after copying the data. or they might have moved SAS further from the database and so the transfer now takes longer. – Tom Nov 16 '22 at 03:26
  • It is coming from an external database (SQL) which is also running slow when trying to execute a select * query in sql with the above filters (date and name) - at the moment it has only executed 4000 rows in 2 hours (150000 rows in total that it needs to execute) wonder if the problem may just be with the SQL database – Jenny Nov 16 '22 at 12:58

2 Answers2

2

Is your dataset coming from an external database? A SAS dataset of this size should not take nearly this long to query - it should be almost instant. If it is external, you may be able to take advantage of indexing. Try and find out what the database is indexed on and try using that as a first pass. You may consider using a data step instead rather than SQL with the monotonic() function.

For example, assume it is indexed by date:

data xyz1;
    set x.facility_xyz;
    where date_key = '20000101';
run;

Then you can filter this final dataset within SAS itself. 150,000 rows is nothing for a SAS dataset, assuming there aren't hundreds of variables making it large. A SAS dataset this size should run lightning fast when querying.

data xyz2;
    set xyz1;
    where fac_name = 'xyz' AND (Ratingx = 'xyz' or Ratingx = ' ') );

    rownum = _N_;
run;

Or, you could try it all in one pass while still taking advantage of the index:

data xyz;
    set x.facility_xyz;
    where date_key = '20000101';

    if(fac_name = 'xyz' AND (Ratingx = 'xyz' or Ratingx = ' ') );

    rownum+1;
run;

You could also try rearranging your where statement to see if you can take advantage of compound indexing:

data xyz;
    set x.facility_xyz;
    where     date_key = '20000101'
          AND fac_name = 'xyz' 
          AND (Ratingx = 'xyz' or Ratingx = ' ')
    ;

    rownum = _N_;
run;

More importantly, only keep variables that are necessary. If you need all of them then that is okay, but consider using the keep= or drop= dataset options to only pull what you need. This is especially important when talking with an external database.

Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21
  • Thanks for your response - I tried using the above code(s) although still taking a lot of time. The dataset is saved in SQL hence yes probably that's why it's taking ages. Would it be worth just saving the dataset within SAS via import data? Or that wouldn't really make a difference? – Jenny Nov 15 '22 at 23:42
  • I would probably chat with a database admin to see if they can help fix some of the query issues. It sounds like it could be a poorly optimized database, or perhaps a slow connection. The only time I've ever run into major SQL performance issues was when I would push an unsupported function into a database and it would pull the whole thing. Worst case, saving the dataset within SAS would make a big difference. If you can export the data as a csv and import it you'll have a lot better performance when working directly within SAS. – Stu Sztukowski Nov 16 '22 at 00:54
  • There are ways you can diagnose connection issues as well. Check out this paper: https://www.google.com/url?sa=t&source=web&rct=j&url=https://support.sas.com/resources/papers/proceedings13/080-2013.pdf&ved=2ahUKEwiJjPDfvrH7AhWoElkFHTFBBUQQFnoECBUQAQ&usg=AOvVaw0vAgbKMiOLoUqF4PSVtJCG – Stu Sztukowski Nov 16 '22 at 00:54
  • Hi Stu, trying to run the sql query (on sql) that will present me with the output that I am then hoping to save as a csv and import to SAS - although not looking good as only 4000 rows have been executed (what you see on the results tab of query) in the past two hours. Given there are 150000 rows this seems off. May have something to do with the SQL database itself perhaps? I will try and chat with a database admin – Jenny Nov 16 '22 at 12:56
  • The paper shared is really informative! Unfortunately, such a function (for sql server management) I tried to execute (Display Estimated Query Plan) although I get an error saying I dont have the permission to do it with the database i am trying to access – Jenny Nov 16 '22 at 13:01
  • Is it possible the table you're trying to access is a view that is running slowly behind the scenes? – Reeza Nov 17 '22 at 18:19
0

What kind of libname to you use ?

if you are running implicit passthrough using sas function, it would explain why it takes so long.

If you are using sas/connect to xxx module, first add option to understand what is going on : options sastrace=,,,d sastraceloc=saslog;

You should probably use explicit passthrough : using rdbms native language to avoid automatic translation of your code.

SMA
  • 15
  • 5