1

I am using a macro to loop through files based on names and extract data which works fine for the majority of the cases, however from time to time I experience

ERROR: BY variables are not properly sorted on data set CQ.CQM_20141113.

where CQM_20141113 is the file I am extracting data from. In fact my macro loops through CQ.CQM_2014: and it works up until 20141113. Because of this single failure the file is then not created.

I am using a data step view to "initialize" the data and then in a further step to call data step view (code sample with shortened where conditions):

%let taq_ds = CQ.CQM_2014:;

data _v_&tables / view=_v_&tables;
     set &taq_ds;
     by sym_root date time_m; *<= added by statement
     format sym_root date time_m;
     where sym_root = &stock;   
run; 

data xtemp2_&stockfiname (keep = sym_root year date iprice);
     retain sym_root year date iprice; 
     set _v_&tables; 
     by sym_root date time_m;

/* some conditions */
run;

When I see the error via the log file and I run the file again, then it works (sometimes I need a few trials).

I was thinking of a proc sort, but how to do that when using data step view? Please note the cqm-files are very large (which could also be the root of the problem).

edit: taq_dsis not a single file but runs through several files whose name start with CQM_2014, i.e. CQM_20140101, CQM_20140102, etc.

eternity1
  • 651
  • 2
  • 15
  • 31
  • 2
    You could index &taq_ds. on sym_root date time_m, or create a sql view with an order by clause. – Richard Apr 29 '18 at 21:59

2 Answers2

4

Based on the code provided, you could replace your first data step view with a SQL one:

proc sql;
create view _v_&tables as
  select * from &taq_ds
  where sym_root = &stock
  order by sym_root, date, time_m;

Alternatively you could prefix your data step view with a similar view. This would enforce the ordering needed for the subsequent by statement.

Allan Bowe
  • 12,306
  • 19
  • 75
  • 124
  • thanks @Allan Bowe: something that came into my mind: is it also somehow possible to integrate a part of code that says "if you occur an order-error (or maybe even any other error), skip the file and continue with the next one? – eternity1 Apr 30 '18 at 06:07
  • apologies, should have added how I define my variables: `%let taq_ds = cq.cqm_2013:;` where I used SAS property to "loop" through all files with the name `cqm_2013mmdd`. What would I need to adjust in the proc sql statement? – eternity1 Apr 30 '18 at 06:15
  • 1
    yes - you can add a condition such as `%if &syscc ge 4 %then %do;` and implement logic to enable processing to continue (eg `options obs=max`) – Allan Bowe Apr 30 '18 at 07:27
  • 2
    ref your second comment - it's not very clear. Perhaps you could make a new question? – Allan Bowe Apr 30 '18 at 07:28
  • thanks @Allan Bowe: I tried to wrap it in the `proc sql` statement, however I have the probmlem that `&taq_ds` is not a single file / table, but it's like a loop running through several tables; so `taq_ds = cq.cqm_2013:;`and there are like cq.cqm_20130101, cq.cqm_20130102 one file for each day; I suspect that this is not possible within `proc sql`. regarding the other question I will create a new one – eternity1 Apr 30 '18 at 18:21
  • new question here: https://stackoverflow.com/questions/50106168/sas-sort-error-by-variable-not-sorted-properly – eternity1 Apr 30 '18 at 18:28
1

Creating an index on taq_ds corresponding to the by group order would also solve this, e.g.:

proc datasets lib=<library containing taq_ds>;
modify taq_ds;
index create index1=(sym_root date time_m);
run;
quit;
user667489
  • 9,501
  • 2
  • 24
  • 35
  • would that work in case my `taq_ds` is not a single file but a "loop" through several files? – eternity1 Apr 30 '18 at 18:28
  • 1
    You could do something equivalent by indexing each of the underlying datasets. Please update your question with further details. – user667489 Apr 30 '18 at 18:31