I have an sql script on SAS, which defines time overlap. Because table is very big, first I created smaller copy of that table and defined some new variables. to make lookup into table easier.
data data.orders_small(drop=VALID_TO VALID_FROM DATE_OPEN DATE_CLOSE);
set data.orders(keep=ID_ACCOUNT ID_CLIENT VALID_TO VALID_FROM DATE_OPEN DATE_CLOSE);
length VALID_FROM_DAY_S VALID_FROM_DAYM1_S VALID_TO_DAY_S VALID_TO_DAYP1_S 8.;
format VALID_FROM_D_S VALID_FROM_DM1_S VALID_TO_D_S VALID_TO_DP1_S date9.;
VALID_FROM_D_S=intnx('day',VALID_FROM,+0,'S');
VALID_FROM_DM1_S=intnx('day',VALID_FROM,-1,'S');
VALID_TO_D_S=intnx('day',VALID_TO,+0,'S');
VALID_TO_DP1_S=intnx('day',VALID_TO,+1,'S');
run;
/* Checks existence of previous interval with valid_to = to the valid_from of the currently checked interval
where currently checked interval is not the first one */
proc sql noprint;
create table orders_test_1 as
select id_client, id_account, valid_from, valid_to
from data.orders a
where a.VALID_FROM GT a.DATE_OPEN
and not exists( select 1
from data.accounts_test_14_small(&xobstxt.) b
where a.id_account EQ b.id_account
and a.cod_mandant EQ b.cod_mandant
and a.DAT_VALID_FROM EQ b.DAT_VALID_TO_DP1_S );
quit;
/* Checks existence of following interval with valid_from = to the valid_to of the currently checked interval
where currently checked interval is not the last one */
proc sql noprint;
create table orders_test_2 as
select id_client, id_account, valid_from, valid_to
from data.orders a
where VALID_TO NE '31DEC3000'd
and VALID_TO LT DATE_CLOSE
and not exists ( select 1
from data.orders_small b
where a.id_account EQ b.id_account
and a.id_client EQ b.id_client
and a.VALID_TO EQ b.VALID_FROM_DM1_S);
quit;
/* Existence of overlaps */
proc sql;
create table orders_test_3 as
select id_client, id_account
from data.orders a
where exists ( select 1
from data.orders_small b
where a.id_account EQ b.id_account
and a.id_client EQ b.id_client
and ((a.VALID_FROM between b.VALID_FROM_D_S and b.VALID_TO_D_S)
or (a.VALID_TO between b.VALID_FROM_D_S and b.VALID_TO_D_S))
and not (a.VALID_FROM EQ b.VALID_FROM_D_S and a.VALID_TO EQ b.VALID_TO_D_S) );
quit;
Sample of the data:
data data.orders;
Length ID_ACCOUNT 4. ID_CLIENT 4. VALID_FROM 8. VALID_TO 8. DAT_OPEN 8. DAT_CLOSE 8.
;
informat VALID_FROM VALID_TO DATE_OPEN DATE_CLOSE date9.;
format VALID_FROM VALID_TO DATE_OPEN DATE_CLOSE date9.;
input ID_ACCOUNT ID_CLIENT VALID_FROM VALID_TO DATE_OPEN DATE_CLOSE;
datalines;
001 001 01MAR1993 31DEC3000 01MAR1993 31DEC3000
002 002 01MAR1997 15MAY2001 01MAR1997 31DEC3000
002 002 16MAY2001 25JUN2011 01MAR1997 31DEC3000
002 002 24JUN2001 16JUL2012 01MAR1997 31DEC3000
002 002 16MAY2001 16JUL2011 01MAR1997 31DEC3000
;
run;
Running this code take a lot of time, so I thought to use hashing in SAS and make it a bit faster, but couldn't make it work. Any ideas, how could this code be transferred to hash???
Thank you in advance!