2

I have 2 datasets that I am working on which have a few overlapping dates. I want to create a flag which will check each 'date_lab' date for each id in TABLE B and cross reference it with the the start and end date ranges in TABLE A. If there is an overlap, the 'flag' will be 1; otherwise 0.

TABLE A

ID HSP_STRT HSP_END
1 01/01/2023 01/05/2023
1 02/01/2023 02/15/2023
1 02/24/2023 03/01/2023
2 01/10/2023 01/15/2023
2 01/31/2023 02/04/2023
2 02/08/2023 02/18/2023
3 01/02/2023 01/08/2023
3 01/22/2023 01/25/2023
3 02/01/2023 02/18/2023
4 01/01/2023 01/06/2023
4 01/10/2023 01/31/2023

TABLE B

ID ITEM DATE_LAB
1 CBC 1/2/2023
1 CBC 1/18/2023
1 CBC 02/05/2023
1 CBC 02/06/2023
2 CBC 01/01/2023
2 CBC 01/10/2023
2 CBC 02/8/2023
2 CBC 02/22/2023
3 CBC 01/06/2023
3 CBC 01/19/2023
3 CBC 02/21/2023
4 CBC 01/11/2023
4 CBC 01/31/2023
4 CBC 02/11/2023
4 CBC 02/22/2023

Im looking for a table with a overlap flag like to look like this:

TABLE C

ID ITEM DATE_LAB FLAG
1 CBC 1/2/2023 1
1 CBC 1/18/2023 0
1 CBC 02/05/2023 1
1 CBC 02/06/2023 1
2 CBC 01/01/2023 0
2 CBC 01/10/2023 1
2 CBC 02/8/2023 1
2 CBC 02/22/2023 0
3 CBC 01/06/2023 1
3 CBC 01/19/2023 0
3 CBC 02/21/2023 0
4 CBC 01/11/2023 1
4 CBC 01/31/2023 1
4 CBC 02/11/2023 0
4 CBC 02/22/2023 0

Ive tried the overlap function but its giving me wonky results. Would love any help. Thank you in advance!!

CODE USED:

proc sql;
create table cx_adlb1 as select distinct a.usubjid, a.date_lab, case when (b.hsp_strt le a.date_lab le b.hsp_end) then '1' else '0' end as overlap from adlb as a left join adho as b on a.usubjid = b.usubjid order by usubjid; quit;

results from sas studio

Sana
  • 23
  • 3

2 Answers2

2

In Proc SQL you can use an existential correlated sub-query condition as your flag computation.

Example:

 proc sql;
   create table want as
   select two.*
        , exists ( /* correlated sub-query involves the outer table two. */
            select * from one 
            where two.id = one.id
              and two.date between one.start and one.end
          ) as flag
   from two;

Your tables as DATA steps with datalines;

data one;
input id start end;
attrib start end format=mmddyy10. informat=mmddyy10.;
datalines;
 1 01/01/2023 01/05/2023
 1 02/01/2023 02/15/2023
 1 02/24/2023 03/01/2023
 2 01/10/2023 01/15/2023
 2 01/31/2023 02/04/2023
 2 02/08/2023 02/18/2023
 3 01/02/2023 01/08/2023
 3 01/22/2023 01/25/2023
 3 02/01/2023 02/18/2023
 4 01/01/2023 01/06/2023
 4 01/10/2023 01/31/2023
run;

data two;
attrib id length=8 lab length=$20 date format=mmddyy10. informat=mmddyy10.;
input id lab date;
datalines;
 1 CBC 1/2/2023
 1 CBC 1/18/2023
 1 CBC 02/05/2023
 1 CBC 02/06/2023
 2 CBC 01/01/2023
 2 CBC 01/10/2023
 2 CBC 02/8/2023
 2 CBC 02/22/2023
 3 CBC 01/06/2023
 3 CBC 01/19/2023
 3 CBC 02/21/2023
 4 CBC 01/11/2023
 4 CBC 01/31/2023
 4 CBC 02/11/2023
 4 CBC 02/22/2023
 ;
Richard
  • 25,390
  • 3
  • 25
  • 38
1

You could do it using a hash table.

data cs_axlb1 ;
  set adlb ;
  /* initialise the hash table */
  if 0 then set adho ;
  if _n_ = 1 then do ;
    declare hash h(dataset:'adho',multidata:'y') ;
    h.definekey('usubjid') ;
    h.definedata('hsp_strt','hsp_end');
    h.definedone() ;
  end ;

  rc = h.reset_dup() ; /* reset the hash pointer when using multidata:y and do_over() */
  flag = 0 ;
  /* do_over will loop over all matching usubjid's in the hash table */
  do while (h.do_over() = 0 and flag = 0) ;
    flag = hsp_strt <= date_lab <= hsp_end ;
  end ;

  drop rc ;
run ;
Chris J
  • 7,549
  • 2
  • 25
  • 25