Subsetting Data Using Proc SQL

Question

I am fairly new to SAS and am working on a sorting exercise to improve my SAS skills, but I seem to keep getting stuck since this dataset has observations with different date ranges.

I was given generated admission and discharge data for patients who visited two different hospitals. The data is sorted on admission date and is contained in one dataset. My goal is to create two datasets from this large dataset. The first dataset should contain the patient ID's for those patients who went to hospital A prior to visiting hospital B. The second data set should contain the patient ID's for those patients who went to hospital B prior to visiting hospital A. A sample of the main dataset looks like this:

ID  Hospital Admission_Date Discharge_Date
1   A        21AUG2018      24AUG2018
1   A        02OCT2019      07OCT2019
1   B        07OCT2019      17OCT2019
2   B        01AUG2020      13AUG2020
2   A        28SEP2020      30SEP2020
3   B        17MAY2019      18MAY2019
3   A        18MAY2019      21MAY2019
3   B        21MAY2019      31MAY2019

The two resulting datasets should only include the patient ID's. For instance, for the datasets where patients went from Hospital A to Hospital B we should have something like this:

ID
1

For the cases where patients went from Hospital B to Hospital A, we should have something like this:

ID
2
3

Any help on this would be greatly appreciated!

score 1 · Answer 1 · answered Oct 19 '21 at 18:13

In SQL, it would look like this:

proc sql;
    create table a_to_b as
        select distinct a.id 
        from have as a
           , have as b
        where  a.id = b.id 
           AND a.hospital = 'A' 
           AND b.hospital = 'B'
           AND b.admission_date GE a.discharge_date
    ;

    create table b_to_a as
        select distinct a.id 
        from have as a
           , have as b
        where  a.id = b.id 
           AND a.hospital = 'A' 
           AND b.hospital = 'B'
           AND a.admission_date GE b.discharge_date
    ;
quit;

The data step version only requires one pass. It assumes that your data is already sorted in the correct order and compares the previous row to the current row. If there is any ID that goes from A to B or B to A, we set a flag for that ID to 1 and stop comparing any further. When we reach the last value of that ID, we output to the appropriate dataset.

data a_to_b
     b_to_a
     ;

    set have;
    by id;
    retain flag_a_to_b flag_b_to_a;

    lag_hospital       = lag(hospital);
    lag_discharge_date = lag(discharge_date);

    if(first.id) then call missing(of lag:, of flag:);

    if(flag_a_to_b < 1) then flag_a_to_b = (    hospital     = 'B' 
                                            AND lag_hospital = 'A' 
                                            AND admission_date GE lag_discharge_date
                                           )
    ;

    if(flag_b_to_a < 1) then flag_b_to_a = (    hospital     = 'A' 
                                            AND lag_hospital = 'B' 
                                            AND admission_date GE lag_discharge_date
                                           )
    ;

    if(last.id AND flag_a_to_b) then output a_to_b;
    if(last.id AND flag_b_to_a) then output b_to_a;

    keep id;
run;

How we arrived at the SQL code

SQL in SAS cannot do lags, so instead we do an inner join on both IDs, but get all combinations of admissions and discharges between hospitals A and B. It looks like this:

From this table, we know that:

hospital_a must be 'A' and hospital_b must be 'B'
The admission date from one hospital must be >= the discharge date from the other hospital

Knowing that, we arrive at the following where clauses:

A to B:

    a.id = b.id 
AND a.hospital = 'A' 
AND b.hospital = 'B'
AND b.admission_date GE a.discharge_date

B to A:

    a.id = b.id 
AND a.hospital = 'A' 
AND b.hospital = 'B'
AND a.admission_date GE b.discharge_date

Thank you for the very clear explanation! This is exactly what I was trying to do and because of you I have learned something new :) — Student, Oct 19 '21 at 19:06
After looking at the output for my sample datasets, I see that some of the ID's are on both datasets. How could I make it so that each unique ID only goes to 1 dataset? — Student, Oct 19 '21 at 19:27
I guess in that case you'd need to identify the rule for how IDs enter the data. For example, if Patient 3 goes from B to A and A to B, which takes precedence? — Stu Sztukowski, Oct 19 '21 at 19:35
I think the rule here would be whatever happens first is the only case to consider regardless if the patient goes from A to B and then again from B to A. So if patient Z goes from A to B and then from B to A then we only care about the initial transition from A to B. Does that make sense? — Student, Oct 19 '21 at 20:03

Subsetting Data Using Proc SQL

1 Answers1