0

I am trying to sort data from two tables that pairs the closes date from table 1 with the closest date from table 2 (if exists). There may be ID-dates from table 1 that do not have a match in table 2 and vice versa. Is not guaranteed to be a 1:1 match either. For for instance my data from t1 would be something like:

ID  time1
A   01/09/2015
A   02/16/2015
B   03/03/2015
C   04/01/2015
C   01/20/2015
C   03/15/2015

t2 similarly:

ID  time2
A   01/29/2015
A   02/19/2015
B   03/06/2015
C   01/27/2015
C   03/18/2015
C   04/04/2015

What I don't want when joining on ID is a combination of all unique dates, rather combinations where the days between are minimum eg.

desired_output:

ID  date_time1   date_time2
A   01/09/2015   01/29/2015
A   02/16/2015   02/19/2015
B   03/03/2015   03/06/2015
C   01/20/2015   01/27/2015
C   03/15/2015   03/18/2015
C   04/01/2015   04/04/2015

I am not sure how to do this, I have tried selecting the max and min but that only gives something like, which is not what I want:

ID  date_time1  date_time2
A   01/09/2015   02/19/2015
B   03/03/2015   03/06/2015
C   01/20/2015   04/04/2015

Any ideas/suggestions for how to do this would be greatly appreciated! Thank you!

Brad
  • 85
  • 12

1 Answers1

1

I used two built-in SAS datasets as examples, but the general idea here is that you want to compare each date within an group in t1 with each date within the same id group in t2, determine the minimum date difference, and keep only that one.

data t1 (keep = id date rename = date = time1);
    set sasuser.admitjune;
    if mod(_n_, 2) = 0 then id = "A";
        else id = "B";
run;

data t2 (keep = id date rename = date = time2);
    set sasuser.admitjune;
    if mod(_n_, 2) = 0 then id = "B";
        else id = "A";
    date = date + 3; /*adjusting the dates so the datasets aren't the same*/
run;

To do this, you sort by ascending difference in dates and nodup to keep only the closest one:

proc sql;
    create table desired_output as select
        a.id, a.time1, b.time2 as nearest_time2,
        abs(b.time2-a.time1) as diff
        from t1 as a
        left join t2 as b
        on a.id = b.id and b.time2 > a.time1
        group by a.id, a.time1
        order by calculated diff;
    proc sort nodupkey; by id time1;
quit;
Sean
  • 1,120
  • 1
  • 8
  • 14
  • You might want to a `and b.time2 > a.time1` to your join condition to ensure the `time2` event is after the `time1` event. – Chris J Jul 26 '16 at 20:25
  • I changed the description up a little because I think it was unclear what I was trying to. I am trying to sort data from two tables that pairs the closes date from table 1 with the closest date from table 2 (if exists). But that is a good addition Chris J. – Brad Jul 26 '16 at 20:30
  • So by closest, you want closest *following* date (i.e., date2 MUST be later than date1)? – Sean Jul 26 '16 at 20:32
  • that is correct. And there may be multiple date ranges per ID, so I am trying to the pair the closest start and end dates for each ID, to create an encounter – Brad Jul 27 '16 at 13:22
  • Also I have updated sample data to hopefully show more clearly what I am trying to do. – Brad Jul 27 '16 at 13:37
  • The solution provided should solve your problem. What goes wrong when you run it? – Sean Jul 28 '16 at 12:51