There are a few ways to approach this, one of which is a data step. One thing SAS does not do well in the data step is look ahead; however, there are some great tricks out there for it. Here is a one-pass solution you can use that only uses the data step and is based on Andrew Gannon's SGF 2019 paper, Calculating Leads and Lags in SAS: One Problem, Many Solutions. This solution can be compacted with arrays but is expanded to make it a bit easier to understand the overall logic.
The basic idea: use some SAS functions to open up the dataset you're reading and fetch future observations within it using direct access. We can pull future values for each observation by simply using _N_+lead
. Since we've opened up the dataset we're currently reading in a set
statement, our obs fetches will always be in-sync for each row.
data want;
set have;
by id callDate;
retain _dsid_;
/* Open up your current dataset */
if(_N_ = 1) then _dsid_ = open('have');
/* Fetch 1 obs ahead of time */
_lead1_rc_ = fetchobs(_dsid_, _N_+1);
/* Fetch values from 1 obs ahead using the dataset we opened */
lead1_date = getvarn(_dsid_, varnum(_dsid_, "callDate"));
lead1_contacted = getvarn(_dsid_, varnum(_dsid_, "Contacted"));
lead1_id = getvarn(_dsid_, varnum(_dsid_, "id"));
/* Fetch2 obs ahead */
_lead2_rc_ = fetchobs(_dsid_, _N_+2);
lead2_date = getvarn(_dsid_, varnum(_dsid_, "callDate"));
lead2_contacted = getvarn(_dsid_, varnum(_dsid_, "Contacted"));
lead2_id = getvarn(_dsid_, varnum(_dsid_, "id"));
/* Set future values to missing if we've crossed to the next ID */
if(lead1_id NE id) then call missing(lead1_date, lead1_contacted);
if(lead2_id NE id) then call missing(lead2_date, lead2_contacted);
/* Keep only records that a successful contact AND are not followed on the next day or the day after next by a successful contact */
if(contacted AND NOT ( (lead1_contacted AND (lead1_date - callDate LE 2) OR (lead2_contacted AND (lead2_date - callDate LE 2) ) ) ) );
format lead1_date lead2_date date9.;
run;
Another solution can use the lead
function within PROC EXPAND
with the where
logic added as an output dataset option:
proc expand data=have
out=want(drop=time
where=(contacted AND NOT ( (lead1_contacted AND (lead1_date - callDate LE 2) OR (lead2_contacted AND (lead2_date - callDate LE 2) ) ) ) )
);
by id;
convert callDate = lead1_Date / transform=(lead 1);
convert callDate = lead2_Date / transform=(lead 2);
convert contacted = lead1_contacted / transform=(lead 1);
convert contacted = lead2_contacted / transform=(lead 2);
run;