I have data on physician visits for an infection. Each individual can have one or more visits. Clinicians consider an individual infected if they have more than one visit within a year; the date of the first of those visits is then taken as the infection onset.
For example, IDs 1, 2, and 3 have the following visits:
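```sas
Data have;
  /* Read the dates with an informat so they become SAS date values */
  INPUT row ID date :yymmdd10.;
  FORMAT date yymmdd10.;
  CARDS;
1 1 2017-03-22
2 1 2017-04-26
3 1 2018-02-20
4 1 2018-04-07
5 1 2018-04-16
6 2 2014-01-15
7 2 2014-06-23
8 2 2014-07-23
9 2 2015-01-14
10 3 2018-01-22
11 3 2019-05-03
;
run;
```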
Based on the clinical definition of the infection, I want these dates:
row | ID | date |
---|---|---|
1 | 1 | 2017-03-22 |
4 | 1 | 2018-04-07 |
6 | 2 | 2014-01-15 |
For ID=1, the first date (row 1) is selected because there are two or more visits within a year of it. All later visits within a year of that first visit are skipped (rows 2 and 3). The second selected date is row 4, which is more than one year after the first visit and is followed by another visit (row 5) within a year.
For ID=2, we select only the first date (row 6) and skip the following visits, all of which fall within a year of it (rows 7 to 9).
For ID=3, we don’t select any date because its two visits are more than a year apart.
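A minimal sketch of these rules as a single DATA step pass, under a few assumptions: visits are sorted by ID and date, and "within a year" means within 365 days (calendar-year windows built with `intnx('year', date, 1, 's')` would differ slightly around leap years). Because the visits are sorted, a visit has another visit within a year after it exactly when the *next* visit of the same ID does, so a one-row look-ahead is enough. The names `want`, `next_date`, and `window_end` are illustrative, and the sample data is repeated so the block runs on its own.

```sas
/* Sample data from the question, with dates read as SAS date values */
data have;
  input row ID date :yymmdd10.;
  format date yymmdd10.;
  cards;
1 1 2017-03-22
2 1 2017-04-26
3 1 2018-02-20
4 1 2018-04-07
5 1 2018-04-16
6 2 2014-01-15
7 2 2014-06-23
8 2 2014-07-23
9 2 2015-01-14
10 3 2018-01-22
11 3 2019-05-03
;
run;

/* The look-ahead below relies on visits being in ID/date order */
proc sort data=have;
  by ID date;
run;

data want(keep=row ID date);
  set have end=last_obs;
  by ID;
  retain window_end;  /* end of the one-year window of the last selected visit */

  /* Look ahead: read the date of the next observation, if there is one */
  if not last_obs then
    set have(firstobs=2 keep=date rename=(date=next_date));
  if last.ID then next_date = .;  /* the next row belongs to another ID */

  if first.ID then window_end = .;

  /* Select a visit that lies after the current window (a missing window
     compares as smaller than any date) and is followed by another visit of
     the same ID within 365 days; later visits up to window_end are skipped */
  if date > window_end and not missing(next_date)
     and next_date - date <= 365 then do;
    window_end = date + 365;
    output;
  end;
run;
```

With the sample data this keeps rows 1, 4, and 6. Carrying `window_end` forward in a retained variable is what implements "skip every visit within a year of the selected one", so no second pass or self-join is needed.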