I have a dataframe with the first three columns 'Person', 'NoShow', and 'date'. I want to create the two additional columns 'prior_noshow' and 'prior_noshow_f', but I cannot figure out how to do this using the lag function.
Person NoShow date prior_noshow prior_noshow_f
1 123 0 2019-01-01 NA NA
2 123 1 2019-04-01 0 0
3 123 0 2020-01-01 1 1
4 123 1 2021-01-01 1 1
5 123 0 2021-04-30 1 2
6 123 0 2022-01-05 1 2
7 334 0 2019-07-07 NA NA
8 334 1 2019-10-11 0 0
9 334 1 2020-07-07 1 1
10 334 0 2021-01-04 1 2
Column 'prior_noshow' should show whether a person missed a previous appointment. So, person 123 had their first noshow on 2019-04-01, and I want every following row for person 123 to have prior_noshow==1.
Column 'prior_noshow_f" should show the frequency of previously missed appointments. So, person 123 should have prior_noshow_f==1 for dates 2020-01-01 and 2021-01-01 and prior_noshow_f==2 for dates 2021-04-30 and 2022-01-05.
Code to create the dataset:
df <- data.frame(Person = c('123','123','123','123','123','123','334','334','334','334'),
NoShow = c(0,1,0,1,0,0,0,1,1,0),
date = c('2019-01-01','2019-04-01','2020-01-01','2021-01-01','2021-04-30','2022-01-05','2019-07-07','2019-10-11','2020-07-07','2021-01-04')
)
I've tried different approaches I saw in previous discussions here regarding the lag-function, but I haven't able to figure this out.
Any help would be greatly appreciated!