There are similar questions elsewhere on this site, but none of the answers covers everything I need to do.
I have a data frame that I'm trying to convert to a time-varying format. Subjects in the study can change from non-treatment to treatment, but not the other way. Each subject has multiple rows of treatment information, and I want to find the first occurrence of treatment, which is simple enough. The snag is that not everyone has an occurrence of the treatment, so whenever I run my code for finding the first occurrence, those subjects are dropped entirely. To make my question clearer:
ID treatment start.date stop.date
1 0 01/01/2002 01/02/2002
1 0 01/02/2002 01/03/2002
1 1 01/03/2002 01/04/2002
1 0 01/04/2002 01/05/2002
2 0 01/01/2002 01/02/2002
2 0 01/02/2002 01/03/2002
3 0 01/01/2002 01/02/2002
3 1 01/02/2002 01/03/2002
3 0 01/03/2002 01/04/2002
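For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above as a data frame. It assumes the dates are stored as plain character strings (the real data may well use `Date` columns), and the name `data` is simply the one used in the code further down.

```r
# Rebuild the example table; dates are kept as character strings (an assumption).
data <- data.frame(
  ID         = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  treatment  = c(0, 0, 1, 0, 0, 0, 0, 1, 0),
  start.date = c("01/01/2002", "01/02/2002", "01/03/2002", "01/04/2002",
                 "01/01/2002", "01/02/2002",
                 "01/01/2002", "01/02/2002", "01/03/2002"),
  stop.date  = c("01/02/2002", "01/03/2002", "01/04/2002", "01/05/2002",
                 "01/02/2002", "01/03/2002",
                 "01/02/2002", "01/03/2002", "01/04/2002"),
  stringsAsFactors = FALSE
)
```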
As you can see, subject 2 never has the treatment. When I run the following code, subject 2 is removed.
data$keep <- with(data,
                  ave(treatment == 1, ID,
                      FUN = function(x) if (1 %in% x) cumsum(x) else 2))
with(data, data[keep == 0 | (treatment == 1 & keep == 1), ])
Is there any way to extend this code so that it keeps the subjects who never have the treatment, and keeps every row up to and including the first occurrence for those who do?
To summarise, I want my data to look like this:
ID treatment start.date stop.date
1 0 01/01/2002 01/02/2002
1 0 01/02/2002 01/03/2002
1 1 01/03/2002 01/04/2002
2 0 01/01/2002 01/02/2002
2 0 01/02/2002 01/03/2002
3 0 01/01/2002 01/02/2002
3 1 01/02/2002 01/03/2002
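Stated as a rule, the target is: keep a row if no treated row comes before it within the same ID. The following is only a rough sketch of that rule, not a tested solution for the real data. It assumes the rows are already in chronological order within each ID, uses a toy copy of just the `ID` and `treatment` columns, and the names `prior_treated` and `result` are made up for illustration.

```r
# Toy copy of the ID and treatment columns from the example above.
data <- data.frame(
  ID        = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  treatment = c(0, 0, 1, 0, 0, 0, 0, 1, 0)
)

# Within each ID, count the treated rows that occur strictly before each row.
# cumsum(x) counts treated rows up to and including the current one, so
# subtracting x removes the current row from the count.
# IDs that are never treated get 0 everywhere, so all their rows survive.
prior_treated <- ave(as.integer(data$treatment == 1), data$ID,
                     FUN = function(x) cumsum(x) - x)

# Keep only rows with no earlier treated row in the same ID.
result <- data[prior_treated == 0, ]
```

Because the counter never depends on whether an ID contains a treated row at all, untreated subjects need no special branch here.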