Dataset of blood results:
id result date
A1 80 01/01/2006
A1 70 02/10/2006
A1 61 01/01/2007
A1 30 01/01/2008
A1 28 03/06/2008
B2 40 01/01/2006
B2 30 01/10/2006
B2 25 01/01/2015
B2 10 01/01/2020
G3 28 01/01/2009
G3 27 01/01/2014
G3 25 01/01/2013
G3 24 01/01/2011
G3 22 01/01/2019
U7 20 01/01/2005
U7 19 01/01/2006
U7 18 01/04/2006
U7 18 01/08/2006
I would like to keep only those individuals whose blood results span at least a three-year period.
I can convert their dates to years along the lines of:
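library(dplyr)
library(lubridate)

df %>%
  # Parse the day/month/year strings and extract the year of each result
  mutate(date = dmy(date),
         year = year(date)) %>%
  # Group by id only (also grouping by year would make every difference 0)
  group_by(id) %>%
  # Difference between the latest and earliest year for each id
  summarise(max_diff = max(year) - min(year))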
How would I then continue the pipe to remove those individuals with a max_diff < 3 while keeping all of the rows for everyone else? The desired output from the above example (with the dates reduced to years) would be:
id result date
B2 40 2006
B2 30 2006
B2 25 2015
B2 10 2020
G3 28 2009
G3 27 2014
G3 25 2013
G3 24 2011
G3 22 2019
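
A minimal sketch of one way the filtering step could look, assuming dplyr and lubridate are installed and that the span is measured as the difference between the latest and earliest calendar year per id. The data frame df is rebuilt here from the example above so the snippet runs on its own; kept is just a name chosen for illustration. Using a grouped filter() instead of summarise() keeps every original row for the individuals who pass the check.

```r
library(dplyr)
library(lubridate)

# Example data from the question (dates are day/month/year strings)
df <- data.frame(
  id = rep(c("A1", "B2", "G3", "U7"), times = c(5, 4, 5, 4)),
  result = c(80, 70, 61, 30, 28,
             40, 30, 25, 10,
             28, 27, 25, 24, 22,
             20, 19, 18, 18),
  date = c("01/01/2006", "02/10/2006", "01/01/2007", "01/01/2008", "03/06/2008",
           "01/01/2006", "01/10/2006", "01/01/2015", "01/01/2020",
           "01/01/2009", "01/01/2014", "01/01/2013", "01/01/2011", "01/01/2019",
           "01/01/2005", "01/01/2006", "01/04/2006", "01/08/2006")
)

kept <- df %>%
  # Replace each date string with its calendar year
  mutate(date = year(dmy(date))) %>%
  group_by(id) %>%
  # Keep all rows of an id whose first and last years are at least 3 apart
  filter(max(date) - min(date) >= 3) %>%
  ungroup()

print(kept)
```

On the example data this keeps only B2 and G3 (9 rows): A1 spans 2006 to 2008 and U7 spans 2005 to 2006, so both fall short of three years. If the span should instead be measured on the full dates rather than on calendar years, the same grouped filter could compare the parsed dates directly.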
I would then pipe these individuals into the answer to this question: "Predicting when an output might happen in time in R"
Many thanks