
I have longitudinal patient data in R. I would like to subset the patients (identified by the patid column) who have three or more occurrences within a one-year period, where "one year" means any 12-month period.

Table 1 (dates are in dd/mm/yyyy format):

patid observation_date
1 07/07/2016
1 07/08/2016
1 07/11/2016
1 07/07/2019
2 07/05/2015
2 02/12/2016
3 07/05/2015
3 07/06/2015
3 16/06/2015
4 07/05/2015
4 02/12/2016
4 18/12/2016
4 15/01/2017
Asked by abrar_r

1 Answer

library(tidyverse)
library(lubridate)

df <- read_table(
  "patid    observation_date
1   07/07/2016
1   07/08/2016
1   07/11/2016
1   07/07/2019
2   07/05/2015
2   02/12/2016
3   07/05/2015
3   07/06/2015
3   16/06/2015
4   07/05/2015
4   02/12/2016
4   18/12/2016
4   15/01/2017"
) %>%
  # Dates are day/month/year, so pass the format explicitly
  mutate(observation_date = as.Date(observation_date, format = "%d/%m/%Y"))

# Count observations per patient and calendar year; keep counts of 3 or more
df %>%
  count(patid, year = year(observation_date)) %>%
  filter(n >= 3)

# A tibble: 2 x 3
  patid  year     n
  <dbl> <dbl> <int>
1     1  2016     3
2     3  2015     3
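The count above groups by calendar year, so observations that fall in different years are never counted together. A quick check on patid 4's dates shows the effect. This is a small sketch using lubridate's `dmy()` and `year()`; `dates_4`, `counts` and `gap_days` are illustrative names, not part of the original answer.

```r
library(lubridate)

# patid 4's observation dates from Table 1 (day/month/year)
dates_4 <- dmy(c("07/05/2015", "02/12/2016", "18/12/2016", "15/01/2017"))

# Calendar-year counts: 2015 -> 1, 2016 -> 2, 2017 -> 1, so no year reaches 3
counts <- table(year(dates_4))
print(counts)

# Yet the last three observations span only 44 days
gap_days <- as.numeric(dates_4[4] - dates_4[2])
print(gap_days)
```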
Answered by Chamkrai
Comment: Thank you very much for the code. I am wondering why patid 4 was not included, as it has 3 occurrences within a one-year period (by one year I mean any 12-month period): patid 4 has occurrences on 02/12/2016, 18/12/2016 and 15/01/2017, so it should be included, but the code seems to count by calendar year. Is there a way to change this? – abrar_r Aug 04 '22 at 10:32
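A minimal sketch of a rolling-window version follows, assuming dplyr (>= 1.0, for `.groups`) and lubridate. It also assumes that "three occurrences within any 12 months" means: after sorting a patient's dates, some observation and the observation two places after it are at most one calendar year apart, with the end of the window inclusive. That interpretation is an assumption, not something stated in the thread. `third_date`, `patients_3_in_12m` and `df_subset` are hypothetical names chosen for illustration.

```r
library(dplyr)
library(lubridate)

# Example data from the question; dates are day/month/year
df <- tibble(
  patid = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  observation_date = dmy(c(
    "07/07/2016", "07/08/2016", "07/11/2016", "07/07/2019",
    "07/05/2015", "02/12/2016",
    "07/05/2015", "07/06/2015", "16/06/2015",
    "07/05/2015", "02/12/2016", "18/12/2016", "15/01/2017"
  ))
)

# Assumed rule: a patient qualifies if, for some observation, the
# observation two positions later (in date order) falls no more than
# one calendar year after it (inclusive).
patients_3_in_12m <- df %>%
  arrange(patid, observation_date) %>%
  group_by(patid) %>%
  # Date of the 3rd observation in a run starting at this row (NA if none)
  mutate(third_date = lead(observation_date, 2)) %>%
  summarise(
    qualifies = any(third_date <= observation_date %m+% years(1), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(qualifies) %>%
  pull(patid)

patients_3_in_12m
#> [1] 1 3 4

# Keep every row belonging to a qualifying patient
df_subset <- df %>% filter(patid %in% patients_3_in_12m)
```

If a fixed 365-day window is preferred, `observation_date + 365` can replace `observation_date %m+% years(1)`; the two differ only when the window contains 29 February. Using `<` instead of `<=` makes the end of the window exclusive.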