1

I have a dataset containing columns for the ID of an individual (enrolid), a start number (start), an end number (end). I want to create a person-period level dataset of all continuous periods, where a continuous period defined as instances when the start number for a row is less than or equal to the end number for the previous row + 1. For instance, for enrolid 1, the first continuous period is from 0 to 15 because the second row for this individual starts at 11 which is the first number of the end number for the last row.

This is the data I have:

have <- tibble(
  enrolid = c(1,1,1,1,2,2,2,2),
  start = c(0,11,19,24,2,14,17,37),
  end  = c(10,15,25,29,13,16,35,49)
)

This is the data I want:

want <- tibble(
  enrolid = c(1,1,2,2),
  continuous_cov_start = c(0,19,2,37),
  continuous_cov_end = c(15,29,35,49),
  continuous_cov_sequence = c(1,2,1,2)
)

Thank you! While I used numbers for start and end, I will be adapting this code to use dates. I am trying to generate a dataframe of all the unique periods of continuous insurance coverage.

I was unable to come up with a loop, mutate, or conditional function that would achieve this task.

jpsmith
  • 11,023
  • 5
  • 15
  • 36
Pharmepi
  • 21
  • 2

2 Answers2

3

I think this solution should work on every case, given the sequences are not too long, because i construct individual sequences then verify (with diff) whether there are blocks with difference > 1 between the concatenated sequences.

have <- data.frame(
  enrolid = c(1,1,1,1,2,2,2,2),
  start = c(0,11,19,24,2,14,17,37),
  end  = c(10,15,25,29,13,16,35,49)
)

have |>
  split(have$enrolid) |> 
  lapply(\(x) unique(do.call(c,
    Map(x$start, x$end, f = \(start, end) seq(start, end)))))  |>
  {\(x) Map(x, names(x), f = \(x, id) data.frame(
    enrolid = id,
    continuous_cov_start = x[which(diff(c(0,x))!=1)],
    continuous_cov_end = x[which(diff(c(x,0))!=1)],
    continuous_cov_sequence = seq_along(which(diff(c(0,x))!=1)))) }() |>
  do.call(what = rbind)
#>     enrolid continuous_cov_start continuous_cov_end continuous_cov_sequence
#> 1.1       1                    0                 15                       1
#> 1.2       1                   19                 29                       2
#> 2.1       2                    2                 35                       1
#> 2.2       2                   37                 49                       2
Ric
  • 5,362
  • 1
  • 10
  • 23
  • Thank you very much! This works beautifully. I'm still a beginner R user, so some of the "how it works" is lost on me, but very nice! Much appreciated. – Pharmepi Mar 23 '23 at 23:07
  • I recommend you to read the help of the functions and operators, (in special the examples at the end of the manpages), which you can see using "help" or "?". i.e. `help("|>")`, `?Map` , `?c`, `?"function"`, `?lapply`, `?do.call` an even `?"{"`. This answer has also the advantage of using only base R functions, not libraries, so it is helpful for learning the basics of the language. – Ric Mar 24 '23 at 00:22
1

I wrote this answer using techniques kindly provided by NuclearLemon's post, and these small changes get me the "want" dataframe I was looking for.

want <- have %>%
arrange(enrolid, start) %>%
group_by(enrolid) %>%
  mutate(
    continuous_cov_start = if_else(
      ((start > (lag(end)+1))|row_number()==1), start, NA),
    continuous_cov_sequence = cumsum(!is.na(continuous_cov_start))) %>% 
  group_by(enrolid, continuous_cov_sequence) %>% 
  mutate(continuous_cov_start= if_else(!is.na(continuous_cov_start), continuous_cov_start, first(continuous_cov_start))) %>% 
  filter(row_number()==n()) %>% 
  rename(continuous_cov_end=end) %>% 
  select(-start)

col_order <- c("enrolid", "continuous_cov_sequence", "continuous_cov_start",
               "continuous_cov_end")
want <- want[, col_order]
Pharmepi
  • 21
  • 2