How to convert my data to counting process format with start stop times for interval truncation in R?

Question

I would like to model a recurrent event with subjects that move in and out of risk over the course of the observation period of the study.

I have data on the out-of-risk periods (start and end dates) where the subject cannot experience the event.

I would appreciate any help on how to convert my data to this counting process format with start stop times that reflect both event occurrence and interval truncation in R. I can convert the data to counting process format with event occurrence, but do not know how to partition my start stop times to reflect unobserved periods (other than manually creating the data set which I would very much like to avoid).

This is a very simplified example of my input data structure in wide format:

View Input Data Structure

This is what I want to achieve:

id t0 t1 outcome
 1  0 36       0
 2  0  5       1
 2  5 15       1
 2 15 36       0
 3  0  9       0
 3 11 20       1
 3 20 36       0

In my illustration, Subject 1 never experiences the event and is right-censored at 36 months. Subject 2 experiences the event twice and stays at risk throughout the observation period. Subject 3 exits the risk period at 9 months, re-enters it at 11 months, and experiences the event once, at 20 months.

Other useful info about my study:

Subjects have a common start time of 0 months.
Subjects are right-censored at 36 months if no event is experienced.
Subjects are observed for 3 years.
Subjects can move in and out of risk for varying amounts of time and frequency during the 3 year observation period.

Thank you!

@vaettchen thanks for the edit! May I ask how you achieved that? Long time lurker, first time poster. Would like to upskill :) — shitshimugi, Dec 20 '18 at 01:12
The table you've shared is the desired output, right? Can you also share the corresponding input data? I think I understand what you're trying to do but it will be easier to help if the input data structure is known. — Callum Webb, Dec 20 '18 at 04:20
Indenting by four spaces produces code formatting. You had extra newlines in your original post which I also removed; if you prepare text outside SO's editor, better make sure that you use a text editor (geany, kwrite, notepad++ or so). — vaettchen, Dec 20 '18 at 06:52
@CallumWebb I've edited my post to include a picture of how my input data are structured. — shitshimugi, Dec 20 '18 at 08:40
Great, I think I have some ideas that will help, but where does the 8 in the third line of your output come from? I would've thought the output for subject two would be intervals that look like (0, 5), (5, 15), (15, 36). — Callum Webb, Dec 20 '18 at 21:26
@CallumWebb Yes, you are right. Edited the post accordingly. — shitshimugi, Dec 21 '18 at 06:28

score 0 · Answer 1 · answered Dec 21 '18 at 09:55

I may be missing some corner cases, and there's probably a more elegant solution, but this appears to work.

I suggest running the first two lines of the main logic, then the first three, then the first four, and so on, inspecting the output at each stage to build up an understanding of what each step does.

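library(tidyr)
library(dplyr)

# Example input in wide format: one row per subject
subjects <- data.frame(
  id = 1:3,
  event = c(0, 1, 1),
  time_to_event_1 = c(NA, 5, 20),
  time_to_event_2 = c(NA, 15, NA),
  time_to_risk_out_start_1 = c(NA, NA, 9),
  time_to_risk_out_end_1 = c(NA, NA, 11),
  time_to_risk_out_start_2 = NA,
  time_to_risk_out_end_2 = NA
)

subjects %>%
  # Add the common entry time (0) and censoring time (36) as boundaries
  mutate(start = 0,
         end = 36) %>%
  select(-event) %>%
  # Reshape to long format: one row per boundary time per subject
  gather(event, t0, -id) %>%
  group_by(id) %>%
  arrange(id, t0) %>%
  filter(!is.na(t0)) %>%
  # Each interval runs from one boundary to the next within a subject
  mutate(t1 = lead(t0)) %>%
  # Drop the last boundary (no t1) and the intervals spent out of risk
  filter(!is.na(t1),
         !grepl("time_to_risk_out_start", event)) %>%
  # An interval ends in an event if the next remaining row starts at an event time
  mutate(outcome = as.integer(lead(grepl("time_to_event", event), default = FALSE))) %>%
  select(id, t0, t1, outcome) %>%
  ungroup()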
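
To illustrate the step-by-step advice, here is a minimal sketch that stops the same pipeline right after the lead() step, using Subject 3 only. The names subject3 and intervals are just for illustration. Each row is labelled by the boundary that opens it, so the row labelled time_to_risk_out_start_1 is the out-of-risk gap from 9 to 11, which is why the full pipeline filters it out, and the last row has no t1 because nothing follows the end boundary.

```r
library(tidyr)
library(dplyr)

# Subject 3 only, with values copied from the example data
# (subject3 and intervals are illustrative names)
subject3 <- data.frame(
  id = 3,
  time_to_event_1 = 20,
  time_to_risk_out_start_1 = 9,
  time_to_risk_out_end_1 = 11
)

# Same steps as the main pipeline, stopped before any filtering
intervals <- subject3 %>%
  mutate(start = 0, end = 36) %>%
  gather(event, t0, -id) %>%
  arrange(id, t0) %>%
  group_by(id) %>%
  mutate(t1 = lead(t0)) %>%
  ungroup()

print(intervals)
```

Comparing this printout with the final result should make it clear which rows the two filter() conditions remove.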

Also, for future reference, it's better to share your data using dput(subjects), which makes it easier for people to assist. In this case it was pretty easy to reproduce :)
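
As a small sketch of what that looks like (the data frame small is a made-up two-row example, not your data): dput() prints R code that recreates the object, so whoever answers can paste it straight into a session.

```r
# A made-up two-row data frame, for illustration only
small <- data.frame(
  id = 1:2,
  time_to_event_1 = c(NA, 5)
)

# Prints an R expression that rebuilds `small`; paste this into the question
dput(small)

# Round trip: evaluating the printed text gives the data back
txt <- capture.output(dput(small))
restored <- eval(parse(text = txt))
```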
