
I have a dataset of individual subject records with birth, diagnosis, and death dates. I would like to turn this into longitudinal data that shows, for each week of the study period, whether each subject has been born, has been diagnosed (diagnosis can happen before or after birth), and has died. Note that birth or diagnosis can happen before the study period.

Study period: 4/1/2021 to 4/30/2021 (weeks start on Monday, so the study weeks start on 3/29/2021, 4/5/2021, 4/12/2021, 4/19/2021, and 4/26/2021).
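A minimal base R sketch of how those Monday week starts can be derived from the study period. The helper week_start() is hypothetical (it is not part of the question); it relies on format(d, "%u"), which returns the weekday as a number with Monday = 1 and Sunday = 7.

```r
# Monday on or before a given date: "%u" gives 1 (Monday) ... 7 (Sunday)
week_start <- function(d) d - (as.integer(format(d, "%u")) - 1)

study_start <- as.Date("2021-04-01")  # a Thursday
study_end   <- as.Date("2021-04-30")  # a Friday

# One date per week, from the Monday of the first week to the Monday of the last week
study_weeks <- seq(week_start(study_start), week_start(study_end), by = "week")
study_weeks  # 2021-03-29 2021-04-05 2021-04-12 2021-04-19 2021-04-26
```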

Sample data:

library(tibble)

tibble(id = 1:4,
       date_birth = c("2/28/2021", "3/2/2021", "4/3/2021", "4/15/2021"),
       date_dx = c("3/4/2021", "4/15/2021", NA, "4/9/2021"),
       date_death = c("4/5/2021", "4/20/2021", NA, "4/23/2021"))

Logic:

If date_birth <= study_week, born = 1, else born = 0

If date_dx <= study_week, dx = 1, else dx = 0

If date_death <= study_week, dead = 1, else dead = 0

Subjects never go from 1 back to 0. A missing date (NA) means the event has not happened, so that flag stays 0.
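The three rules above can be sketched as one small R helper, assuming a missing date (NA) should count as "not yet" (which is what the desired output shows for id 3). The name flag() is hypothetical. Because the study weeks are increasing, a flag that becomes 1 stays 1, so the "never go back to 0" condition follows automatically.

```r
# 1 once the event date is on or before the study week, else 0;
# a missing date (NA) is treated as "has not happened" and gives 0
flag <- function(event_date, week) as.integer(!is.na(event_date) & event_date <= week)

study_weeks <- seq(as.Date("2021-03-29"), by = "week", length.out = 5)

# id 4 from the sample data: born 4/15/2021, diagnosed 4/9/2021
born_id4 <- flag(as.Date("2021-04-15"), study_weeks)  # 0 0 0 1 1
dx_id4   <- flag(as.Date("2021-04-09"), study_weeks)  # 0 0 1 1 1
# id 3 has no death date
dead_id3 <- flag(as.Date(NA_character_), study_weeks) # 0 0 0 0 0
```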

Desired output:

tibble(study_week=rep(seq.Date(as.Date("3/29/2021", format="%m/%d/%Y"), by="week", length.out = 5),4),
       id=c(rep(1,5), rep(2,5), rep(3,5), rep(4,5)),
       born=c(1,1,1,1,1,
              1,1,1,1,1,
              0,1,1,1,1,
              0,0,0,1,1),
       
       dx=c(1,1,1,1,1,
            0,0,0,1,1,
            0,0,0,0,0,
            0,0,1,1,1),
       
       dead=c(0,1,1,1,1,
              0,0,0,0,1,
              0,0,0,0,0,
              0,0,0,0,1))

How can I create this output? Thanks!

AMG
  • I see that your id == 4 was born on 2021-04-15, yet for the study week of 2021-04-12 you have it as already born. Is this a mistake, or am I misunderstanding your result table? – PavoDive May 11 '22 at 18:45
  • This was a mistake, thanks for catching! Edited to correct. – AMG May 11 '22 at 18:47
  • id == 3 was born on 2021-04-03, so it wasn't yet born on study_week 2021-03-29. id 2 wasn't diagnosed until 2021-04-15, so 2021-04-12 can't be dx'd. id 2 died on 2021-04-20, couldn't have been dead on 2021-04-19. id 4 died on 2021-04-23, couldn't have been dead on 2021-04-19. Please fix that, it's very misleading – PavoDive May 11 '22 at 18:54
  • Very sorry, I see now that the table did not reflect the logic. Edited to correct. – AMG May 11 '22 at 19:05

1 Answer


I used an unusual approach that works: I added the study weeks as columns, melted the table into long format, and compared each date with its study week using ifelse.

library(data.table) # load data.table package

# load your data as a data.table
a = data.table(id = 1:4,
       date_birth=c("2/28/2021", "3/2/2021", "4/3/2021", "4/15/2021"),
       date_dx=c("3/4/2021", "4/15/2021", NA, "4/9/2021"),
       date_death=c("4/5/2021", "4/20/2021", NA, "4/23/2021"))

# convert the date columns (2 to 4) to Date using {lubridate}
a[, c("date_birth", "date_dx", "date_death") := lapply(.SD, lubridate::mdy), .SDcols = 2:4]
studyWeeks = lubridate::mdy(c("3/29/2021", "4/5/2021", "4/12/2021", "4/19/2021", "4/26/2021"))

# add one column per study week, named after the week, with a dummy (0) value
a[, c(as.character(studyWeeks)) := 0]

# now calculate the result; variable.factor = FALSE keeps the week names as character
result = melt(a, 1:4, variable.factor = FALSE)[, `:=` (
  born = ifelse(date_birth <= lubridate::ymd(variable), 1, 0),
  dx   = ifelse(date_dx    <= lubridate::ymd(variable), 1, 0),
  dead = ifelse(date_death <= lubridate::ymd(variable), 1, 0))]

# a little cosmetics: keep the needed columns and sort by id, then week
result = result[, .(study_week = lubridate::ymd(variable), id, born, dx, dead)]
setorder(result, id, study_week)

What is in the "now calculate the result" line:

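  • melt(a, 1:4): converts the wide table into a long one, keeping columns 1 to 4 as identifier columns and stacking the study-week columns into variable (the week name) and value (the dummy 0). What follows is standard data.table chaining (][ plays roughly the same role as |> or %>%).
  • := is data.table's assign-by-reference operator: it adds or modifies the left-hand columns inside the data.table using the right-hand values, without copying the table.
  • Each ifelse compares one event date with the study week and returns 1 or 0 (or NA when the date is missing). ymd (from {lubridate}) converts the week name back to a Date.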
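For comparison, here is a sketch of the same result built with a cross join instead of dummy columns plus melt: data.table's CJ() creates every (id, study_week) pair, and a join on id attaches each subject's dates. This is a swapped-in technique, not the code above; it parses dates with base as.Date() instead of {lubridate}, and the helper flag() is hypothetical. Because it treats a missing date as 0, it gives the same 0/1 values as the question's desired output, including id 3.

```r
library(data.table)

# Sample data from the question, parsed with base R (month/day/year)
a <- data.table(
  id         = 1:4,
  date_birth = as.Date(c("2/28/2021", "3/2/2021", "4/3/2021", "4/15/2021"), format = "%m/%d/%Y"),
  date_dx    = as.Date(c("3/4/2021", "4/15/2021", NA, "4/9/2021"), format = "%m/%d/%Y"),
  date_death = as.Date(c("4/5/2021", "4/20/2021", NA, "4/23/2021"), format = "%m/%d/%Y")
)
study_weeks <- seq(as.Date("2021-03-29"), by = "week", length.out = 5)

# Every (id, study_week) pair, sorted by id and then study_week
grid <- CJ(id = a$id, study_week = study_weeks)

# Attach each subject's dates to its rows (rows follow the order of grid)
result <- a[grid, on = "id"]

# 1 once the date is on or before the week; NA dates count as 0
flag <- function(event_date, week) as.integer(!is.na(event_date) & event_date <= week)
result[, `:=`(born = flag(date_birth, study_week),
              dx   = flag(date_dx, study_week),
              dead = flag(date_death, study_week))]

result <- result[, .(study_week, id, born, dx, dead)]
```

Compared with the dummy-column approach, the cross join never creates placeholder columns, so the week stays a Date throughout and no conversion back from column names is needed.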

I forgot to mention: when a date is missing (such as id 3's diagnosis and death dates), this code returns NA instead of 0, and I'd suggest keeping the NA. If you want to replace the NA with 0, you can fill it in afterwards.
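If you do want zeros, a minimal sketch is to fill the NAs in place once the result exists. It assumes a data.table version that provides setnafill(); the small table toy below is hypothetical and only mimics the shape of result.

```r
library(data.table)

# Hypothetical table shaped like `result`, with NA where a date was missing
toy <- data.table(id   = c(3, 3),
                  born = c(0, 1),
                  dx   = c(NA_real_, NA_real_),
                  dead = c(NA_real_, NA_real_))

# Replace NA with the constant 0 in the three indicator columns, by reference
setnafill(toy, type = "const", fill = 0, cols = c("born", "dx", "dead"))
```

On the answer's table, the equivalent call would be setnafill(result, type = "const", fill = 0, cols = c("born", "dx", "dead")); setnafill() only works on numeric columns, which the ifelse output is.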

PavoDive
  • Thank you, this worked really well on my full dataset (40 weeks and ~10k subjects). – AMG May 12 '22 at 14:44
  • If the answer helped you, it's considered polite around here to upvote it (using the arrows to the left of the answer) and to mark it as the accepted answer (using the check mark below the voting arrows). – PavoDive May 12 '22 at 23:27