0

Please consider the following:

In a data.frame similar to the example below, the dates of when a medication was given are noted per patient. The goal is to compute the "time to next treatment" which is defined as the days between the start of one treatment until the start of the next treatment. All other columns in the data.frame (not shown here) hold distinct information that need to be kept.


My approach is as follows:

library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
db <- data.frame(id = c(rep("a", 5), rep("b", 3)),
                 date = c(rep(as.Date("2018-01-01"), 3),
                          rep(as.Date("2018-01-20"), 2),
                          rep(as.Date("2018-01-01"), 3)))
db
#>   id       date
#> 1  a 2018-01-01
#> 2  a 2018-01-01
#> 3  a 2018-01-01
#> 4  a 2018-01-20
#> 5  a 2018-01-20
#> 6  b 2018-01-01
#> 7  b 2018-01-01
#> 8  b 2018-01-01

db %>%
  group_by(id) %>% 
  mutate(time.to.next = as.numeric(lead(date) - date))
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 8 x 3
#> # Groups:   id [2]
#>   id    date       time.to.next
#>   <fct> <date>            <dbl>
#> 1 a     2018-01-01           0.
#> 2 a     2018-01-01           0.
#> 3 a     2018-01-01          19.
#> 4 a     2018-01-20           0.
#> 5 a     2018-01-20          NA 
#> 6 b     2018-01-01           0.
#> 7 b     2018-01-01           0.
#> 8 b     2018-01-01          NA

Created on 2018-08-13 by the reprex package (v0.2.0).


However, what I need is a data.frame (or tibble) looking like this:

#> # A tibble: 8 x 3
#> # Groups:   id [2]
#>   id    date       time.to.next
#>   <fct> <date>            <dbl>
#> 1 a     2018-01-01          19.
#> 2 a     2018-01-01          19.
#> 3 a     2018-01-01          19.
#> 4 a     2018-01-20          NA
#> 5 a     2018-01-20          NA 
#> 6 b     2018-01-01          NA
#> 7 b     2018-01-01          NA
#> 8 b     2018-01-01          NA

Question: how can I achieve that all values per group are the same although the only computed values is the difference between the last observation of one group and the first observation of the subsequent group?

Thank you very much.

Frederick
  • 810
  • 8
  • 28

2 Answers2

3

One option is to firstly drop all duplicates in id and date, calculate the time difference and then join back with db on id and date columns:

db %>% 
    select(id, date) %>%
    distinct() %>% 
    group_by(id) %>% 
    mutate(time.to.next = as.numeric(lead(date) - date)) %>% 
    inner_join(db)

#Joining, by = c("id", "date")
# A tibble: 8 x 3
# Groups:   id [?]
#  id    date       time.to.next
#  <fct> <date>            <dbl>
#1 a     2018-01-01           19
#2 a     2018-01-01           19
#3 a     2018-01-01           19
#4 a     2018-01-20           NA
#5 a     2018-01-20           NA
#6 b     2018-01-01           NA
#7 b     2018-01-01           NA
#8 b     2018-01-01           NA
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • I used this approach with dropping `select()` and instead putting the arguments `id` and `date` into `distinct()` because this will do the same thing. Thank you! – Frederick Aug 14 '18 at 08:35
1

An alternative option would be to calculate the distance between each date and the max(date) of that id and then replace zeros with NA

db <- data.frame(id = c(rep("a", 5), rep("b", 3)),
                 date = c(rep(as.Date("2018-01-01"), 3),
                          rep(as.Date("2018-01-20"), 2),
                          rep(as.Date("2018-01-01"), 3)))

library(dplyr)

db %>%
  group_by(id) %>% 
  mutate(time.to.next = as.numeric(max(date) - date),
         time.to.next = ifelse(time.to.next > 0, time.to.next, NA)) %>%
  ungroup()

# # A tibble: 8 x 3
#   id    date       time.to.next
#   <fct> <date>            <dbl>
# 1 a     2018-01-01           19
# 2 a     2018-01-01           19
# 3 a     2018-01-01           19
# 4 a     2018-01-20           NA
# 5 a     2018-01-20           NA
# 6 b     2018-01-01           NA
# 7 b     2018-01-01           NA
# 8 b     2018-01-01           NA
AntoniosK
  • 15,991
  • 2
  • 19
  • 32