0

I'm struggling with this. My data have start/end points of each active period. I'd like to calculate the number of active days.

My data are like this:

data <- data.frame(
"ID" = c("A","A","B","B","C","C","C"), 
"start" = c("2022-09-01", "2022-09-01", "2022-09-04", "2022-09-05", "2022-09-25", "2022-09-30", "2022-09-30"), 
"end" = c("2022-09-01", "2022-09-02", "2022-09-05", "2022-09-07", "2022-09-27", "2022-09-30", "2022-10-02")
)

I need sums of active days. I had tried as below:

library(tidyverse)
library(lubridate)
data$start <- ymd(data$start) 
data$end <- ymd(data$end)

dt_interval <- data %>% 
  mutate("duration" = 1 + end - start)
dt_interval$duration <- as.numeric(dt_interval$duration)
dt_int_wide <- dt_interval %>% 
  group_by(ID) %>% mutate(num = row_number()) %>% 
  pivot_wider(id_cols = ID, names_from = num, values_from = duration)
dt_int_wide <- column_to_rownames(dt_int_wide, var="ID")
dt_int_wide %>% 
  mutate("active_days" = apply(dt_int_wide, 1, function(x)sum(na.omit(x))))

  1 2  3 active_days
A 1 2 NA           3
B 2 3 NA           5
C 3 1  3           7

I noticed that I have counted the same date multiple times. I want the result as A=2,B=4,C=6 I will appreciate any solution that can count the days without duplication.

Thanks.

KintensT
  • 7
  • 2
  • I found a smart way to solve my task in another thread. https://stackoverflow.com/a/44721725/20375862 Thank all who checked my question! – KintensT Nov 01 '22 at 18:24

1 Answers1

0

Maybe not so pretty, but this gets the job done.

library(magrittr)
library(dplyr)
data %>% 
  split(., .$ID) %>% 
  lapply(\(id) id %>% 
           apply(1, \(x) seq(as.Date(x[2]), as.Date(x[3]), 1))) %>% 
  lapply(\(id) id %>% 
           unlist() %>% 
           unique() %>% 
           length()) %>% 
  bind_rows()
RRDK
  • 86
  • 4