My task is to count the number of distinct days covered by periods, given start/end dates that were extracted from a large dataset.
Here is some sample data.
library(tidyverse)
library(lubridate)  # provides ymd(); only attached automatically by tidyverse >= 2.0.0
data <- tibble(ID = rep.int(c(1, 2), times = c(3, 2)),
               start = ymd(c("2022-03-03", "2022-03-03", "2022-03-04", "2022-03-20", "2022-03-22")),
               end = ymd(c("2022-03-03", "2022-03-04", "2022-03-07", "2022-03-22", "2022-03-23")))
data
# A tibble: 5 × 3
     ID start      end
  <dbl> <date>     <date>
1     1 2022-03-03 2022-03-03
2     1 2022-03-03 2022-03-04
3     1 2022-03-04 2022-03-07
4     2 2022-03-20 2022-03-22
5     2 2022-03-22 2022-03-23
I worked this out using the method introduced here.
data2 <- data %>%
  rowwise() %>%
  do(tibble(ID = .$ID,
            Date = seq(.$start, .$end, by = 1))) %>%
  distinct() %>%
  ungroup() %>%
  count(ID)
data2
# A tibble: 2 × 2
     ID     n
  <dbl> <int>
1     1     5
2     2     4
However, occasionally, all the observations in the extracted start/end columns are NA.
In that case the method above fails at seq(), because it cannot build a sequence from NA endpoints.
For example:
na_data <- tibble(ID = rep.int(c(1, 2), times = c(3, 2)),
                  start = ymd(NA),
                  end = ymd(NA))
na_data
# A tibble: 5 × 3
     ID start  end
  <dbl> <date> <date>
1     1 NA     NA
2     1 NA     NA
3     1 NA     NA
4     2 NA     NA
5     2 NA     NA
na_data %>%
  rowwise() %>%
  do(tibble(ID = .$ID,
            Date = seq(.$start, .$end, by = 1))) %>%
  distinct() %>%
  ungroup() %>%
  count(ID)
*Error in seq.int(0, to0 - from, by) : 'to' must be a finite number*
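The failure can be reproduced without the pipeline, which suggests that seq() itself is the problem rather than rowwise() or do(). The following is only a minimal sketch in base R: a single pair of NA dates is passed to seq(), and tryCatch() captures the error so the script keeps running. The exact wording of the message may differ between R versions.

```r
# Base R only. seq() needs finite endpoints, so one NA pair is enough
# to raise the error seen in the pipeline above.
na_date <- as.Date(NA)

result <- tryCatch(
  seq(na_date, na_date, by = 1),
  error = function(e) {
    # Report the message instead of stopping the script.
    message("seq() failed: ", conditionMessage(e))
    NULL
  }
)

is.null(result)  # TRUE if seq() raised an error
```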
Checking beforehand whether every value in the selected columns is NA is impractical, because I run many processes of this kind in one go on data from the same dataset.
To run them, I usually select the entire script in RStudio with [Ctrl + A] and then start it, but the error interrupts the run partway through my tasks.
Does anyone have a solution that makes this process work on data that is entirely NA, or that prevents the error from interrupting the run so that execution proceeds to the next code?
Thank you.
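One possible way to keep the count from failing on all-NA input is sketched below, under assumptions rather than as a definitive answer. `covered_days()` and `count_days()` are hypothetical helpers, not part of the original code. The sketch assumes that start is never later than end in rows where both are present, and that an ID with no usable period should be reported with n = 0. `as.Date()` stands in for `ymd()` so that only dplyr is needed.

```r
library(dplyr)

# Hypothetical helper (not from the original post): number of distinct days
# covered by the periods of one ID. Rows with an NA start or end are skipped,
# so seq() never sees NA. Assumes start <= end in every remaining row.
covered_days <- function(start, end) {
  ok <- !is.na(start) & !is.na(end)
  if (!any(ok)) return(0L)  # nothing usable: report zero days
  days <- unlist(Map(
    # Work on day numbers (days since 1970-01-01) instead of Date objects.
    function(s, e) seq(as.integer(s), as.integer(e)),
    start[ok], end[ok]
  ))
  length(unique(days))
}

# Hypothetical wrapper: one row per ID with the number of covered days.
count_days <- function(df) {
  df %>%
    group_by(ID) %>%
    summarise(n = covered_days(start, end))
}

data <- tibble(
  ID = rep.int(c(1, 2), times = c(3, 2)),
  start = as.Date(c("2022-03-03", "2022-03-03", "2022-03-04", "2022-03-20", "2022-03-22")),
  end = as.Date(c("2022-03-03", "2022-03-04", "2022-03-07", "2022-03-22", "2022-03-23"))
)
na_data <- tibble(
  ID = rep.int(c(1, 2), times = c(3, 2)),
  start = as.Date(NA),
  end = as.Date(NA)
)

count_days(data)     # n is 5 for ID 1 and 4 for ID 2
count_days(na_data)  # n is 0 for both IDs, and no error is raised
```

Skipping the NA rows before calling seq() avoids the error at its source, so the rest of a script selected with [Ctrl + A] can continue without any error handling around this step.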