I've got time series data on supplies at irregular time intervals, reported as a mixture of cumulative and non-cumulative values (also at irregular intervals). Does anyone know how to accurately combine/standardise the two reported data types so that both the cumulative and non-cumulative values are determined for every time point, preferably in dplyr/tidyverse? Help much appreciated. Reproducible code for the setup is below.
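```r
library(dplyr)  # also re-exports tibble()
set.seed(1)     # optional: fixes the random draws (values will differ from the output below)
foo <- tibble(
  true_val = round(100 * runif(20))
) |>
  mutate(rn = row_number()) |>
  select(rn, everything()) |>
  arrange(rn) |>
  mutate(true_cv = cumsum(true_val)) |>                      # true running total
  # blank out ~30% of the per-period values at random ...
  mutate(reported_val = replace(true_val, sample(row_number(), size = ceiling(0.3 * n()), replace = FALSE), NA)) |>
  # ... and report only the cumulative total on those rows
  mutate(reported_cv = if_else(is.finite(reported_val), NA_real_, true_cv))
foo
```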
```
# A tibble: 20 × 5
      rn true_val true_cv reported_val reported_cv
   <int>    <dbl>   <dbl>        <dbl>       <dbl>
 1     1       35      35           NA          35
 2     2       81     116           81          NA
 3     3        1     117            1          NA
 4     4       18     135           NA         135
 5     5       79     214           NA         214
 6     6       88     302           88          NA
 7     7        2     304            2          NA
 8     8       77     381           77          NA
 9     9       11     392           11          NA
10    10       85     477           85          NA
11    11       51     528           NA         528
12    12       39     567           NA         567
```
Column descriptions:
- `rn`: arbitrary time point, in order
- `true_val`: true non-cumulative value
- `true_cv`: true cumulative value
- `reported_val`: like the non-cumulative data I receive
- `reported_cv`: like the cumulative data I receive
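A minimal dplyr sketch of one possible approach, assuming every row carries exactly one of `reported_val` or `reported_cv` (as in the simulated data) and that the running total is 0 before the first `rn`. Each row with a reported cumulative value starts a new block; within a block, the cumulative value is that anchor plus the running sum of the non-cumulative values reported after it. Missing non-cumulative values are then the difference between consecutive cumulative values. The helper `reconstruct_supplies()` and the columns `grp`, `est_cv` and `est_val` are hypothetical names; the example data are the first 12 rows of the output shown above.

```r
library(dplyr)

# Hypothetical helper: fills in both the cumulative (est_cv) and
# non-cumulative (est_val) series.
# Assumes each row has exactly one of reported_val / reported_cv,
# and that the cumulative total is 0 before the first row.
reconstruct_supplies <- function(df) {
  df |>
    arrange(rn) |>
    # every row with a reported cumulative value starts a new block
    mutate(grp = cumsum(!is.na(reported_cv))) |>
    group_by(grp) |>
    # anchor (reported cumulative value at block start, or 0 before the first one)
    # plus the running sum of non-cumulative values reported since then
    mutate(est_cv = first(coalesce(reported_cv, 0)) + cumsum(coalesce(reported_val, 0))) |>
    ungroup() |>
    # where only the cumulative value was reported, the period value is the
    # difference from the previous cumulative value
    mutate(est_val = coalesce(reported_val, est_cv - lag(est_cv, default = 0))) |>
    select(-grp)
}

# First 12 rows of the example output from the question
foo_head <- tibble(
  rn = 1:12,
  true_val = c(35, 81, 1, 18, 79, 88, 2, 77, 11, 85, 51, 39)
) |>
  mutate(
    true_cv = cumsum(true_val),
    reported_val = if_else(rn %in% c(1, 4, 5, 11, 12), NA_real_, true_val),
    reported_cv = if_else(is.na(reported_val), true_cv, NA_real_)
  )

res <- reconstruct_supplies(foo_head)
res
```

Here `est_val` and `est_cv` should match `true_val` and `true_cv`. The same call should work on the full simulated `foo`. A row with neither value reported would silently be treated as 0, so such gaps would need separate handling.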