I've got time series data on supplies, recorded at irregular time intervals. The reports are a mixture of cumulative and non-cumulative values. Does anyone know how to accurately combine/standardise the two reported data types, so that both the cumulative and the non-cumulative value are determined for every time point, preferably in dplyr/tidyverse? Help much appreciated. Reproducible set-up code is below.

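library(tidyverse)

foo <- tibble(true_val = round(100 * runif(20))) |>
  mutate(rn = row_number()) |>
  select(rn, everything()) |>
  arrange(rn) |>
  # running total of the true per-interval values
  mutate(true_cv = cumsum(true_val)) |>
  # blank out about 30% of the per-interval values at random
  mutate(reported_val = replace(true_val, sample(row_number(), size = ceiling(0.3 * n()), replace = FALSE), NA)) |>
  # where the per-interval value is missing, the cumulative total is reported instead
  mutate(reported_cv = if_else(is.finite(reported_val), NA_real_, true_cv))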

foo

# A tibble: 20 × 5
      rn true_val true_cv reported_val reported_cv
   <int>    <dbl>   <dbl>        <dbl>       <dbl>
 1     1       35      35           NA          35
 2     2       81     116           81          NA
 3     3        1     117            1          NA
 4     4       18     135           NA         135
 5     5       79     214           NA         214
 6     6       88     302           88          NA
 7     7        2     304            2          NA
 8     8       77     381           77          NA
 9     9       11     392           11          NA
10    10       85     477           85          NA
11    11       51     528           NA         528
12    12       39     567           NA         567

Column descriptions:
rn is an arbitrary time point, in order
true_val is the real non-cumulative value
true_cv is the real cumulative value
reported_val is like the non-cumulative data I receive
reported_cv is like the cumulative data I receive
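
One possible way to think about the combination, given as a minimal sketch rather than a definitive answer: treat every reported cumulative total as a checkpoint, add the per-interval values reported since the latest checkpoint to get the cumulative series, then difference that series to get the per-interval values. The helper name `combine_reports` and the output columns `est_cv` and `est_val` are hypothetical names introduced here. The sketch assumes each row carries exactly one of `reported_val` or `reported_cv`, that the running total is 0 before the first row, and that a row with neither value contributes 0.

```r
library(dplyr)

# Hypothetical helper (not from the question): rebuild both series from mixed reports.
combine_reports <- function(df) {
  df |>
    arrange(rn) |>
    # every reported cumulative total starts a new block
    mutate(block = cumsum(!is.na(reported_cv))) |>
    group_by(block) |>
    mutate(
      # the block's starting total; 0 if no cumulative total has been reported yet
      base = coalesce(first(reported_cv), 0),
      # add the per-interval values reported since that total (missing counts as 0)
      est_cv = base + cumsum(coalesce(reported_val, 0))
    ) |>
    ungroup() |>
    # per-interval values are the differences of the rebuilt cumulative series
    mutate(est_val = est_cv - lag(est_cv, default = 0)) |>
    select(-block, -base)
}

# The 12 rows printed in the example output above
reported <- tibble::tibble(
  rn = 1:12,
  reported_val = c(NA, 81, 1, NA, NA, 88, 2, 77, 11, 85, NA, NA),
  reported_cv = c(35, NA, NA, 135, 214, NA, NA, NA, NA, NA, 528, 567)
)

out <- combine_reports(reported)
```

For these 12 rows, `out$est_cv` and `out$est_val` reproduce the `true_cv` and `true_val` columns shown in the printed tibble. If a real data set can carry both values on the same row, the cumulative total should probably take precedence and the addition step would need adjusting.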

Sarah
