Logic :
- we detect that items sold passed a have reset if the number of items sold in day
i
is inferior to i-1
. (mind you that this can be correct only if the number of sales per day can reach at maximum 99 units).
- if we still hasn't passed a 100 then we just put the items sold.
- if we passed a hundred we got to add the cumulative sum of the items sold with 100 multiplied by the number of times we passed 100.
Code :
items_sold < lag(items_sold, default=F)
logical as to if a value is inferior to the previous value.
cumsum
to get the cumulative sum of this logical vector (TRUE=1,FALSE=0).
- we group by this new variable (
id
) to ensure that cumsum
will only be applied on elements of the same group.
- apply the formula mentioned above
( id > 0 ) * cumsum(items_sold) + id * 100 + items_sold * ( id == 0 )
: if id > 0
add the cumsum
to 100 multiplied by id
, if not put the values of item_sold
.
- difference can be calculated using the
diff
function.
library(tidyverse)
#### if you're tagged data.frame is wrong
## basically there is no longer a need for the cumsums
## so no need to group thus we drop the id column and directly
df %>%
mutate(items_sold_real= cumsum(items_sold<lag(items_sold, default=F))*100 + items_sold, items_sold_differene_day= c(NA,diff(items_sold_real))) -> df
#### if you want to have a result just like the tagged data.frame
df %>%
mutate(id = cumsum(items_sold < lag(items_sold, default=F))) %>%
group_by(id) %>%
mutate(items_sold_real= ( id > 0 ) * cumsum(items_sold) + id * 100 + items_sold * ( id == 0 ) ) %>%
ungroup %>% select(-id) %>%
mutate(items_sold_differene_day= c(NA, diff(items_sold_real))) -> df
identical(df, tibble(df.result))
#> TRUE
df
# A tibble: 5 x 4
date items_sold items_sold_real items_sold_differene_day
<fct> <dbl> <dbl> <dbl>
1 2020-01-01 10 10 NA
2 2020-01-02 40 40 30
3 2020-01-03 90 90 50
4 2020-01-04 10 110 20
5 2020-01-05 20 130 20
DATA
df <- data.frame(date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"),
items_sold = c(10,40,90, 10, 20))
df.result <- data.frame(date = c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"),
items_sold = c(10,40,90, 10, 20),
items_sold_real = c(10,40,90, 110, 130),
items_sold_differene_day = c(NA, 30, 50, 20, 20)
)