I am trying to figure out how to create a cumulative or rolling sum in R based on a few conditions.
The data set in question is a few million observations of library loans, and the question is to determine how many copies of a given book/title would be necessary to meet demand.
So for each Title.ID, begin with 1 copy for the first instance (ID.Index). Then for each instance after, determine whether another copy is needed based on whether the REQUEST.DATE is within 16 weeks (112 days) of the previous request.
# A tibble: 15 x 3
# Groups: Title.ID [2]
REQUEST.DATE Title.ID ID.Index
<date> <int> <int>
1 2013-07-09 2 1
2 2013-08-07 2 2
3 2013-08-20 2 3
4 2013-09-08 2 4
5 2013-09-28 2 5
6 2013-12-27 2 6
7 2014-02-10 2 7
8 2014-03-12 2 8
9 2014-03-14 2 9
10 2014-08-27 2 10
11 2014-04-27 6 1
12 2014-08-01 6 2
13 2014-11-13 6 3
14 2015-02-14 6 4
15 2015-05-14 6 5
The tricky part is that determining whether a new copy is needed is based not only on the number of request (ID.Index) and the REQUEST.DATE of some previous loan, but also on the preceding accumulating sum.
For instance, for the third request for title 2 (Title.ID 2, ID.Index 3), there are now two copies, so to determine whether a new copy is needed, you have to see whether the REQUEST.DATE is within 112 days of the first (not second) request (ID.Index 1). By contrast, for the third request for title 6 (Title.ID 6, ID.Index 3), there is only one copy available (since request 2 was not within 112 days), so determining whether a new copy is needed is based on looking back to the REQUEST.DATE of ID.Index 2.
The desired output ("Copies") would take each new request (ID.Index), then look back to the relevant REQUEST.DATE based on the number of available copies, and doing that would mean looking at the accumulating sum for the preceding calculation. (Note: The max number of copies would be 10.)
I've provided the desired output for the sample below ("Copies").
# A tibble: 15 x 4
# Groups: Title.ID [2]
REQUEST.DATE Title.ID ID.Index Copies
<date> <int> <int> <dbl>
1 2013-07-09 2 1 1
2 2013-08-07 2 2 2
3 2013-08-20 2 3 3
4 2013-09-08 2 4 4
5 2013-09-28 2 5 5
6 2013-12-27 2 6 5
7 2014-02-10 2 7 5
8 2014-03-12 2 8 5
9 2014-03-14 2 9 5
10 2014-08-27 2 10 5
11 2014-04-27 6 1 1
12 2014-08-01 6 2 2
13 2014-11-13 6 3 2
14 2015-02-14 6 4 2
15 2015-05-14 6 5 2
>
I recognize that the solution will be way beyond my abilities, so I will be extremely grateful for any solution or advice about how to solve this type of problem in the future.
Thanks a million!
*4/19 update: new examples where new copy may be added after delay, i.e., not in sequence. I've also added columns showing days since a given previous request, which helps checking whether a new copy should be added, based on how many copies there are.
Sample 2: new copy should be added with third request, since it has only been 96 days since last request (and there is only one copy)
REQUEST.NUMBER REQUEST.DATE Title.ID ID.Index Days.Since Days.Since2 Days.Since3 Days.Since4 Days.Since5 Copies
<fct> <date> <int> <int> <drtn> <drtn> <drtn> <drtn> <drtn> <int>
1 BRO-10680332 2013-10-17 6 1 NA days NA days NA days NA days NA days 1
2 PEN-10835735 2014-04-27 6 2 192 days NA days NA days NA days NA days 1
3 PEN-10873506 2014-08-01 6 3 96 days 288 days NA days NA days NA days 1
4 PEN-10951264 2014-11-13 6 4 104 days 200 days 392 days NA days NA days 1
5 PEN-11029526 2015-02-14 6 5 93 days 197 days 293 days 485 days NA days 1
6 PEN-11106581 2015-05-14 6 6 89 days 182 days 286 days 382 days 574 days 1
Sample 3: new copy should be added with last request, since there are two copies, and the oldest request is 45 days.
REQUEST.NUMBER REQUEST.DATE Title.ID ID.Index Days.Since Days.Since2 Days.Since3 Days.Since4 Days.Since5 Copies
<fct> <date> <int> <int> <drtn> <drtn> <drtn> <drtn> <drtn> <int>
1 BRO-10999392 2015-01-20 76 1 NA days NA days NA days NA days NA days 1
2 YAL-11004302 2015-01-22 76 2 2 days NA days NA days NA days NA days 2
3 COR-11108471 2015-05-18 76 3 116 days 118 days NA days NA days NA days 2
4 HVD-11136632 2015-07-27 76 4 70 days 186 days 188 days NA days NA days 2
5 MIT-11164843 2015-09-09 76 5 44 days 114 days 230 days 232 days NA days 2
6 HVD-11166239 2015-09-10 76 6 1 days 45 days 115 days 231 days 233 days 2