1

Goal: To create a variable named 'duration'--to count the number of months the 'previous month's value (0 or 1)' was consistent, (a) only when there are at least 3 consecutive observations in the past for a given month and (b) counting as '0' when the previous month's value was 1.

For instance, a sample structure of the data looks like:

structure(list(group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
2, 2, 2, 2, 2), month = c(2, 4, 5, 6, 7, 10, 11, 12, 13, 14, 
7, 10, 11, 12, 13, 14, 15), value= c(NA, 0, 1, 1, 0, 0, 0, 
0, 0, 0, NA, 1, 1, 0, 0, 0, 1)), class = "data.frame", row.names = c(NA, 
-17L), codepage = 65001L) 

The end result would look like (creating the new variable, 'duration'):

╔═══════╦═══════╦═══════╦════════════╦═══════════════════════════════════════════════════════════════════════╗
║ group ║ month ║ value ║ 'duration' ║                              explanation                              ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   2   ║   na  ║     na     ║                                                                       ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   4   ║   0   ║     na     ║ There is no consecutive month in the past for this month              ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   5   ║   1   ║     na     ║ There is only 1 consecutive month in the past (4) for this month      ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   6   ║   1   ║     na     ║ There are only 2 consecutive months in the past (5, 6) for this month ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   7   ║   0   ║      0     ║ The previous month's value is 1, so the duration becomes 0            ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   10  ║   0   ║     na     ║ There is no consecutive month in the past for this month              ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   11  ║   0   ║     na     ║ There is only 1 consecutive month in the past (10) for this month     ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   12  ║   0   ║     na     ║ There is only 2 consecutive months in the past (10, 11)               ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   13  ║   0   ║      3     ║ The previous month's (month 12) value (0) is consistent for 3 months  ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   1   ║   14  ║   0   ║      4     ║ The previous month's (month 13) value (0) is consistent for 4 months  ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   7   ║   na  ║     na     ║                                                                       ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   10  ║   1   ║     na     ║ There is no consecutive month in the past                             ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   11  ║   1   ║     na     ║ There is only 1 consecutive month in the past                         ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   12  ║   0   ║     na     ║ There is only 2 consecutive months in the past                        ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   13  ║   0   ║      1     ║ The previous month's (month 12) value (0) is consistent for 1 month   ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   14  ║   0   ║      2     ║ The previous month's (month 13) value (0) is consistent for 2 months  ║
╠═══════╬═══════╬═══════╬════════════╬═══════════════════════════════════════════════════════════════════════╣
║   2   ║   15  ║   1   ║      3     ║ The previous month's (month 14) value (0) is consistent for 3 months  ║
╚═══════╩═══════╩═══════╩════════════╩═══════════════════════════════════════════════════════════════════════╝

What I have tried applying was (courtesy of @Will):

setDT(sample)
sample[, month_consecutive := NA]
sample[, value_stable_rows := unlist(lapply(sample[, rle(value), by = group]$length, seq))]
sample[, month_consecutive := unlist(lapply(sample[, rle(diffinv(diff(month) != 1)), by = group]$lengths, seq))]
sample[, value_stable_rows := shift(value_stable_rows, type = "lag"), by = group]
sample[, month_consecutive := shift(month_consecutive, type = "lag"), by = group]

sample[, duration := ifelse(value_stable_rows < month_consecutive , value_stable_rows, month_consecutive)]
sample[, month_lag1 := shift(month, n = 1)]
sample[, month_lag2 := shift(month, n = 2)]
sample[, month_lag3 := shift(month, n = 3)]
sample[!((month - month_lag1 == 1) & (month_lag1 - month_lag2 == 1) & (month_lag2 - month_lag3 == 1)), duration := NA]
sample[, .(group, month, value, duration )]

The above code meets 'Goal(a)' but not 'Goal(b)'. I wanted to ask your advice on what could be added to operationalize counting as '0' when the previous month's value was 1.

user14250906
  • 197
  • 8
  • 1
    @akrun Thank you, I find your code in the answer really efficient. Basically, no matter what 'duration' was (after running the code above), your code changes it to '0' whenever the previous month's value was 1. Awesome! I learned a lot from this. – user14250906 Dec 18 '20 at 23:15

1 Answers1

1

For the second case, specify the logical expression in i i.e. check for rows where there is a non-NA in 'duration' and the previous 'value' (shift) is 1, then assign 'duration' to 0

out <- sample[, .(group, month, value, duration )]
out[!is.na(duration) & shift(value) == 1, duration := 0]

-output

out
#    group month value duration
# 1:     1     2    NA       NA
# 2:     1     4     0       NA
# 3:     1     5     1       NA
# 4:     1     6     1       NA
# 5:     1     7     0        0
# 6:     1    10     0       NA
# 7:     1    11     0       NA
# 8:     1    12     0       NA
# 9:     1    13     0        3
#10:     1    14     0        4
#11:     2     7    NA       NA
#12:     2    10     1       NA
#13:     2    11     1       NA
#14:     2    12     0       NA
#15:     2    13     0        1
#16:     2    14     0        2
#17:     2    15     1        3
akrun
  • 874,273
  • 37
  • 540
  • 662