0

I have three columns, one of which contains a statistic, the other a critical value, and the final one the date. I would like to know how to find the start and end date when the threshold of the critical value is exceeded for at least n periods. A simple example of the data is as follows:

Date        Statistic  Critical Value
2003-01-01  1.1        0.80
2003-01-08  1.5        0.90
2003-01-15  2.1        0.91
2003-01-22  0.5        0.95
2003-01-29  1.4        0.98
2003-02-05  1.3        1.00
2003-02-12  0.8        1.10

I would expect the output with a minimum period of 3 to be:

Start      End
2003-01-01 2003-01-15

However if the minimum period was 2 rather than 3 I would expect:

Start      End
2003-01-01 2003-01-15
2003-01-29 2003-02-05
Simon
  • 3
  • 3

1 Answers1

1

A very self explanatory solution with data.table would be: (There might be more compact solutions)

require(data.table)
setDT(DT)
DT[,exceeds:=Statistic>CriticalValue]
DT[,mydiff := c(0,diff(exceeds))]
DT[mydiff<0, mydiff := 0]
DT[,run := cumsum(mydiff)]
DT[c(exceeds),.(start=Date[1], end = Date[.N], length = .N), by=run]

Results in:

   run      start        end length
1:   0 2003-01-01 2003-01-15      3
2:   1 2003-01-29 2003-02-05      2

So if you only want runs with length>2 use

DT_agg <- DT[c(exceeds),.(start=Date[1], end = Date[.N], length = .N), by=run]
DT_agg[length>2]

   run      start        end length
1:   0 2003-01-01 2003-01-15      3
Rentrop
  • 20,979
  • 10
  • 72
  • 100