
I hope somebody can help me with the following event-detection problem. The input data is a regular time series with the columns time (Time), water level (WL) and runoff (Q). The aim is to detect events above a threshold and, for each event, extract the start time, the end time, the duration in minutes, and the maximum and the sum of the runoff during the event. By definition, an event must be cut whenever the date changes. NAs, on the other hand, should only cut an event if the run of missing values lasts longer than an hour.

library(data.table)
library(dplyr)
library(xts)

## data
dWL <- structure(list(Time = structure(c(1463951500, 1463951800, 1463952100, 1463952400, 1463952700, 1463953000, 1463953300, 1463953600, 1463953900, 1463954200, 1463954500, 1463954800, 1463955100, 1463955400, 1463955700, 1463956000),class = c("POSIXct", "POSIXt"), tzone = ""), WL = c(0.2, 2.5, 2.4, 2.1, 0.9, 2.8, 2.9, 1.9, 2.4, NA, 2.3, 2.6, 2.8, 2.1, 2.0, 1.9), Q = c(0.0, 255.5, 232.4, 150.1, 0.0, 345.8, 382.9, 0.0, 214.4, NA, 201.3, 312.6, 362.8, 80.1, 20.0, 0.0)), row.names = c(NA, -16L), class = "data.frame")
## threshold value
vth <-2


na.omit(dWL) %>%  ## ?? how to drop NAs only when the NA gap is longer than an hour ??
  mutate(tmp_WL = WL >= vth, id = rleid(tmp_WL)) %>%
  filter(tmp_WL) %>%
  group_by(id) %>% ## ?? how to additionally separate events at a change of date ??
  summarise(start_time = first(Time), end_time = last(Time),
            event_duration = difftime(last(Time), first(Time), units = "mins"),
            max_Q = max(Q), sum_Q = sum(Q))

I am aware of the heatwaveR package and its very useful exceedance() function, but I haven't managed to get it to work with sub-daily time series.

Ian Campbell

1 Answer


Since you tagged this with data.table, let's use that. We can use run-length encoding with rleid() (the code calls its vector form, rleidv()) to keep track of the events. Once each event has an ID, a simple group-by does the calculations. At the end we delete the helper RLE column by setting it to NULL and append [] to print the result.
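To illustrate what rleid() does, here is a tiny sketch on a made-up logical vector (event is illustrative, not the question's data). Consecutive equal values share an ID, and an NA, which is what rows with a missing WL are left with in the code below, gets its own ID and therefore splits the TRUE values around it into two separate events.

```r
library(data.table)

# Toy exceedance flags; NA stands for a missing water level
event <- c(FALSE, TRUE, TRUE, FALSE, TRUE, NA, TRUE)

# rleid() gives each run of consecutive equal values the same integer ID;
# NA is treated as its own value, so it starts a new run
ids <- rleid(event)
ids
# [1] 1 2 2 3 4 5 6
```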

library(data.table)
setDT(dWL)[!is.na(WL),event := WL > vth][
  ,RLE := rleidv(event)][
    event == TRUE,.(start = min(Time),
                    end=max(Time),
                    max.WL=max(WL),
                    duration=difftime(max(Time),min(Time),units="mins"),
                    runoff=sum(Q)),
    by=RLE][,RLE:=NULL][]
#                 start                 end max.WL duration runoff
#1: 2016-05-22 17:16:40 2016-05-22 17:26:40    2.5  10 mins  638.0
#2: 2016-05-22 17:36:40 2016-05-22 17:41:40    2.9   5 mins  728.7
#3: 2016-05-22 17:51:40 2016-05-22 17:51:40    2.4   0 mins  214.4
#4: 2016-05-22 18:01:40 2016-05-22 18:16:40    2.8  15 mins  956.8
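The code above does not yet cut events at a change of date. One possible extension, sketched here on a hypothetical series that crosses midnight (the demo data and the day column are assumptions, not from the question), is to compute the calendar date and pass it to rleid() together with the exceedance flag, so that a new run starts whenever either of them changes. format() is used for the date because it respects the time zone stored on the POSIXct; as.Date() on a POSIXct has in many R versions defaulted to UTC instead.

```r
library(data.table)

vth <- 2
# Hypothetical 5-minute series in UTC crossing midnight, all above vth
demo <- data.table(
  Time = as.POSIXct("2016-05-22 23:45:00", tz = "UTC") + 300 * (0:6),
  WL   = c(2.5, 2.6, 2.7, 2.8, 2.6, 2.4, 2.2),
  Q    = c(100, 120, 140, 160, 130, 110, 90)
)

demo[, event := WL > vth]
# Calendar date in the series' own time zone (taken from the "tzone" attribute)
demo[, day := format(Time, "%Y-%m-%d")]
# A new run starts when either the exceedance flag or the date changes
demo[, RLE := rleid(event, day)]

res <- demo[event == TRUE,
            .(start    = min(Time),
              end      = max(Time),
              max.WL   = max(WL),
              duration = difftime(max(Time), min(Time), units = "mins"),
              runoff   = sum(Q)),
            by = RLE][, RLE := NULL][]
res
```

The single exceedance period is split into two events: 23:45 to 23:55 on the first day and 00:00 to 00:15 on the next.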

Data

dWL <- structure(list(Time = structure(c(1463951500, 1463951800, 1463952100, 1463952400, 1463952700, 1463953000, 1463953300, 1463953600, 1463953900, 1463954200, 1463954500, 1463954800, 1463955100, 1463955400, 1463955700, 1463956000),class = c("POSIXct", "POSIXt"), tzone = ""), WL = c(0.2, 2.5, 2.4, 2.1, 0.9, 2.8, 2.9, 1.9, 2.4, NA, 2.3, 2.6, 2.8, 2.1, 2.0, 1.9), Q = c(0.0, 255.5, 232.4, 150.1, 0.0, 345.8, 382.9, 0.0, 214.4, NA, 201.3, 312.6, 362.8, 80.1, 20.0, 0.0)), row.names = c(NA, -16L), class = "data.frame")
vth <- 2
Ian Campbell
  • Thank you very much. I am going to test this solution on my data soon. I adjusted the duration to duration=difftime(max(Time),min(Time))+5 to turn the time difference into an event duration for my 5-minute interval data. – user2563989 Apr 06 '20 at 14:44
  • If somebody has an idea for the NA handling (how to drop NAs only when the NA duration is longer than an hour?), that would be superb. – user2563989 Apr 06 '20 at 14:49
  • Help me understand how not dropping the NAs *unless* they are together for an hour helps? Having no data does not help you assess the water level during that period. I could understand dropping them unless they are longer than an hour. – Ian Campbell Apr 06 '20 at 14:52
  • Indeed, this is what I meant. As long as the NA duration is not longer than an hour, the NAs should be omitted. – user2563989 Apr 07 '20 at 19:27
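A possible way to implement the NA rule from the comments, together with the date cut, is sketched below under stated assumptions: the series is regular with a 5-minute step (step_min), a gap counts as "longer than an hour" when its consecutive missing samples span more than 60 minutes (max_gap), and the time zone is pinned to UTC so the date split is reproducible. The helper columns na_run, gap_min and day are illustrative names. Short gaps are deleted so the exceedances on either side fall into one run; long gaps are kept, their event flag stays NA, and rleid() therefore starts a new event after them. The duration is requested explicitly in minutes: with the default units = "auto", difftime() picks seconds, minutes or hours per call (a single-sample event gives 0 secs), so adding a bare 5 does not always add 5 minutes.

```r
library(data.table)

# Question's sample data, time zone pinned to UTC for reproducibility
dWL <- data.table(
  Time = as.POSIXct(seq(1463951500, 1463956000, by = 300),
                    origin = "1970-01-01", tz = "UTC"),
  WL = c(0.2, 2.5, 2.4, 2.1, 0.9, 2.8, 2.9, 1.9, 2.4, NA, 2.3, 2.6, 2.8, 2.1, 2.0, 1.9),
  Q  = c(0.0, 255.5, 232.4, 150.1, 0.0, 345.8, 382.9, 0.0, 214.4, NA, 201.3, 312.6, 362.8, 80.1, 20.0, 0.0)
)
vth      <- 2
step_min <- 5   # assumed sampling interval in minutes
max_gap  <- 60  # NA gaps up to this many minutes are bridged

# 1. Length in minutes of each run of consecutive missing WL values
dWL[, na_run := rleid(is.na(WL))]
dWL[, gap_min := if (is.na(WL[1L])) .N * step_min else 0, by = na_run]

# 2. Drop short gaps; long gaps remain as NA rows and split events
dWL <- dWL[!(is.na(WL) & gap_min <= max_gap)]

# 3. Flag exceedances; a new event starts when the flag or the date changes
dWL[, event := WL > vth]
dWL[, day := format(Time, "%Y-%m-%d")]
dWL[, RLE := rleid(event, day)]

# 4. Summarise; the duration includes the sampling step of the last sample
res <- dWL[event == TRUE,
           .(start    = min(Time),
             end      = max(Time),
             duration = as.numeric(difftime(max(Time), min(Time), units = "mins")) + step_min,
             max_Q    = max(Q),
             sum_Q    = sum(Q)),
           by = RLE][, RLE := NULL][]
res
```

On the sample data the single missing row is a 5-minute gap, so it is bridged and the last two events of the answer's output merge into one 30-minute event.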