I am working with some discharge data that is recorded at 5-minute intervals, so 12 observations per hour. I want to filter the data to only include so many hours (rows) before and after a change in discharge from positive to negative or negative to positive (there is backflow in this river system and that's why there are negative values). I am trying to piggyback off of the answer provided by AntoniosK in this question (R - Find maximum run of positive / negative values).
So, using something like his example dataframe:
wf = data.frame(discharge = c(100, 125, 128, -25, -30, -27, 104, -23, 100, -50))
I would like to filter the column based on some criteria, such as only retain the observations where there were at least 3 values of the same sign (i.e. positive or negative) followed by a change where the next 3 values after the change (including the value where the signs changed) were all the same sign (i.e. positive or negative). Here, with this small example dataframe, the desired result would be:
[100, 125, 128, -25, -30, -27]
I can do something like the following to create a logical vector to subset the column by to identify where a change is sign occurs based on the previous value, but I haven't been able to determine how to look at so many values before and after the change:
(c(0, diff(sign(wf$discharge))) != 0)
Like I mentioned, I was trying to work with the dplyr
solution given by AntoniosK at a starting point. I'm not sure if this will require using rle
to look at runs, or maybe something with lag
and lead
. It will probably be something simple that I am just overlooking at the moment and drawing a blank on.