
I have an xts time series of temperature data at 5-minute resolution.

head(dataset)
Time                Temp
2016-04-26 10:00:00 6.877
2016-04-26 10:05:00 6.877
2016-04-26 10:10:00 6.978
2016-04-26 10:15:00 6.978
2016-04-26 10:20:00 6.978
  1. I want to calculate the longest duration for which the temperature exceeds a certain threshold (say 20 °C).
  2. I want to calculate all the periods, with their durations, during which the temperature exceeds that threshold.
  3. I create a data.frame from my xts data:

    df=data.frame(Time=index(dataset),coredata(dataset))
    head(df)
    Time                  Temp
    1 2016-04-26 10:00:00 6.877
    2 2016-04-26 10:05:00 6.877
    3 2016-04-26 10:10:00 6.978
    4 2016-04-26 10:15:00 6.978
    5 2016-04-26 10:20:00 6.978
    6 2016-04-26 10:25:00 7.079
    
  4. Then I create a subset with only the rows that exceed the threshold:

    sub=(subset(x=df,subset = df$Temp>20))
    head(sub)
                Time         Temp
    7514 2016-05-22 12:05:00 20.043
    7515 2016-05-22 12:10:00 20.234
    7516 2016-05-22 12:15:00 20.329
    7517 2016-05-22 12:20:00 20.424
    7518 2016-05-22 12:25:00 20.615
    7519 2016-05-22 12:30:00 20.805
    

    But now I'm having trouble calculating the duration of the events during which the temperature exceeds the threshold. I don't know how to identify a connected period and calculate its duration.

I would be happy if you have a solution for this question (it's my first thread, so please excuse minor mistakes). If you need more information on my data, feel free to ask.

tobias_p
  • `max(rle(dataset$temp> 20)[rle(dataset$temp> 20)$values == T]$lengths)` should work for your first objective in case your data has the same time intervals throughout your dataset (as in your example) – Lennyy Feb 11 '19 at 10:33
  • Can you provide your expected output? E.g. what duration would you expect on the basis of sample data, etc. – arg0naut91 Feb 11 '19 at 10:35
  • @Lennyy: thanks for your answer, I didn't know about the rle() function. Unfortunately, it doesn't work on my data. I get the idea, but I am unable to produce results. The data is gap-filled and therefore has the same intervals throughout, but the function produces NULL. With the rle() function I did arrive at a solution, though, by using `r=rle(df$Temp>20); tapply(r$lengths, r$values, max)` @arg0naut: I expect it to be hours to days – tobias_p Feb 11 '19 at 12:40
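
The rle() idea from the comments can be taken a step further to list every period above the threshold together with its start, end, and length. The following is only a minimal base R sketch: the toy values, the `events` data frame, and its column names are made up for illustration, and it assumes gap-free data (no `NA`) with a constant 5-minute spacing.

```r
# Toy data (hypothetical values), regular 5-minute spacing as in the question
df <- data.frame(
  Time = as.POSIXct("2016-05-22 12:05:00", tz = "UTC") + 300 * (0:5),
  Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 20.805)
)

r <- rle(df$Temp > 20)                 # runs of consecutive TRUE/FALSE
run_end   <- cumsum(r$lengths)         # row index where each run ends
run_start <- run_end - r$lengths + 1   # row index where each run starts

# keep only the runs that are above the threshold
events <- data.frame(
  start = df$Time[run_start[r$values]],
  end   = df$Time[run_end[r$values]],
  n_obs = r$lengths[r$values]
)

# assumption: each observation stands for one full 5-minute interval
events$duration_min <- events$n_obs * 5

# the longest period above the threshold
longest <- events[which.max(events$duration_min), ]
```

Counting observations gives one interval more than subtracting the first timestamp from the last one (here 10 and 15 minutes instead of 5 and 10), so which convention fits depends on how a single reading should be interpreted.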

1 Answer


This may work. I use this data as an example:

df <- structure(list(Time = structure(c(1463911500, 1463911800, 1463912100, 
1463912400, 1463912700, 1463913000), class = c("POSIXct", "POSIXt"
), tzone = ""), Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 
20.805)), row.names = c(NA, -6L), class = "data.frame")

> df
                 Time   Temp
1 2016-05-22 12:05:00 20.043
2 2016-05-22 12:10:00 20.234
3 2016-05-22 12:15:00  6.329
4 2016-05-22 12:20:00 20.424
5 2016-05-22 12:25:00 20.615
6 2016-05-22 12:30:00 20.805

library(dplyr)
df %>% 
  # add id for different periods/events
  # (rleid() comes from data.table, not dplyr, so it is called with its namespace)
  mutate(tmp_Temp = Temp > 20, id = data.table::rleid(tmp_Temp)) %>% 
  # keep only periods with high temperature
  filter(tmp_Temp) %>%
  # for each period/event, get its duration
  group_by(id) %>%
  summarise(event_duration = difftime(last(Time), first(Time)))


     id event_duration
  <int> <time>        
1     1  5 mins       
2     3 10 mins       
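
For the first objective (the longest event), the per-event table can be filtered for its maximum duration. This is a hedged sketch rather than part of the original answer: the names `events`, `above`, `start_time`, `end_time`, and `longest` are illustrative, the timezone is fixed to UTC as an assumption, and `units = "mins"` is passed to `difftime()` so that every group is reported in the same unit instead of relying on the automatic choice.

```r
library(dplyr)

# same toy data as above, rebuilt so the block runs on its own
# (assumption: timezone fixed to UTC)
df <- data.frame(
  Time = as.POSIXct("2016-05-22 12:05:00", tz = "UTC") + 300 * (0:5),
  Temp = c(20.043, 20.234, 6.329, 20.424, 20.615, 20.805)
)

events <- df %>%
  # label consecutive runs of above/below threshold
  mutate(above = Temp > 20, id = data.table::rleid(above)) %>%
  # keep only the runs above the threshold
  filter(above) %>%
  group_by(id) %>%
  summarise(
    start_time = first(Time),
    end_time = last(Time),
    # fixed unit so durations are comparable across events
    event_duration = difftime(last(Time), first(Time), units = "mins")
  )

# keep the event(s) with the maximum duration
longest <- events %>%
  filter(event_duration == max(event_duration))
```

If several events share the maximum duration, `longest` keeps all of them.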
NRLP