1

I have a set of time series data with ground surface temperatures measured every 10 minutes over multiple days (actually 2 years of data) from three different locations. What I am interested in calculating is the maximum slope (rate of temperature increase) for any 60 minute interval for each day for each site.

So essentially I would like to work through each day, 10 minutes at a time, with a 60 minute window and calculate the slope for each window, and then determine the maximum slope and when during the day it occurred. I would then like to apply this function to every day in the data set. The date/time is in the following format (%m/%d/%y %H:%M).

I am imagining something using ddply and the zoo package and function rollapply, to do something like this pseudocode

ddply(data, .(location, day), function(d) max(rollapply(slope(d$temp~d$time, data=d)))

Where "time" is the time within each day (every 10 min) and "day" is simply the date so the function can be applied across all dates. Obviously, "slope" is not an R function and would have to be written to calculate actual slopes.

Does anyone have more experience with zoo and rollapply or can think of another way to solve this problem?

I've included some sample data here from a single location (so the location column has been removed) https://gist.github.com/natemiller/42eaf45747f31a6ccf9a

Thanks for any assistance, Nate

EDIT: I have since used a combination of geektrader's Joshua Ulrich's answers from below, and used basic algebra to convert the values back to units of ºC per hour

    CperH<-dat$Temp-(dat$Temp/(1+dat$ROC))

Works well.

Nate Miller
  • 386
  • 5
  • 19
  • The slope can change across the hour window right (because you have 3 readings)? So do you want the average slope of each window and then determine when the maximum slope occurred during the day? – Simon O'Hanlon Mar 21 '13 at 15:57
  • Hi Simon. Yes I would like the average slope for each window and then to determine which window had the greatest slope during the day. It will be a maximum rate of change in 60 minutes, but it can be any consecutive 60 minutes within the same day. Hope this clarifies. – Nate Miller Mar 21 '13 at 16:14

1 Answers1

3

You can use xts timeseries package which is very good for timeseries analysis. Combined with TTR package, you can get what you want quite easily.

require(xts)
require(TTR)
dat <- read.csv("https://gist.github.com/natemiller/42eaf45747f31a6ccf9a/raw/916443cfb353d82e8af6cdebdd80b2e956317b24/sampleTempData.csv")

dat.xts <- .xts(x = dat$Temp, index = as.POSIXct(strptime(dat$Date, format = "%m/%d/%y %H:%M")))
names(dat.xts) <- "Temp"
head(dat.xts)
##                     Temp
## 2011-04-11 03:48:00  9.5
## 2011-04-11 03:58:00  9.5
## 2011-04-11 04:08:00  9.5
## 2011-04-11 04:18:00  9.5
## 2011-04-11 04:28:00  9.5
## 2011-04-11 04:38:00  9.5


dat.xts$ROC <- ROC(dat.xts, n = 6)
head(dat.xts, 10)
##                     Temp ROC
## 2011-04-11 03:48:00  9.5  NA
## 2011-04-11 03:58:00  9.5  NA
## 2011-04-11 04:08:00  9.5  NA
## 2011-04-11 04:18:00  9.5  NA
## 2011-04-11 04:28:00  9.5  NA
## 2011-04-11 04:38:00  9.5  NA
## 2011-04-11 04:48:00  9.5   0
## 2011-04-11 04:58:00  9.5   0
## 2011-04-11 05:08:00  9.5   0
## 2011-04-11 05:18:00  9.5   0

dat.xts[which.max(dat.xts$ROC), ]
##                     Temp       ROC
## 2011-04-12 09:48:00 14.5 0.5340825


# If you want to do analysis on per day basis.
dat.xts <- .xts(x = dat$Temp, index = as.POSIXct(strptime(dat$Date, format = "%m/%d/%y %H:%M")))
names(dat.xts) <- "Temp"
head(dat.xts)
##                     Temp
## 2011-04-11 03:48:00  9.5
## 2011-04-11 03:58:00  9.5
## 2011-04-11 04:08:00  9.5
## 2011-04-11 04:18:00  9.5
## 2011-04-11 04:28:00  9.5
## 2011-04-11 04:38:00  9.5


ll <- split.xts(dat.xts, f = "days")


ll <- lapply(ll, FUN = function(x) {
    x$ROC <- ROC(x, 6)
    return(x)
})

max.ll <- lapply(ll, function(x) x[which.max(x$ROC), ])

max.ll
## [[1]]
##                     Temp       ROC
## 2011-04-11 13:38:00 20.5 0.4946962
## 
## [[2]]
##                     Temp       ROC
## 2011-04-12 09:48:00 14.5 0.5340825
## 
## [[3]]
##                     Temp       ROC
## 2011-04-13 10:18:00 15.5 0.4382549
## 
## [[4]]
##                     Temp       ROC
## 2011-04-14 10:38:00 14.5 0.3715636
CHP
  • 16,981
  • 4
  • 38
  • 57
  • +1. But I think you need to reshape the data to get the max slope for *each day* if I understand the OP correctly. – Simon O'Hanlon Mar 21 '13 at 16:19
  • @geektrader, great thanks! I'm not familiar with xts or TTR and the ROC function. I'll dig into this more, but you can probably answer quickly, what does the date/time value in the final output correspond to. The beginning of the period with the greatest ROC? the middle? the end? I knew that there would a package to make this relatively easy and someone out there who would know it well enough to help. Much appreciated! – Nate Miller Mar 21 '13 at 16:53
  • @user1982099 the date corresponds to end date. See content of `ll` in example above. `TTR` is mainly technical analysis library for financial timeseries' but meaning of ROC (rate of change) is same irrespective of domain :) – CHP Mar 21 '13 at 16:57
  • @user1982099: You can do all of it in one line (though it's a bit long): `x <- do.call(rbind, lapply(split(dat.xts,"days"), function(x) { x$Slope <- ROC(x$Temp,6); x[which.max(x$Slope),] }))` – Joshua Ulrich Mar 21 '13 at 17:07
  • @JoshuaUlrich I initially wrote post with 1 line.. then decided to break it for benefit of those unfamiliar with `xts`. – CHP Mar 21 '13 at 17:09
  • @geektrader is it possible to extract the date and ROC data from the max.ll list? Say if I wanted to generate a table or dataframe with date and max ROC or later plot date vs ROC. Right now it seems that I can only access "Temp" and "ROC" and if I try using unlist or converting to a dataframe I lose the date. Thanks – Nate Miller Mar 21 '13 at 22:15
  • Try Joshuaulrich's solution – CHP Mar 22 '13 at 00:47
  • Final question. I've tried to figure this out myself, but its not clear to me how the ROC is calculated or what units it is in. As a slope I would expect it to be in degreesC/hour, but examining the contents of ll above, it is not clear how the ROC is calculated from the temperature data. In the example data above, if we look at ll for day 1, at time 8:38 the ROC is 0.0487 and I believe that is calculated from Temp values of 10,10,10,10,10,10.5. I'm not sure how the ROC is calculated from this. Sorry for the long Q/comment. Thank you. – Nate Miller Mar 22 '13 at 15:51
  • @user1982099 Please update the question for extended questions. It's difficult to discuss all these things in comments. As for `ROC` it's just (Temp[n]/Temp[n-6]-1). You can just type `ROC` in R console and hit enter to see the body of the function – CHP Mar 25 '13 at 03:45