
I have the following functions. CreateChronVector builds a chron vector between two chron date-times, in hourly intervals by default. RoundHour rounds a chron vector to the nearest hour.

library(chron)

# Build a chron sequence from chronFrom to chronTo, hourly (default) or daily.
CreateChronVector <- function(chronFrom, chronTo, frequency = "hourly") {
  # Split each end point into its date part and its time-of-day part.
  datesFrom <- dates(chronFrom)
  timesFrom <- chronFrom - dates(chronFrom)
  datesTo <- dates(chronTo)
  timesTo <- chronTo - dates(chronTo)
  if ((timesFrom != 0 || timesTo != 0) && frequency == "daily") {
    stop("The indicated dates have hour components while the given frequency is daily.")
  }
  if (frequency == "hourly") {
    # A date-only end point means "through 23:00 of that day".
    if (timesTo == 0) {
      timesTo <- 23/24
    }
    chronFrom <- chron(dates = datesFrom, times = timesFrom,
                       format = c(dates = "m/d/y", times = "h:m:s"))
    chronTo <- chron(dates = datesTo, times = timesTo,
                     format = c(dates = "m/d/y", times = "h:m:s"))
    # Steps of 1/24 day; this is where the round-off creeps in.
    dateVector <- seq(chronFrom, chronTo, by = 1/24)
  } else if (frequency == "daily") {
    dateVector <- seq(datesFrom, datesTo)
  }
  return(dateVector)
}

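RoundHour <- function(x) {
  # Truncate to the start of the hour, then bump up by one hour
  # when more than half an hour was cut off.
  res <- trunc(x, 'hours', eps = 1e-17)
  res <- ifelse((x - res) > 0.5/24, res + 1/24, res)
  # ifelse() drops the chron class, so restore it.
  return(as.chron(res))
}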

The problem I'm facing is that the intervals are not consistent. As an example, the code below returns two different interval sizes:

unique(diff(CreateChronVector(as.chron('2010-01-01'), as.chron('2010-01-01'))))

Similarly, using my rounding function does not correct the problem:

unique(diff(RoundHour(CreateChronVector(as.chron('2010-01-01'), as.chron('2010-01-01')))))

I'm sure this problem has to do with round-off errors. I have been experimenting with the trunc function and its eps parameter, but without success.

tshepang
JAponte
  • `chron` uses floating point so you can't really expect the intervals to be EXACTLY the same. The difference between the interval lengths will be negligibly small which should be good enough. – G. Grothendieck Feb 25 '13 at 23:32
  • Do you need to use chron? In xts you can do all this very easily. – CHP Feb 26 '13 at 03:56
  • Thanks for the suggestion of xts @geektrader. I just printed the vignette. Looks pretty promising! – JAponte Feb 27 '13 at 16:59
  • @geektrader, I tried xts. It looks like a pretty good time series class, but unfortunately it coerces everything into an internal matrix, which doesn't support mixing types across columns. I have some numeric columns and some categorical variables (characters) for categorizing outliers, different states of the system, etc. – JAponte Mar 01 '13 at 14:16

2 Answers


Taking the point from @G. Grothendieck, you can see what he is talking about if you try this:

hours <- 1:23
dateVector <- sapply(hours , function(x){ chron( dates = "01/01/10" , times = paste0(x,":00:00") ) } )
head( dateVector )
[1] 14610.04166666666606034 14610.08333333333393966 14610.12500000000000000
[4] 14610.16666666666606034 14610.20833333333393966 14610.25000000000000000
unique(diff(dateVector))
[1] 0.04166666666787932626903 0.04166666666606033686548

So you can't get exactly equal intervals, because most multiples of 1/24 can't be represented exactly in floating point. Is there a reason this matters to you?
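If the exact equality matters (for instance as a merge key), one possible workaround is to compare whole hours rather than day fractions. The sketch below is only an illustration: it uses plain numerics in place of chron values (chron stores date-times as fractional days), and `hour_key` is a hypothetical helper, not part of chron.

```r
# 1 Jan 2010 as days since 1970-01-01, matching the values printed above.
day <- 14610

# The same 24 hours built two different ways.
by_seq   <- seq(day, by = 1/24, length.out = 24)
by_hours <- day + (0:23) / 24

# Hypothetical helper: whole hours since the origin, stored as integers.
hour_key <- function(x) as.integer(round(as.numeric(x) * 24))

key_seq   <- hour_key(by_seq)
key_hours <- hour_key(by_hours)

unique(diff(key_seq))          # 1
identical(key_seq, key_hours)  # TRUE
```

Because the keys are integers, they compare exactly no matter how the underlying fractions were computed.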

Community
Simon O'Hanlon
  • I receive raw data from different sources with date/time values. I want to round up these to the hour to be able to merge them into the same data.frame and perform some time series analysis. When I round up the values, there is the possibility that we end up with missing records, which results in an irregular time series. That's why I first use CreateChronVector, to get a data.frame with all the required date/time values, which I then merge with the final result. But since there are subtle differences in the time value, the merge tends to duplicate time records. – JAponte Feb 26 '13 at 14:55
  • @JAponte What format is your time data in? Is it as above? A numeric type specifying a number and fraction of days passed since an origin date? – Simon O'Hanlon Feb 26 '13 at 18:40
  • it's in chron format. Your example fits well to my situation. I tried rounding the number of digits to 7, but I still have the same issue. `unique(diff(round(dateVector, 7)))` – JAponte Feb 27 '13 at 21:28
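For the merge workflow described in these comments, a rough sketch of joining on an integer hour key instead of on the chron value itself might look like this. The data (`raw`), the grid and the column names are made up for illustration, and times are plain fractional days standing in for chron values. `round()` gives the nearest hour; `ceiling()` would round up instead.

```r
# Hypothetical raw readings on 1 Jan 2010; times are fractional days.
day <- 14610
raw <- data.frame(
  time  = day + c(0.2, 1.1, 3.4) / 24,   # 00:12, 01:06, 03:24
  value = c(10, 11, 13)
)

# Integer hours since the origin, rounded to the nearest hour.
raw$hour <- as.integer(round(raw$time * 24))

# Complete hourly grid for the first five hours of the day.
grid <- data.frame(hour = as.integer(day * 24) + 0:4)

# Left join: every grid hour is kept, missing readings become NA.
merged <- merge(grid, raw[, c("hour", "value")], by = "hour", all.x = TRUE)

# Back to fractional days, e.g. to rebuild a chron column afterwards.
merged$time <- merged$hour / 24
merged
```

Since both sides share the same integer key, the merge cannot duplicate an hour because of differences in the last few bits of the time value.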

You can use the xts package. Once your data is in an xts object, you can use the align.time function to round the time index up. Most time series analysis is very convenient in xts.

PS: If you give a reproducible example of your data, I will update the answer with an example.
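To show what rounding a time index up to the hour amounts to, here is a base R sketch on POSIXct values. It is not xts's implementation of align.time and may treat values that already sit on a boundary differently; `round_up` and the sample timestamps are hypothetical.

```r
# Hypothetical timestamps, kept in UTC to avoid time zone surprises.
stamps <- as.POSIXct(c("2010-01-01 00:12:00",
                       "2010-01-01 01:06:30",
                       "2010-01-01 03:00:00"), tz = "UTC")

# Round each timestamp up to the next multiple of n seconds.
# Values already on a boundary are left where they are.
round_up <- function(x, n = 3600) {
  secs <- ceiling(as.numeric(x) / n) * n
  as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
}

aligned <- round_up(stamps)
format(aligned, "%H:%M")
```

POSIXct counts whole seconds, so hourly boundaries are exact multiples of 3600 and compare exactly.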

tshepang
CHP
  • Here is an example of my kind of data. I need to mix categorical and numeric variables in one data structure because I need to keep track of outliers and state of the system: `x<-xts(data.frame(A=1:24,B=letters[1:24]), chron(rep(0, 24), (0:23)/24))` – JAponte Mar 01 '13 at 14:24
  • @Japonte why not convert the categorical variables to numeric equivalents, do the slicing and dicing of the time series, and then convert the result back to a data frame? – CHP Mar 01 '13 at 18:53
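A minimal sketch of that round trip in base R, using factor codes as the numeric equivalent. The data frame and its values are invented for illustration; the all-numeric matrix `m` is the kind of object that could be handed to a matrix-backed time series class.

```r
# Hypothetical mixed data: one numeric and one categorical column.
df <- data.frame(A = 1:4,
                 B = c("ok", "outlier", "ok", "down"),
                 stringsAsFactors = FALSE)

# Encode the categories as integer codes and remember the levels.
state <- factor(df$B)
m <- cbind(A = df$A, B = as.integer(state))   # all-numeric matrix

# ... time series work on m would happen here ...

# Decode the integer codes back into the original labels.
restored <- data.frame(A = m[, "A"],
                       B = levels(state)[m[, "B"]],
                       stringsAsFactors = FALSE)
identical(restored$B, df$B)  # TRUE
```

The levels must be kept alongside the matrix, since the integer codes alone do not say which category they stand for.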