14

I would like to use R for time series analysis. I want to make a time-series model and use functions from the packages timeDate and forecast.

I have intraday data in the CET time zone (15-minute data, 4 data points per hour). On March 31st daylight saving time begins, so I am missing 4 of the 96 data points that I usually have. On October 28th I have 4 data points too many because the clocks are switched back.

For my time series model I always need 96 data points, as otherwise the intraday seasonality gets messed up.
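To make the problem concrete, here is a minimal sketch that counts 15-minute slots per local calendar day around the two changeover dates. The zone name `Europe/Berlin`, the year 2013, and the helper name `slots_per_day` are assumptions chosen for illustration only.

```r
# Assumption: Europe/Berlin stands in for "CET with daylight saving time".
tz <- "Europe/Berlin"

# Hypothetical helper: count 15-minute timestamps per local calendar day.
slots_per_day <- function(from, to) {
  stamps <- seq(as.POSIXct(from, tz = tz), as.POSIXct(to, tz = tz), by = "15 min")
  table(format(stamps, "%Y-%m-%d", tz = tz))
}

# Three days around the spring change (2013-03-31) and the autumn change (2013-10-27).
spring <- slots_per_day("2013-03-30 00:00", "2013-04-01 23:45")
autumn <- slots_per_day("2013-10-26 00:00", "2013-10-28 23:45")

print(spring)
print(autumn)
```

The day of the spring change shows 92 slots and the day of the autumn change shows 100, while the neighbouring days show 96.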

Do you have any experience with this? Do you know of an R function or package that would help automate this kind of data handling, ideally something elegant? Thank you!

Henrik
Richi W

3 Answers

17

I had a similar problem with hydrological data from a sensor. My timestamps were in UTC+1 (CET) and did not switch to daylight saving time (UTC+2, CEST). As I didn't want my data to be one hour off (which would be the case if UTC were used), I used the %z conversion specification of strptime. In ?strptime you'll find:

%z Signed offset in hours and minutes from UTC, so -0800 is 8 hours behind UTC.

For example: in 2012, the switch from standard time to DST occurred on 2012-03-25, so there is no 02:00 on that day. If you try to convert "2012-03-25 02:00:00" to a POSIXct object,

> as.POSIXct("2012-03-25 02:00:00", tz="Europe/Vienna")
[1] "2012-03-25 CET"

you don't get an error or a warning, you just get the date without the time (this behavior is documented).

Appending the UTC offset to the string and including %z in the format gives the desired result:

> as.POSIXct("2012-03-25 02:00:00 +0100", format="%F %T %z", tz="Europe/Vienna")
[1] "2012-03-25 03:00:00 CEST"

To make this import easier, I wrote a small function with appropriate default values:

as.POSIXct.no.dst <- function (x, tz = "", format="%Y-%m-%d %H:%M", offset="+0100", ...)
{
  x <- paste(x, offset)
  format <- paste(format, "%z")
  as.POSIXct(x, tz, format=format, ...)
}

> as.POSIXct.no.dst(c("2012-03-25 00:00", "2012-03-25 01:00", "2012-03-25 02:00", "2012-03-25 03:00"))
[1] "2012-03-25 00:00:00 CET"  "2012-03-25 01:00:00 CET"  "2012-03-25 03:00:00 CEST"
[4] "2012-03-25 04:00:00 CEST"
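As a rough check of what the fixed-offset import buys you, the sketch below compares the gaps between hourly timestamps across the autumn change on 2012-10-28, parsed once as local time and once with a fixed +0100 offset. The helper name `parse_fixed_offset` is hypothetical and only restates the function above in compact form; it assumes the platform's strptime accepts `%z` on input.

```r
# Hypothetical helper: parse wall-clock strings recorded at a fixed UTC offset.
parse_fixed_offset <- function(x, tz = "Europe/Vienna", offset = "+0100") {
  as.POSIXct(paste(x, offset), format = "%Y-%m-%d %H:%M %z", tz = tz)
}

# Three days of hourly stamps around the autumn change on 2012-10-28.
stamps <- paste0("2012-10-", rep(27:29, each = 24), " ", sprintf("%02d:00", 0:23))

# Gaps in hours when the strings are read as local (DST-aware) time.
local_times <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M", tz = "Europe/Vienna")
gaps_local <- as.numeric(diff(local_times), units = "hours")

# Gaps in hours when the strings are read with a fixed +0100 offset.
gaps_fixed <- as.numeric(diff(parse_fixed_offset(stamps)), units = "hours")

print(table(gaps_local))
print(table(gaps_fixed))
```

With the fixed offset every gap is exactly one hour, whereas the DST-aware parse cannot be evenly spaced because the local day of the change is 25 hours long. Note that this regularises the spacing only; it does not change how many observations fall on each local calendar day.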
Henrik
Tobias
  • So, when you read in data with your function, you'll have the same number of data points for each day, whereas you wouldn't if you used `as.POSIXct`? I'm having trouble seeing how this helps. – GSee Dec 13 '12 at 20:14
  • @GSee, you are absolutely right, my answer doesn't solve the problem of having a different number of observations per day. But if @Richard is also interested in a regular series, it might be helpful. For example: `time <- paste("2012-10-", rep(27:29, each=24), " ", 0:23, ":00", sep="")`. Importing with `diff(as.POSIXct(time))` gives an irregular ts, whereas `diff(as.POSIXct.no.dst(time))` gives a regular one. – Tobias Dec 13 '12 at 22:29
  • In case you don't need time zones at all, you should consider using the class POSIXlt instead, with which you also end up with a regular time series: `diff(strptime(time, format="%F %H:%M"))`. I found the article (page 29ff) in R-News 2004/1 helpful. [link](http://www.r-project.org/doc/Rnews/Rnews_2004-1.pdf) – Tobias Dec 14 '12 at 14:10
13

If you don't want daylight saving time, convert to a timezone that doesn't have it (e.g. GMT, UTC).

times <- .POSIXct(times, tz="GMT")
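A small sketch of what this relabelling does, assuming two made-up timestamps in `Europe/Berlin` (the zone and dates are illustrative, not taken from the question): the underlying instant is unchanged, only the zone used for printing changes.

```r
# Assumed example values: 08:00 local time on a summer day and on a winter day.
summer <- as.POSIXct("2013-07-01 08:00", tz = "Europe/Berlin")
winter <- as.POSIXct("2013-01-07 08:00", tz = "Europe/Berlin")

# Relabel the same instants as GMT; the numeric value (seconds since epoch) stays the same.
summer_gmt <- .POSIXct(summer, tz = "GMT")
winter_gmt <- .POSIXct(winter, tz = "GMT")

same_instant <- as.numeric(summer) == as.numeric(summer_gmt)

# Hour-of-day labels as seen in GMT.
summer_label <- format(summer_gmt, "%H:%M")
winter_label <- format(winter_gmt, "%H:%M")

print(c(summer = summer_label, winter = winter_label))
```

The GMT series has no missing or repeated hour, but 08:00 local time shows up as 06:00 in summer and 07:00 in winter, so patterns tied to the local clock shift by an hour between the two regimes.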
Joshua Ulrich
  • This does not help me. I want to look at intraday patterns that are triggered by local time. For example people go to work at 08:00 in the local time (this is CET in Winter and CEST in summer). Doing this I lose and win one hour respectively and I was wondering how people solve this. Thanks for the comment anyways. – Richi W Dec 14 '12 at 09:26
1

Here is one way to get the UTC offset while daylight saving time is in effect, e.g. for Central Daylight Time:

> Sys.time()
"2015-08-20 07:10:38 CDT" # I am at America/Chicago daylight time

> as.POSIXct(as.character(Sys.time()), tz="America/Chicago")
"2015-08-20 07:13:12 CDT"

> as.POSIXct(as.character(Sys.time()), tz="UTC") - as.POSIXct(as.character(Sys.time()), tz="America/Chicago")
Time difference of -5 hours

> as.integer(as.POSIXct(as.character(Sys.time()), tz="UTC") - as.POSIXct(as.character(Sys.time()), tz="America/Chicago"))
-5
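The same idea can be sketched reproducibly with a fixed timestamp string instead of repeated `Sys.time()` calls, so the result does not depend on the session's time zone or on the clock ticking between calls. The helper name `utc_offset_hours` and the two sample timestamps are assumptions for illustration.

```r
# Hypothetical helper: UTC offset (in hours) that applies to a wall-clock string in zone tz.
utc_offset_hours <- function(stamp, tz) {
  as_utc   <- as.POSIXct(stamp, tz = "UTC")  # read the string as if it were UTC
  as_local <- as.POSIXct(stamp, tz = tz)     # read the same string as local time
  as.numeric(difftime(as_utc, as_local, units = "hours"))
}

# Assumed sample timestamps: one in summer (CDT), one in winter (CST).
offset_summer <- utc_offset_hours("2015-08-20 07:10:38", "America/Chicago")
offset_winter <- utc_offset_hours("2015-01-20 07:10:38", "America/Chicago")

print(c(summer = offset_summer, winter = offset_winter))
```

The difference between the two results is the one-hour daylight saving shift; each value on its own is the total offset from UTC.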

Some inspiration was from

Converting time zones in R: tips, tricks and pitfalls

Ken Williams