how to extract timezone into column with POSIXct dates

Question

I have these to dates in a dataframe and I would like to add a column that is the timezone

d = data.frame( dates =  c(as.POSIXct("2017-01-01 PDT"), as.POSIXct("2017-04-02 PST") ) )
d$TZ = attr(as.POSIXlt(d$dates), "tzone")  ### this doesn't do anything
d

when I try to add the timezone with : d$TZ = attr(as.POSIXlt(d$dates), "tzone") I get this error:

> d$TZ = attr(as.POSIXlt(d$dates), "tzone")
Error in `$<-.data.frame`(`*tmp*`, "TZ", value = c("", "PST", "PDT")) : 
  replacement has 3 rows, data has 2

I'd like the output to be:

       dates    TZ
1 2017-01-01    PDT
2 2017-04-02     PST

score 2 · Accepted Answer · edited May 23 '17 at 12:25

2017-01-01 shouldn't be PDT, the daylight-savings is wrong. This is why if you look at d$dates you'll see your time zones apparently reversed: R is "fixing the problem" (even if you didn't want it to). One might try using the format=... argument to as.POSIXct, but there is not %-code for time zone on the input, so that doesn't help.

Furthermore, the fact that d$dates shows time zones is because R thoughtfully (/carelessly?) assumed your local time zone when parsing the times. This can be shown by changing one of your dates to UTC:

d = data.frame( dates =  c(as.POSIXct("2017-01-01 PDT"), as.POSIXct("2017-04-02 UTC") ) )
d$dates
# [1] "2017-01-01 PST" "2017-04-02 PDT"
#                                  ^^^ is not UTC

Furthermore, R does not seem to understand "PDT" as a timezone:

as.POSIXct("2017-01-01", tz = "PDT")
# ... lots of warnings ...
# [1] "2017-01-01 GMT"

But it does accept something similar:

as.POSIXct("2017-01-01", tz = "PST8PDT")
# [1] "2017-01-01 PST"

If all you really want is the literal portion from the original string, then simply d$TZ <- gsub(".* ", "", d$dates) will give it to you, but if your intention is anything other than cosmetic/printing, these may not all be recognized by R. You may need to translate to something "known".

One way to go is to change the source to use hours-offset instead of timezone (e.g., -0800 instead of PDT). Doing this, you can parse it with:

as.POSIXct("2017-01-01 -0500", format = "%Y-%m-%d %z")
# [1] "2016-12-31 21:00:00 PST"
as.POSIXct("2017-01-01 -0500", format = "%Y-%m-%d %z", tz = "UTC")
# [1] "2017-01-01 05:00:00 UTC"

(I'm assuming since you are using as.POSIXct vice as.Date that you want a date/time stamp, not just a date.)

Another method is to translate the suggested timezone among the list of known timezones. You can find the known timezones through ?timezones (another related Q/A here).

After a little testing (please, test this further), I came up with this:

converttz <- function(x) {
  on <- OlsonNames()
  ind <- sapply(gsub(".* ", "", x), function(z) head(grep(z, on), n = 1))
  ret <- character(length(x))
  ret[lengths(ind) == 0] <- NA
  ret[lengths(ind) > 0] <- on[unlist(ind[lengths(ind) > 0])]
  ret
}

This works as long as case is correct; that is, "est" may just as easily match "America/Creston", and even if you search only at the beginning or end of the strings, it could still match "Europe/Budapest".

From here, something like this works:

dts <- c("2017-01-01 PDT", "2017-04-02 UTC")
d <- data.frame(dates = as.POSIXct(dts), stringsAsFactors = FALSE)
d$TZ <- converttz(dts)
str(d)
# 'data.frame': 2 obs. of  2 variables:
#  $ dates: POSIXct, format: "2017-01-01" "2017-04-02"
#  $ TZ   : chr  "PST8PDT" "Etc/UTC"

Okay, so "Etc/UTC" isn't very sightly. "UTC" does exist in there, but it is the second one matched so is filtered out with head. You can try alternative methods for finding a closer match (perhaps looking for exact matches first, then beginning/end).

how to extract timezone into column with POSIXct dates

1 Answers1