2017-01-01
shouldn't be PDT
, the daylight-savings is wrong. This is why if you look at d$dates
you'll see your time zones apparently reversed: R is "fixing the problem" (even if you didn't want it to). One might try using the format=...
argument to as.POSIXct
, but there is not %
-code for time zone on the input, so that doesn't help.
Furthermore, the fact that d$dates
shows time zones is because R thoughtfully (/carelessly?) assumed your local time zone when parsing the times. This can be shown by changing one of your dates to UTC:
d = data.frame( dates = c(as.POSIXct("2017-01-01 PDT"), as.POSIXct("2017-04-02 UTC") ) )
d$dates
# [1] "2017-01-01 PST" "2017-04-02 PDT"
# ^^^ is not UTC
Furthermore, R does not seem to understand "PDT"
as a timezone:
as.POSIXct("2017-01-01", tz = "PDT")
# ... lots of warnings ...
# [1] "2017-01-01 GMT"
But it does accept something similar:
as.POSIXct("2017-01-01", tz = "PST8PDT")
# [1] "2017-01-01 PST"
If all you really want is the literal portion from the original string, then simply d$TZ <- gsub(".* ", "", d$dates)
will give it to you, but if your intention is anything other than cosmetic/printing, these may not all be recognized by R. You may need to translate to something "known".
One way to go is to change the source to use hours-offset instead of timezone (e.g., -0800
instead of PDT
). Doing this, you can parse it with:
as.POSIXct("2017-01-01 -0500", format = "%Y-%m-%d %z")
# [1] "2016-12-31 21:00:00 PST"
as.POSIXct("2017-01-01 -0500", format = "%Y-%m-%d %z", tz = "UTC")
# [1] "2017-01-01 05:00:00 UTC"
(I'm assuming since you are using as.POSIXct
vice as.Date
that you want a date/time stamp, not just a date.)
Another method is to translate the suggested timezone among the list of known timezones. You can find the known timezones through ?timezones
(another related Q/A here).
After a little testing (please, test this further), I came up with this:
converttz <- function(x) {
on <- OlsonNames()
ind <- sapply(gsub(".* ", "", x), function(z) head(grep(z, on), n = 1))
ret <- character(length(x))
ret[lengths(ind) == 0] <- NA
ret[lengths(ind) > 0] <- on[unlist(ind[lengths(ind) > 0])]
ret
}
This works as long as case is correct; that is, "est"
may just as easily match "America/Creston"
, and even if you search only at the beginning or end of the strings, it could still match "Europe/Budapest"
.
From here, something like this works:
dts <- c("2017-01-01 PDT", "2017-04-02 UTC")
d <- data.frame(dates = as.POSIXct(dts), stringsAsFactors = FALSE)
d$TZ <- converttz(dts)
str(d)
# 'data.frame': 2 obs. of 2 variables:
# $ dates: POSIXct, format: "2017-01-01" "2017-04-02"
# $ TZ : chr "PST8PDT" "Etc/UTC"
Okay, so "Etc/UTC"
isn't very sightly. "UTC"
does exist in there, but it is the second one matched so is filtered out with head
. You can try alternative methods for finding a closer match (perhaps looking for exact matches first, then beginning/end).