0

Date parsing bug

I am having trouble with character to date-time conversions and would appreciate help understanding what is going wrong. To do this, I define a very simple data frame with two rows, which holds an ID, a time zone, a date, and a time for each row. I would like to add a column that contains a (say) POSIXct entry for the combined date-time including the correct time zone. (This is a synthetic example but I want to apply this to a much larger data set.)

First we try combining these features into a unified representation of the data, time and time zone using R’s base facilities.

d <- data.frame(id=c(111, 222), 
            tzz=c("Europe/Berlin", "US/Eastern"), 
            d=c("09-Sep-2017", "11-Sep-2017"), 
            t=c("23:42:13", "22:05:17"),
            stringsAsFactors = FALSE)

d$dt <- strptime(paste(d$d, d$t), tz=d$tzz, format="%d-%b-%Y %T")

Error in strptime(paste(d$d, d$t), tz = d$tzz, format = "%d-%b-%Y %T") : 
  invalid 'tz' value

That approach fails, though it’s not clear to my why. For example, I can do the non-vectorized version of this easily. Also, the time zones I am using seem to be part of the officially supported list.

d$tzz %in% OlsonNames()

[1] TRUE TRUE

dt1 <- strptime(paste(d$d[1], d$t[1]), tz=d$tzz[1], format="%d-%b-%Y %T")
print(dt1)
[1] "2017-09-09 23:42:13 CEST"

print(tz(dt1))
[1] "Europe/Berlin"

dt2 <- strptime(paste(d$d[2], d$t[2]), tz=d$tzz[2], format="%d-%b-%Y %T")

print(dt2)
[1] "2017-09-11 22:05:17 EDT"

print(tz(dt2))
[1] "US/Eastern"

Also, Thinking that perhaps my problem was in misunderstanding how to use strptime, I then tried a similar approach with lubridate:

library(lubridate)
d$dt <- dmy_hms(paste(d$d, d$t), tz=d$tzz)

Error in strptime(.enclose(x), .enclose(fmt), tz) : invalid 'tz' value

but got the same error. Again, a non-vector version works fine.

dt1l <- dmy_hms(paste(d$d[1], d$t[1]), tz=d$tzz[1])
print(dt1l)
[1] "2017-09-09 23:42:13 CEST"

print(tz(dt1l))
[1] "Europe/Berlin"

Trying mutate in tidyverse yields the same problem. (Incidentally, CEST is not among the OlsonNames set.)

Help for how to do this correctly, or at least an explanation of how this is going wrong, would be much appreciated.

PSz
  • 1
  • 1

2 Answers2

1

Try computing it row by row like this:

library(dplyr)
d %>%
  rowwise() %>%
  mutate(ct = as.POSIXct(paste(d, t), format = "%d-%b-%Y %H:%M:%S", tz = tzz)) %>%
  ungroup

giving:

# A tibble: 2 x 5
     id tzz           d           t        ct                 
  <dbl> <chr>         <chr>       <chr>    <dttm>             
1  111. Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2  222. US/Eastern    11-Sep-2017 22:05:17 2017-09-11 22:05:17
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Thank you. I didn't know about rowwise. Any idea why the vectorized version doesn't work? Also, it appears that the Berlin time got converted to EDT, and no longer remembers its original tz. Is that avoidable? – PSz Apr 13 '18 at 21:58
  • The problem is that the time zone cannot be a vector. If you store them in a list then they can have separate time zones but a POSIXct vector can only have one. – G. Grothendieck Apr 13 '18 at 22:54
0

Similar to Gabor's but with data.table using the fact that the ids are unique:

R> dt <- data.table(d)
R> dt[ , ct := as.POSIXct(paste(d, t), "%d-%b-%Y %H:%M:%S", tz=tzz), by=id][]
    id           tzz           d        t                  ct
1: 111 Europe/Berlin 09-Sep-2017 23:42:13 2017-09-09 17:42:13
2: 222    US/Eastern 11-Sep-2017 22:05:17 2017-09-11 22:05:17
R> 
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725