1

I have a data.table with UTC date-time-stamp records spanning multiple time zones, and I want to create a new column that will show the date-time-stamp but in the specific time zone of each observation, which is specified by a variable in the same table:

require("lubridate")
require("data.table")
dt <- data.table(A = 1:5, B = rep(ymd_hms("2016-03-24 17:15:12", tz = "UTC"), 5), timezone = c("America/Indiana/Vincennes", "Australia/North", "Pacific/Palau", "Antarctica/Macquarie", "Asia/Nicosia"))

I was trying to accomplish this with the following, but it does not seem to function:

dt[, B_local := with_tz(B, tz = timezone)]
dt
Error in as.POSIXlt.POSIXct(x, tz) : invalid 'tz' value

When I try to add a by specification in the command, it gets closer to the desired output, but is incorrect and I think somehow is due to non-unique pairs of date-time and timezone like this sample table, ie:

dt[, B_local := with_tz(B, tz = timezone), by = .(B, timezone)]
dt
   A                   B                  timezone             B_local
1: 1 2016-03-24 17:15:12 America/Indiana/Vincennes 2016-03-24 19:15:12
2: 2 2016-03-24 17:15:12           Australia/North 2016-03-24 19:15:12
3: 3 2016-03-24 17:15:12             Pacific/Palau 2016-03-24 19:15:12
4: 4 2016-03-24 17:15:12      Antarctica/Macquarie 2016-03-24 19:15:12
5: 5 2016-03-24 17:15:12              Asia/Nicosia 2016-03-24 19:15:12

Even if I change by = .(A) in dt[, B_local := with_tz(B, tz = timezone), by = .(A)] which subsets the table into each row, the output is identical to the above.

NB: I'm more than happy to use something other than lubridate but I'd prefer to work within data.table for efficiencies as I have a large dataset.

daRknight
  • 253
  • 3
  • 17

1 Answers1

4

This stuff is super messy and finicky. I wrote a timezone 'shifter' in package RcppCCTZ as the underlying CCTZ library made that feasible / possible.

One huge caveat: timezones appear only in the formatted output, so I have a solution for you here but the target output is now text. Edited: Which, with one more parse of anytime(), can of course be POSIXct (in your local TZ).

Also note that I used a helper function from anytime to set the time.

Code

suppressMessages({
    library("data.table")
    library("RcppCCTZ")
    library("anytime")
})

dt <- data.table(A = 1:5,
                 B = rep(utctime("2016-03-24 17:15:12", tz="UTC"), 5),
                 timezone = c("America/Indiana/Vincennes", "Australia/North",
                              "Pacific/Palau", "Antarctica/Macquarie",
                              "Asia/Nicosia"))
dt[ , newTime := format(toTz(B, "UTC", timezone), tz=timezone), by=A ]
dt[ , pt := anytime(newTime), by=A ]

Output

R> dt <- data.table(A = 1:5,
+                  B = rep(utctime("2016-03-24 17:15:12", tz="UTC"), 5),
+                  timezone = c("America/Indiana/Vincennes", "Australia/North",
+                               "Pacific/Palau", "Antarctica/Macquarie", 
+                               "Asia/Nicosia"))
R> dt[ , newTime := format(toTz(B, "UTC", timezone), tz=timezone), by=A ]
R> dt[ , pt := anytime(newTime), by=A ]
R> dt
   A                   B                  timezone             newTime                  pt
1: 1 2016-03-24 22:15:12 America/Indiana/Vincennes 2016-03-24 18:15:12 2016-03-24 18:15:12
2: 2 2016-03-24 22:15:12           Australia/North 2016-03-25 07:45:12 2016-03-25 07:45:12
3: 3 2016-03-24 22:15:12             Pacific/Palau 2016-03-25 07:15:12 2016-03-25 07:15:12
4: 4 2016-03-24 22:15:12      Antarctica/Macquarie 2016-03-25 09:15:12 2016-03-25 09:15:12
5: 5 2016-03-24 22:15:12              Asia/Nicosia 2016-03-25 00:15:12 2016-03-25 00:15:12
R> 
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • This is certainly functional, thank you! The caveat on the resulting variable `newTime` being character is my primary concern -- the desire to maintain a date/time format is because the output from this table will be several visualizations where time is central, ie, I need to be able to plot graphs from this table where the x-axis will be `newTime` .. is there any way to take the resulting variable and turn it into `POSIXct` with the timezone from `dt$timezone`? I have tried several options from within `lubridate` and from what I've seen on `anytime` but cannot accomplish this last piece – daRknight Mar 28 '17 at 15:23
  • You need think this through. In a way, you _cannot_ as this is still all the same UTC time point, and that is what `POSIXct` stores. But you could just run `anytime()` over the last column and create a newly parsed one. I will edit my answer accordingly. – Dirk Eddelbuettel Mar 28 '17 at 15:30
  • I believe I follow, but will probably need to read more about `POSIXct` to cover the gaps I have in my comprehension -- I think there are workarounds for the lack of timezone specification, as your solution still gives me the correct time of day from the `newTime` creation for plotting; I appreciate it and if you have any suggested literature for me I'd be happy to receive it :) – daRknight Mar 28 '17 at 15:39