1

I am trying to use lubridate::force_tz to add timezone information to timestamps (date+time) formatted as strings (as.character()). Both are stored as two columns in a data frame:

require(lubridate)
require(dplyr)
row1<-c(as.character(now()),"Etc/UTC")
row2<-c(as.character(now()+5),"America/Chicago")
df<-as.data.frame(rbind(row1,row2))
names(df)<-c("dt","tz")

x<-force_tz(as.POSIXct(as.character(now())),"Etc/UTC") #works
df<-df%>%mutate(newDT=force_tz(as.POSIXct(dt),tz)) #fails

I get: Error in UseMethod("mutate_") : no applicable method for 'mutate_' applied to an object of class "c('matrix', 'character')"

Following Stibu's comments, I tried (an un-R like) approach with an iteration:

for (i in seq(from=1,to=length(df$dt))){
timestamp<-as.character(df[i,1])
tz<-as.character(df[i,2])
print(tz)
newdt<-force_tz(as.POSIXct(timestamp),tz)
df[i,3]<-newdt
print(attr(df[i,3],"tzone"))
df$timezone<-attr(df[i,3],"tzone")
}

This extracts the values correctly, but seems to get stuck with setting the value of the tz to the first value encountered - weirdly:

[1] "Etc/UTC"
[1] "Etc/UTC"
[1] "America/Chicago"
[1] "Etc/UTC"

I would have expected the last printout to result in "America/Chicago" The df then looks like:

 > df
               dt              tz               newDT timezone
 1 2016-04-13 23:07:45         Etc/UTC 2016-04-13 23:07:45  Etc/UTC
 2 2016-04-13 23:07:50 America/Chicago 2016-04-14 04:07:50  Etc/UTC
MartinT
  • 1,671
  • 3
  • 14
  • 14
  • Stibu below provided a nice explanation, highlighting two issues - one is with my replicable example (I will edit the question once I get time), the second is the real culprit ( I have diagnosed the issue as the same but could not believe that I cannot pass an element from a vector to tz=.... Others, please look at that part of the answer of Stibu if you can help me... – MartinT Apr 13 '16 at 20:47
  • I have found a related post here, but I was not able to make it run: http://stackoverflow.com/questions/33848563/with-tz-with-a-vector-of-timezones/33848764#33848764 – MartinT Apr 14 '16 at 00:55
  • And there is a thread with other people discussin on lubridate github: http://stackoverflow.com/questions/36598115/use-dplyrmutate-and-lubridateforce-tz-based-on-arguments-from-data-frame-col – MartinT Apr 14 '16 at 01:02

1 Answers1

2

You have actually two issues in your code that I will discuss separately below.

dplyr works with data frames

Your df is a matrix, not a data frame. But mutate() (and functions from dplyr in general) works with data frames. The error message simply tells you that mutate() does not know what to do with a matrix.

You can solve this by converting df to a data frame:

df <- as.data.frame(df)
names(df)<-c("dt","tz")

A remark regarding names(): This function can be used to get/set the column names of a data frame. For matrices, the corresponding function is colnames(). You used names() on a matrix, which did not set the column names of the matrix. Therefore, the names of the data frame are also not set after conversion.

You could also create a data frame from the start as follows:

df <- data.frame(dt = as.character(c(now(), now() + 5)),
                 tz = c("Etc/UTC", "America/Chicago"),
                 stringsAsFactors = FALSE)

Note that you need to define the contents column-wise, not row-wise as you did.

If you use the data frame df, there will be no error from mutate().

One time zone per vector

Unfortunately, there is a second issue. What you want to do simply cannot be done. The reason is the following.

Let's convert the first column of df to POSIXct with time zone CET:

ts <- as.POSIXct(df$dt, tz = "CET")
ts
## [1] "2016-04-13 14:42:26 CEST" "2016-04-13 14:42:31 CEST"

Let's try to do the same with two time zones:

ts <- as.POSIXct(df$dt, tz = c("CET", "UTC"))
## Error in strptime(xx, f <- "%Y-%m-%d %H:%M:%OS", tz = tz) : 
##   invalid 'tz' value

This does not work. The reason is that there is a single time zone per vector and not a time zone per element in the vector. Look at the attributes of ts:

attributes(ts)
## $class
## [1] "POSIXct" "POSIXt" 
## 
## $tzone
## [1] "CET"

The time zone is set as an attribute of the entire vector and it is not a property of each element.

Stibu
  • 15,166
  • 6
  • 57
  • 71
  • Thank you Stibu for a nice and extensive explanation. I appreciate the comments about the df (and meas culpa, this happened while producing a replicable example). My real dataset is a df properly. SO the second part of the answer is the key. Can you think of an approach that would help here? I thought that maybe a traditional loop, but it felt very un-R... I would accept also any other answer that allows to use a "lookup" df/table to set tz to numerous records. Otherwise I resort to a db I guess... – MartinT Apr 13 '16 at 20:45
  • Whether you use a loop or not, you will not be able to store all the time stamps in a single vector. You could produce a list with lapply. But I think that the right approach for you depends on what you want to do in the end. – Stibu Apr 13 '16 at 20:47
  • HI Stibu, not sure I understand how a list would be different to a vector. What I want is to have a column in a df with all timestamps pegged to their respective timezones, so that I can do arithmetics with timestamps reliably ( difference of two instants in different timezones resulting in proper interval lengths). – MartinT Apr 13 '16 at 23:07
  • I have edited the question to reflect our discussion, thanks! – MartinT Apr 13 '16 at 23:29
  • 1
    Regarding your edit: nothing is weird there. As I tried to explain, a POSIXct vector has a single time zone. When you store the first time stamp in `df$newDT`, the time zone is set to `"Etc/UTC"`. And it does not change when you store the next element in `df$newDT`. Rather that time stamp is converted to `"Etc/UTC"`. If you are interested in the time differences, this is exactly what you need. The list would help, because each list element could be a POSIXct-vector of length one with its own time zone. – Stibu Apr 14 '16 at 06:26
  • Thanks Stibu. The situation is that I have 5 columns with timestamps. Each timestamp may be in a different timezone. I am computing differences within rows, across columns. Each row is for tracking one item... I will have to take the timestamps without timezones, and the timezone information, and somehow imprint the timezone on the timestamp, and convert to UTC> then I will have UTC timestamps across the DF, which would be fine ( and I think it would also help with the daylight savings problems). – MartinT Apr 15 '16 at 06:56