0

I am in the process of merging two data frames based on date-time, and seem to have run into a snag. The time column in 1 of 2 of the DF's has a timezone stamp:

   #Example
   "2012-09-28 08:15:00 MDT"

And the other DF time column does not

   #Example 2
   "2012-09-28 08:15:00"

In my program both of these are POSIXct objects, formatted exactly the same ,besides the timezone stamp. When trying to merge based on the Time columns, NA's appear, b/c they are not recognizing each other.

I have narrowed the problem down to the DF missing the Tz. Something strange is going on. When I have the Data for the datetime Column outside the data frame it reads as such

#Code used to make these values
NewTime<-as.POSIXct(TimeDis$datetime, format="%Y-%m-%d %H:%M")

>NewTime
[1] "2017-08-16 00:00:00 MDT" "2017-08-16 00:15:00 MDT"
[3] "2017-08-16 00:30:00 MDT" "2017-08-16 00:45:00 MDT"

Now when I put this into a data frame with data, the "MDT" does not show up

Discharge_Time<-data.frame(NewTime,DischargeFin)
> Discharge_Time
               NewTime DischargeFin
1  2017-08-16 00:00:00     990525.2
2  2017-08-16 00:15:00     990525.2
3  2017-08-16 00:30:00    1000719.2
4  2017-08-16 00:45:00    1000719.2

Even stranger if I call,

>Discharge_Time[1,1]
"2017-08-16 MDT"

I get the MDT back but now no time....

I have no idea what is going on, but am hoping to find a way for the MDT and all the rest to stick around in that data frame so I can successfully merge it with the other DF, which isn't missing anything

Research Done: How to change a time zone in a data frame?

Changing time zones with POSIXct time series, R

Chabo
  • 2,842
  • 3
  • 17
  • 32
  • Looking at `attributes(x)` for Time1 and Time2, I see that neither has a tz; and that your code runs fine with no NAs visible in DATA.... – Frank Aug 16 '17 at 17:02
  • Ah, totally missed that when I added the POSIXct line, it added a MDT to both data.frames. I will delete that as it is not accurate – Chabo Aug 16 '17 at 17:06
  • Ok. Feel free to reorganize your question in the way that makes the most sense for the question to be understood; it's not necessary to add trailing EDITs and such. Folks can see the edit history and you can apologize to earlier answerers for changing it on them in comments if appropriate. – Frank Aug 16 '17 at 17:09
  • 1
    Roger that, thanks – Chabo Aug 16 '17 at 17:12
  • @Frank I have done more digging and narrowed down the problem and hopefully made my question more clear. I apologize for any confusion. – Chabo Aug 16 '17 at 18:47

2 Answers2

1

Try

library(lubridate)
Df1<-data.frame(Time1 = as_datetime(Time1),Data1)
Df2<-data.frame(Time1 = as_datetime(Time2),Data2)

DATA<-merge(x = Df1, y = Df2, by = "Time1", all.x = TRUE)
DATA
                Time1 Data1 Data2
1 2012-09-28 06:15:00     1     5
2 2012-09-28 06:30:00     2     6
3 2012-09-28 06:45:00     3     7

In your version, your time get's converted to factors, which are not equal across dataframes. For instance,

str(Df1$Time1)
Factor w/ 3 levels "2012-09-28 08:15:00 MDT",..: 1 2 3
str(Df2$Time1)
Factor w/ 3 levels "2012-09-28 08:15:00",..: 1 2 3

shows you that your factor levels are different, which is why the merge does not work.

coffeinjunky
  • 11,254
  • 39
  • 57
  • Thanks for the response. Yes I realize that they are factors in the example, but in the actual program the date-time is a POSIXct object, so they are formatted mainly the same way as your example, just including %H:%M as well. I would like to convert them just using as.Date(), but the objects must be padded first to check for missing data. So my only option is to keep them in POSIXct, although that should work well if I can get rid of the timestamp – Chabo Aug 16 '17 at 16:44
  • 1
    This answer is correct given the code in the example - if the actual program is different, we'd need to see it (or a relevant excerpt) – gcbenison Aug 16 '17 at 16:55
  • I have narrowed the question down, and made it more clear what the problem is, sorry about that. – Chabo Aug 16 '17 at 18:42
0

So after many attempts to recreate this error I found it to a culprit of the na.locf function of the package zoo. After padding my data on the interval '15 min' with the pad function from padr, I wanted to replace those N/A values with the previous value in the column. This works well except for the fact it gets rid of the TZ in the date-time. And this is where the problem came from. An example is shown below

library(padr)
library(zoo)

#Dates Missing 8:30 for padding
Dates<-c("2017-08-18 08:00","2017-08-18 08:15","2017-08-18 08:45",
"2017-08-18 09:00")

#Example Data
Data<-c(1,2,3,4)

#Df
Df<-data.frame(Dates, Data)

#Change to POSIXct
Df$Dates<-as.POSIXct(Df$Dates, format="%Y-%m-%d %H:%M")

#We can see now the Dates have been assigned a Timezone
>Dates
[1] "2017-08-18 08:00:00 MDT" "2017-08-18 08:15:00 MDT"
[3] "2017-08-18 08:45:00 MDT" "2017-08-18 09:00:00 MDT"

#Now we Pad
Df<-pad(Df, interval='15 min')

#TZ is still intact (So it's not padr)
>Df[1,1]
[1] "2017-08-18 08:00:00 MDT"

#Here is where the problem lies, in the na.locf function from zoo
library(zoo)
FixDf<-na.locf(Df, option="locf") #replaces N/A with previous value

FixDf[1,1]
[1] "2017-08-18 08:00:00"  #NO TIMEZONE!
Chabo
  • 2,842
  • 3
  • 17
  • 32
  • It is not supposed to that. Could you make a reproducible example and share it as an issue [here](https://github.com/EdwinTh/padr/issues)? You might have found a bug in `pad`. – Edwin Aug 18 '17 at 07:30
  • @Edwin I have truly found the issue, it lies in the na.locf function from the package zoo, I have placed the example in my answer – Chabo Aug 18 '17 at 15:40