2

Not new to R or formatting dates in R and wouldn't be asking this question but I am having seriously strange behavior and in the last 2 hours am no closer to resolving it.

I have a dataset which I have imported and want to format the date/time column using as.POSIXct. The date is a non-standard format and I've applied what I know to be the proper formatting. Here is a small part of the data that I am having trouble with. Code just after. Problem is that there are 4 NA's starting at "2015-03-08 02:00:00 PST". What gives? This seems completely random as it happens no where else in any of the other 55K observations.

bad.Dates<-c("3/7/2015 14:15", "3/7/2015 14:30", "3/7/2015 14:45", "3/7/2015 15:00", 
         "3/7/2015 15:15", "3/7/2015 15:30", "3/7/2015 15:45", "3/7/2015 16:00", 
         "3/7/2015 16:15", "3/7/2015 16:30", "3/7/2015 16:45", "3/7/2015 17:00", 
         "3/7/2015 17:15", "3/7/2015 17:30", "3/7/2015 17:45", "3/7/2015 18:00", 
         "3/7/2015 18:15", "3/7/2015 18:30", "3/7/2015 18:45", "3/7/2015 19:00", 
         "3/7/2015 19:15", "3/7/2015 19:30", "3/7/2015 19:45", "3/7/2015 20:00", 
         "3/7/2015 20:15", "3/7/2015 20:30", "3/7/2015 20:45", "3/7/2015 21:00", 
         "3/7/2015 21:15", "3/7/2015 21:30", "3/7/2015 21:45", "3/7/2015 22:00", 
         "3/7/2015 22:15", "3/7/2015 22:30", "3/7/2015 22:45", "3/7/2015 23:00", 
         "3/7/2015 23:15", "3/7/2015 23:30", "3/7/2015 23:45", "3/8/2015 0:00", 
         "3/8/2015 0:15", "3/8/2015 0:30", "3/8/2015 0:45", "3/8/2015 1:00", 
         "3/8/2015 1:15", "3/8/2015 1:30", "3/8/2015 1:45", "3/8/2015 2:00", 
         "3/8/2015 2:15", "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00", 
         "3/8/2015 3:15", "3/8/2015 3:30", "3/8/2015 3:45", "3/8/2015 4:00", 
         "3/8/2015 4:15", "3/8/2015 4:30", "3/8/2015 4:45", "3/8/2015 5:00", 
         "3/8/2015 5:15", "3/8/2015 5:30", "3/8/2015 5:45", "3/8/2015 6:00", 
         "3/8/2015 6:15", "3/8/2015 6:30", "3/8/2015 6:45", "3/8/2015 7:00", 
         "3/8/2015 7:15", "3/8/2015 7:30", "3/8/2015 7:45", "3/8/2015 8:00", 
         "3/8/2015 8:15", "3/8/2015 8:30", "3/8/2015 8:45", "3/8/2015 9:00", 
         "3/8/2015 9:15", "3/8/2015 9:30", "3/8/2015 9:45", "3/8/2015 10:00", 
         "3/8/2015 10:15", "3/8/2015 10:30", "3/8/2015 10:45", "3/8/2015 11:00", 
         "3/8/2015 11:15", "3/8/2015 11:30", "3/8/2015 11:45", "3/8/2015 12:00", 
         "3/8/2015 12:15", "3/8/2015 12:30", "3/8/2015 12:45", "3/8/2015 13:00", 
         "3/8/2015 13:15", "3/8/2015 13:30", "3/8/2015 13:45", "3/8/2015 14:00", 
         "3/8/2015 14:15", "3/8/2015 14:30", "3/8/2015 14:45", "3/8/2015 15:00", 
         "3/8/2015 15:15") 

as.POSIXct(strptime(bad.Dates,"%m/%d/%Y %H:%M"))
thelatemail
  • 91,185
  • 12
  • 128
  • 188
CCurtis
  • 1,770
  • 3
  • 15
  • 25
  • 4
    That's the exact time that daylight savings time (PDT) began. So that hour is essentially missing in time. Spooky! – Rich Scriven Oct 13 '15 at 00:06
  • Nice catch, @RichardScriven. Curtis, stop fighting time :) – Shawn Mehan Oct 13 '15 at 00:09
  • Wow! Mine blown. Lol Supposed I should have noticed as it says `"2015-03-08 01:45:00 PST"` before and `"2015-03-08 03:00:00 PDT"` after the NA's. – CCurtis Oct 13 '15 at 00:11
  • 1
    Never mess with time-stamps after lunch. – CCurtis Oct 13 '15 at 00:12
  • 1
    It's generally good form to use "UTC" or "GMT" as a timezone when importing time/date data unless you explicitly want the timezone and daylight savings to be considered. – thelatemail Oct 13 '15 at 00:16
  • 1
    BTW. Solved my issue by specifying `tz` in as.POSIXct()`. Now it reports everything in the same timezone without NA's. Thank you for your help. @RichardScriven. – CCurtis Oct 13 '15 at 00:16

1 Answers1

3

To make this example reproducible/solvable regardless of location, specify the timezones via tz= explicitly:

bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")
as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="US/Pacific")

#[1] "2015-03-08 01:45:00 PST"
#[2] NA                       
#[3] NA                       
#[4] NA                       
#[5] NA                       
#[6] "2015-03-08 03:00:00 PDT"

You get NAs because those times don't exist in the modern-day timekeeping of the US Pacific region.

Most of the United States, Canada, and Mexico's northern border cities will begin Daylight Saving Time (DST) on Sunday, March 8, 2015. People in areas that observe DST will spring forward one hour from 2am (02:00) to 3am (03:00), local time.
Source: http://www.timeanddate.com/news/time/usa-canada-start-dst-2015.html

Specifying a timezone like "UTC" that doesn't observe daylight savings will get around this issue.

as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="UTC")
#[1] "2015-03-08 01:45:00 UTC"
#[2] "2015-03-08 02:00:00 UTC"
#[3] "2015-03-08 02:15:00 UTC"
#[4] "2015-03-08 02:30:00 UTC"
#[5] "2015-03-08 02:45:00 UTC"
#[6] "2015-03-08 03:00:00 UTC"
thelatemail
  • 91,185
  • 12
  • 128
  • 188