
I have date-time pairs in a csv file that look like

11/4/2012

in one column and

12:06:08 AM

in the neighboring column. They are recorded in local time (i.e., they switch between PST and PDT at the appropriate times), but there is no time zone or DST indicator in the file. The only visible sign of the switch is that the sequence of times does odd things. For example, on November 4, 2012, I have a sequence of times like

12:51:20 AM 1:13:08 AM 1:24:58 AM 1:40:28 AM 1:48:08 AM 1:54:08 AM 1:56:58 AM 1:04:28 AM 1:05:48 AM 1:07:18 AM 1:15:00 AM 1:39:08 AM 2:05:38 AM

PST presumably begins with the 1:04:28 AM reading, but there is no indicator.
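A minimal sketch of the first step, assuming the CSV has been read into a data frame whose date and time columns are named `date` and `time` (both names are hypothetical): paste the columns together and parse them as plain wall-clock time. `tz = "UTC"` is used only because UTC has no DST, so R neither shifts nor rejects the ambiguous 1 AM readings; it does not mean the data are in UTC.

```r
# Hypothetical stand-in for the CSV; the real data would come from read.csv(),
# and the column names `date` and `time` are assumptions.
readings <- data.frame(
  date = c("11/4/2012", "11/4/2012", "11/4/2012"),
  time = c("1:56:58 AM", "1:04:28 AM", "2:05:38 AM"),
  stringsAsFactors = FALSE
)

# Combine the two columns and parse them as naive wall-clock time.
# UTC is just a DST-free container here, not the true zone of the readings.
readings$clock <- as.POSIXct(paste(readings$date, readings$time),
                             format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Gaps in minutes; the backward jump at the fall-back change shows up as negative.
gaps <- as.numeric(diff(readings$clock), units = "mins")
print(gaps)
```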

Is there a straightforward approach to assigning time zones properly (presumably using lubridate)? The file is long, so I'd rather not loop through one reading at a time, as I fear that could take some time. I'll have to do the same thing in reverse for the spring.

Bill

1 Answer


This isn't possible. There's no way to know with certainty that "11/4/2012 1:04:28 AM" is PST and not actually an observation between "11/4/2012 12:51:20 AM" and "11/4/2012 1:13:08 AM" PDT.
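To see why, here is a small illustration, assuming the readings come from the US Pacific zone (`America/Los_Angeles`): two different instants one hour apart, on either side of the 2012-11-04 fall-back change at 09:00 UTC, print as the same local clock time. Only the zone abbreviation, which the file does not record, tells them apart.

```r
# Two distinct instants, one hour apart, straddling the fall-back change.
utc <- as.POSIXct(c("2012-11-04 08:30:00", "2012-11-04 09:30:00"), tz = "UTC")

# As Pacific wall-clock time they are indistinguishable...
local <- format(utc, format = "%H:%M:%S", tz = "America/Los_Angeles")
print(local)  # "01:30:00" "01:30:00"

# ...and only the abbreviation (absent from the CSV) separates them.
abbr <- format(utc, format = "%Z", tz = "America/Los_Angeles")
print(abbr)   # "PDT" "PST"
```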

If you're certain the observations are in chronological order in the file, you could convert them to a date-time class (POSIXt) and take diff() of the vector. Any negative difference marks a fall-back DST change. You may miss some, however, if the gap between the observations on either side of the change is greater than 1 hour, because the clock-time difference is then still positive.

```r
Lines <- "11/4/2012 12:51:20 AM
11/4/2012 01:13:08 AM
11/4/2012 01:24:58 AM
11/4/2012 01:40:28 AM
11/4/2012 01:48:08 AM
11/4/2012 01:54:08 AM
11/4/2012 01:56:58 AM
11/4/2012 01:04:28 AM
11/4/2012 01:05:48 AM
11/4/2012 01:07:18 AM
11/4/2012 01:15:00 AM
11/4/2012 01:39:08 AM
11/4/2012 02:05:38 AM"

# Read one timestamp per line
x <- scan(con <- textConnection(Lines), what="", sep="\n")
close(con)
# strptime() uses the session's local time zone. The output below came from a
# zone that observes US DST: the 1 AM readings are taken as daylight time and
# 2:05:38 AM as standard time, so the last gap is 86.5 rather than 26.5 mins.
diff(strptime(x, format="%m/%d/%Y %I:%M:%S %p"))
# Time differences in mins
#  [1]  21.800000  11.833333  15.500000   7.666667   6.000000   2.833333
#  [7] -52.500000   1.333333   1.500000   7.700000  24.133333  86.500000
```
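Building on that diff() idea, the PDT/PST label can be assigned without a loop. This is a sketch under strong assumptions: the readings are in chronological order, they all belong to the fall-back day, the backward jump is actually visible (if the gap across the change exceeds an hour, `after_jump` stays all `FALSE` and every reading is silently labeled PDT), and the recording zone is US Pacific (UTC-7 for PDT, UTC-8 for PST).

```r
# Assumed: ordered readings from the fall-back day only, US Pacific offsets.
x <- c("11/4/2012 12:51:20 AM", "11/4/2012 01:56:58 AM",
       "11/4/2012 01:04:28 AM", "11/4/2012 02:05:38 AM")

# Parse as naive wall-clock time (UTC used only because it has no DST).
clock <- as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# TRUE from the first backward jump onward, computed in one vectorized pass.
after_jump <- cumsum(c(0, diff(as.numeric(clock)) < 0)) > 0

# Add the UTC offset back (7 h for PDT, 8 h for PST) to recover the instant,
# then display it in the Pacific zone.
offset_hours <- ifelse(after_jump, 8, 7)
instant <- clock + offset_hours * 3600
attr(instant, "tzone") <- "America/Los_Angeles"
print(format(instant, usetz = TRUE))
```

Because the result is a true instant, a further diff() on `instant` should be strictly positive for ordered data, which is a cheap sanity check.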
Joshua Ulrich
  • But if the only options are PDT or PST and you know the date of the daylight saving change, then you could pick between them based on that. There's no easy way to get that date from R, though. – hadley Feb 22 '13 at 13:48
  • @hadley: you could generate an hourly sequence from the first to the last observation, convert it to `POSIXlt`, and check where the `isdst` element switches between 0 and 1. – Joshua Ulrich Feb 22 '13 at 14:08
  • True, but it's a bit awkward, given that the original data is stored as transition points. – hadley Feb 22 '13 at 14:39
  • Thanks for the replies. I realized the same thing last night; while the data is ordered, there's no guarantee that the diff will show a negative value in the general case. I'm about to drop all data from the 1 o'clock hour on the "fall back" day and interpolate across the gap. It looks like the "spring forward" day lacks that ambiguity. – Bill Feb 22 '13 at 21:21
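The `isdst` check suggested in the comments might look like this. It is a sketch assuming the recording zone is `America/Los_Angeles`; the grid range below is illustrative, and in practice it would run from the first to the last observation. The grid is built in UTC so that it has no repeated or missing hours, and hourly steps suffice because US DST changes happen on the hour.

```r
# Illustrative hourly grid; in practice use the data's first and last readings.
grid <- seq(as.POSIXct("2012-11-03 00:00:00", tz = "UTC"),
            as.POSIXct("2012-11-06 00:00:00", tz = "UTC"), by = "hour")

# View the grid in the (assumed) recording zone and read the DST flag.
lt <- as.POSIXlt(grid, tz = "America/Los_Angeles")

# Index of the first grid hour after each change of the isdst flag.
switch_at <- which(diff(lt$isdst) != 0) + 1

# Show where the switch happens, in local time with its abbreviation.
print(format(grid[switch_at], format = "%Y-%m-%d %H:%M:%S %Z",
             tz = "America/Los_Angeles"))
```

The same code covers the spring-forward change if the grid spans March; there the flag goes from 0 to 1 instead.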