
I have a large ARFF file with data that looks something like this:

555,"2011-03-13 01:50:48.000",0
540,"2011-03-13 02:10:19.000",0

To help parse it, I declared the second attribute like this:

@attribute RecordedOn date "yyyy-MM-dd HH:mm:ss.SSS"

The parser, which uses Java's SimpleDateFormat, works fine for the first line (and the couple million lines that are very similar to it), but chokes on a few lines, like the second one. I've noticed that it only chokes on lines whose hour is "02"--in fact, the second line is parsed fine if I change it to 540,"2011-03-13 01:10:19.000",0. To add to the mystery, some lines with an hour of 02 are parsed fine anyway, like: 1,"2006-12-16 02:58:51.000",111

So does anyone know what's happening? Any advice? Thanks in advance.

tsm

2 Answers


You are almost certainly interpreting the dates as local times in a time zone that observes Daylight Saving Time. March 13, 2011 was the start of Daylight Saving Time in the United States; this means the clock advances from 01:59:59 to 03:00:00, skipping the entire 2 o'clock hour. "2011-03-13 02:10:19.000" local time never occurred in, e.g., New York City.
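To check this against the sample rows, here is a minimal sketch. It assumes the ARFF parser uses strict (non-lenient) parsing, which would explain the failure; a lenient SimpleDateFormat would silently roll the time forward instead. The class name DstGapDemo and the helper parses are made up for illustration, and America/New_York stands in for whatever zone the machine is actually set to.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class DstGapDemo {
    // Returns true if the timestamp parses strictly as a local time in the given zone.
    static boolean parses(String text, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        fmt.setLenient(false); // assumption: the ARFF parser rejects nonexistent local times
        try {
            fmt.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String zone = "America/New_York"; // assumed zone, for illustration only
        // Just before the spring-forward gap: a valid local time.
        System.out.println(parses("2011-03-13 01:50:48.000", zone));
        // Inside the skipped hour: this local time never occurred.
        System.out.println(parses("2011-03-13 02:10:19.000", zone));
        // 02:xx on a day with no DST transition: a valid local time.
        System.out.println(parses("2006-12-16 02:58:51.000", zone));
    }
}
```

Under those assumptions, only the middle timestamp should be rejected, which matches the pattern in the question: the hour "02" only fails on the day the clocks spring forward.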

Russell Silva
  • Wow, good call. You're absolutely right--other problems include March 9, 2008 and March 11, 2007. For this particular dataset, I've just been commenting out the affected lines. But for future ones, I guess I need to investigate how the clocks on the sensors were set. Since I can't change the locale directly, would the best solution be to write a script to convert everything to standard time? – tsm May 20 '11 at 19:35
  • Looking at the JavaDoc, SimpleDateFormat has a "setTimeZone" method that you might be able to use to set the zone to GMT, thus interpreting the dates in a time zone that does not observe DST. That should fix your problem unless you have sensors in different time zones that you need to compare. – Russell Silva May 21 '11 at 06:24
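The setTimeZone suggestion from the comment above could look roughly like this if you control the parsing code yourself. This is only a sketch: GmtParseDemo and epochMillisAsGmt are hypothetical names, and strict parsing is again an assumption. Note that the timestamps are treated as if they had been recorded in GMT, so the absolute instants will be off by the sensors' real UTC offset; the intervals between readings are preserved as long as every row is read the same way.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class GmtParseDemo {
    // Parses the timestamp as if it were recorded in GMT, which has no DST gap.
    static long epochMillisAsGmt(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
        fmt.setLenient(false); // still strict, but every wall-clock time exists in GMT
        try {
            return fmt.parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + text, e);
        }
    }

    public static void main(String[] args) {
        long first = epochMillisAsGmt("2011-03-13 01:50:48.000");
        long second = epochMillisAsGmt("2011-03-13 02:10:19.000");
        // Both rows now parse; print the elapsed time between them in milliseconds.
        System.out.println(second - first);
    }
}
```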

I had the same problem with some data, and I was able to load it after changing my system time zone to America/Phoenix (since Arizona does not use DST). Another solution I found later is to just change the time zone of the JVM when running WEKA ( https://www.baeldung.com/java-jvm-time-zone )
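For anyone curious why changing the JVM time zone helps, here is a small sketch. A SimpleDateFormat created without an explicit zone picks up the JVM default, so switching the default to a zone without DST removes the gap. DefaultZoneDemo and parsesWithDefaultZone are illustrative names, and strict parsing is an assumption about what the ARFF reader does. From the command line, the same effect should be achievable with the standard user.timezone system property, e.g. -Duser.timezone=America/Phoenix.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class DefaultZoneDemo {
    // Parses strictly using whatever the JVM default time zone is at call time.
    static boolean parsesWithDefaultZone(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        fmt.setLenient(false); // assumption: mirrors a strict ARFF date parser
        try {
            fmt.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        TimeZone original = TimeZone.getDefault();
        try {
            // A zone that observes DST: 02:10 on 2011-03-13 does not exist there.
            TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"));
            System.out.println(parsesWithDefaultZone("2011-03-13 02:10:19.000"));

            // Arizona does not observe DST, so the same wall-clock time is valid.
            TimeZone.setDefault(TimeZone.getTimeZone("America/Phoenix"));
            System.out.println(parsesWithDefaultZone("2011-03-13 02:10:19.000"));
        } finally {
            TimeZone.setDefault(original); // restore the JVM default
        }
    }
}
```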