1

I am getting hairless about this topic since some hours now...

As inputs, I have timestamps stored as strings in a list:

0     Sat Mar 30 2019 21:00:00 GMT+0100
1     Sat Mar 30 2019 22:00:00 GMT+0100
2     Sat Mar 30 2019 23:00:00 GMT+0100

When I convert them into datetime object, timezone is reversed:

GC['date'] = pd.to_datetime(my_timestamps)
0     2019-03-30 21:00:00-01:00
1     2019-03-30 22:00:00-01:00
2     2019-03-30 23:00:00-01:00

I have double checked what GMT+1 means on Google.

Considering a UTC time 9 o'clock, GMT+1 time is then 10 o'clock: it is 1 hour more than UTC.

So to get UTC, you should offset GMT+1 time by -1 hour.

I was thus ok with having -1 in the offset then:

0     2019-03-30 21:00:00-01:00

And now, when I try to 'resolve' the offset, I get:

GC['date2'] = pd.to_datetime(my_timestamps, utc=True)
0    2019-03-30 22:00:00+00:00
1    2019-03-30 23:00:00+00:00
2    2019-03-31 00:00:00+00:00

OMG, the offset is reversed. Ok, let's double check with Google again:

https://pythontic.com/datetime/datetime/utcoffset

Output of the code is a +8 hours offset for Singapore vs UTC

Singapore Time instance:2017-02-14 12:15:01.000099+08:00
UTC Offset for Singapore Time:8:00:00

And from Singapore time, to find UTC time, you have to substract these 8 hours:

Singapore time: 5 hour AM
UTC time: 21 hour PM

So the trouble appears to be during the conversion step GMT+1 -> -01:00 in to_datetime.

Please, does anyone has an idea on how solving that? I thank you in advance for your help.

Bests,

Pierre

pierre_j
  • 895
  • 2
  • 11
  • 26
  • Are you sure that's not right, a timestamp object i get is Timestamp('2019-03-30 21:00:00-0100', tz='tzoffset(None, -3600)'), datetime.datetime(2019, 3, 30, 21, 0, tzinfo=tzoffset(None, -3600)) – oppressionslayer Dec 27 '19 at 22:16
  • Hello, I submitted that to pandas Github bugtracker. I think GMT+1h should stay UTC+1h when rewritten in iso format. I saw the trouble when setting as indexes of my dataframe these timestamps, and activating verify_integrity: because of this *trouble* (I don't dare calling that a bug yet) this check identified duplicate indexes. I will report any feedback from pandas team – pierre_j Dec 27 '19 at 22:37

1 Answers1

2

This is not a full answer, but I have tracked down where the bug lies. The pandas is calling the dateutils parse function to do the actual parsing and that function has the bug.

>>from dateutil.parser import parse  
>>parse('Sat Mar 30 2019 21:00:00 GMT+0100')
datetime.datetime(2019, 3, 30, 21, 0, tzinfo=tzoffset(None, -3600))
>>parse('Sat Mar 30 2019 21:00:00+0100')
datetime.datetime(2019, 3, 30, 21, 0, tzinfo=tzoffset(None, 3600))

Notice the -3600 in the first output, while it should have been +3600 like in the second output.

But from the dateutil's perspective, this looks like an "expected" behavior when timestamp is specified as GMT+int. From the dateutil parase function

  # Check for something like GMT+3, or BRST+3. Notice
  # that it doesn't mean "I am 3 hours after GMT", but
  # "my time +3 is GMT". If found, we reverse the
  # logic so that timezone parsing code will get it
  # right.

So, they are interpreting GMT+1 as "my time plus 1 hour would be GMT" which feels like a very non-standard interpretation to me.

rajendra
  • 472
  • 3
  • 18
  • Hello Rajendra, yes, I posted a but in pandas Github and they re-oriented me to dateutil. Please, what would be your recommendation as a workaround? I have spotted this stack overflow question that proposes one. I am about to try it. https://stackoverflow.com/questions/31078749/timezone-offset-sign-reversed-by-python-dateutil – pierre_j Dec 28 '19 at 07:05
  • @pierre_j Yeah, It looks like a standards issue. I would recommend just doing find and replace in your original source files to remove the 'GMT' altogether. It will then be parsed correctly. – rajendra Dec 28 '19 at 18:00