If I run the following git log command (here, in this repo: https://github.com/rubyaustralia/rubyconfau-2013-cfp):
$ git --no-pager log --reverse --date=raw --pretty='%ad %h'
1344507869 -0700 314b3d4
1344508222 +1000 dffde53
1344510528 +1000 17e7d3b
...
... I get a list that contains both a Unix timestamp (seconds since the Epoch) and a UTC offset for every commit. What I would like to do is obtain a timezone-aware datetime that will:
- Show me the date/time as the commit author saw it at the time (as per the recorded UTC offset)
- Show me the date/time as I would have seen it in my local timezone
In the first case, all I have is a UTC offset, not the author's time zone - and as such I'd have no information about possible daylight saving time changes.
In the second case, my OS would most likely be set up with a locale that includes a (geographical) timezone, which is aware of DST changes; for example, CET has a UTC offset of +0100 in winter, but during summer daylight saving time it has a UTC offset of +0200 (and is then called CEST).
In any case, I'd want to start with a UTC timestamp, because the "1344508222" count of epoch seconds is independent of timezones; the +1000 offset would simply help render the human-readable output, hopefully as the author saw it.
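To illustrate the difference between a fixed offset and a geographical zone, here is a minimal stdlib-only sketch (should run on Python 2.7 and 3, but only on Unix-like systems, since time.tzset is Unix-only). It assumes the process timezone can be overridden through the TZ environment variable; the POSIX rule string "CET-1CEST,M3.5.0,M10.5.0/3" is used so that no tz database is needed, local_utc_offset_seconds is a hypothetical helper name, and the winter timestamp 1356998400 (2013-01-01 00:00:00 UTC) is picked only for comparison.

```python
import calendar
import os
import time


def local_utc_offset_seconds(epoch_secs):
    """Return the local zone's UTC offset (in seconds) in effect at epoch_secs.

    time.localtime() applies the local DST rules for that instant, and
    calendar.timegm() reads the resulting wall-clock fields back as if they
    were UTC, so the difference is the offset valid at that moment.
    """
    return calendar.timegm(time.localtime(epoch_secs)) - epoch_secs


# Assumption: Unix-like OS. This changes the timezone of the whole process.
# POSIX rule: CET is UTC+1; CEST (UTC+2) runs from the last Sunday in March
# at 02:00 until the last Sunday in October at 03:00.
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()

summer = 1344508222  # second commit from the log above (August 2012)
winter = 1356998400  # 2013-01-01 00:00:00 UTC, for comparison

for epoch in (summer, winter):
    name = time.strftime("%Z", time.localtime(epoch))
    print("%d %d %s" % (epoch, local_utc_offset_seconds(epoch), name))
# 1344508222 7200 CEST
# 1356998400 3600 CET
```

The zone rules, not a stored offset, are what produce +0200 in August and +0100 in January; a bare "+0200" string carries no such rule.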
I need to do this for a Python 2.7 project. I scoured through a ton of resources (SO posts) and came up with the following example, which attempts to parse the second line from the above snippet ("1344508222 +1000 dffde53"). However, I'm really not sure it is right; so ultimately, my question is: what is the right way to do this?
Preamble:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import datetime
import pytz
import dateutil.tz
import time
def getUtcOffsetFromString(in_offset_str): # SO:1101508
    offset = int(in_offset_str[-4:-2])*60 + int(in_offset_str[-2:])
    if in_offset_str[0] == "-":
        offset = -offset
    return offset

class FixedOffset(datetime.tzinfo): # SO:1101508
    """Fixed offset in minutes: `time = utc_time + utc_offset`."""
    def __init__(self, offset):
        self.__offset = datetime.timedelta(minutes=offset)
        hours, minutes = divmod(offset, 60)
        #NOTE: the last part is to remind about deprecated POSIX GMT+h timezones
        #  that have the opposite sign in the name;
        #  the corresponding numeric value is not used e.g., no minutes
        self.__name = '<%+03d%02d>%+d' % (hours, minutes, -hours)
    def utcoffset(self, dt=None):
        return self.__offset
    def tzname(self, dt=None):
        return self.__name
    def dst(self, dt=None):
        return datetime.timedelta(0)
    def __repr__(self):
        return 'FixedOffset(%d)' % (self.utcoffset().total_seconds() / 60)
Start of parsing:
tstr = "1344508222 +1000 dffde53"
tstra = tstr.split(" ")
unixepochsecs = int(tstra[0])
utcoffsetstr = tstra[1]
print(unixepochsecs, utcoffsetstr) # (1344508222, '+1000')
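Since the first log line carries a negative offset (-0700), it may be worth checking the sign handling on every line, not just the second one. A minimal stdlib-only sketch (Python 2.7 and 3), assuming the line layout produced by --pretty='%ad %h'; parse_raw_date_line and _Offset are hypothetical names, and _Offset is a stripped-down stand-in for the FixedOffset class above:

```python
import datetime


class _Offset(datetime.tzinfo):
    """Hypothetical minimal fixed-offset tzinfo (offset given in minutes)."""

    def __init__(self, minutes):
        self._delta = datetime.timedelta(minutes=minutes)

    def utcoffset(self, dt):
        return self._delta

    def dst(self, dt):
        return datetime.timedelta(0)  # a fixed offset never has DST

    def tzname(self, dt):
        return None


UTC = _Offset(0)


def parse_raw_date_line(line):
    """Split '<epoch> <+hhmm|-hhmm> <hash>' into (hash, utc_dt, author_dt)."""
    epoch_str, offset_str, commit = line.split(None, 2)
    sign = -1 if offset_str.startswith("-") else 1
    minutes = sign * (int(offset_str[1:3]) * 60 + int(offset_str[3:5]))
    utc_dt = datetime.datetime.fromtimestamp(int(epoch_str), UTC)
    return commit.strip(), utc_dt, utc_dt.astimezone(_Offset(minutes))


for line in ("1344507869 -0700 314b3d4", "1344508222 +1000 dffde53\n"):
    commit, utc_dt, author_dt = parse_raw_date_line(line)
    print("%s %s %s" % (commit, utc_dt.isoformat(), author_dt.isoformat()))
# 314b3d4 2012-08-09T10:24:29+00:00 2012-08-09T03:24:29-07:00
# dffde53 2012-08-09T10:30:22+00:00 2012-08-09T20:30:22+10:00
```

Splitting with split(None, 2) also tolerates the trailing newline that lines read from git's output usually carry.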
Get UTC timestamp - first I attempted to parse a string like "1344508222 +1000" with dateutil.parser.parse:
justthetstz = " ".join(tstra[:2])
print(justthetstz) # '1344508222 +1000'
#print(dateutil.parser.parse(justthetstz)) # ValueError: Unknown string format (needs: import dateutil.parser)
... but that unfortunately fails.
This worked to get a UTC timestamp:
# SO:12978391: "datetime.fromtimestamp(self.epoch) returns localtime that shouldn't be used with an arbitrary timezone.localize(); you need utcfromtimestamp() to get datetime in UTC and then convert it to a desired timezone"
dtstamp = datetime.datetime.utcfromtimestamp(unixepochsecs).replace(tzinfo=pytz.utc)
print(dtstamp) # 2012-08-09 10:30:22+00:00
print(dtstamp.isoformat()) # 2012-08-09T10:30:22+00:00 # ISO 8601
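As far as I understand, attaching pytz.utc with .replace is safe in this particular step, because UTC has no DST transitions, so there is no offset for pytz to pick wrongly. A sketch (assuming pytz is installed, as in the preamble) showing that the one-step fromtimestamp(ts, tz) form gives the same result, and that the value round-trips to the original epoch count:

```python
import calendar
import datetime

import pytz

unixepochsecs = 1344508222

# Two-step form used above: naive UTC wall clock, then attach UTC.
two_step = datetime.datetime.utcfromtimestamp(unixepochsecs).replace(tzinfo=pytz.utc)
# One-step form: fromtimestamp() accepts a tzinfo and converts into it.
one_step = datetime.datetime.fromtimestamp(unixepochsecs, pytz.utc)

print(two_step == one_step)                      # True
print(calendar.timegm(one_step.utctimetuple()))  # 1344508222
```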
Ok, so far so good - this UTC timestamp looks reasonable.
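Next, to get the date in the author's UTC offset, apparently a custom class is needed here (Python 2.7's datetime module has no concrete fixed-offset tzinfo; datetime.timezone only arrived in Python 3.2):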
utcoffset = getUtcOffsetFromString(utcoffsetstr)
fixedtz = FixedOffset(utcoffset)
print(utcoffset, fixedtz) # (600, FixedOffset(600))
dtstampftz = dtstamp.astimezone(fixedtz)
print(dtstampftz) # 2012-08-09 20:30:22+10:00
print(dtstampftz.isoformat()) # 2012-08-09T20:30:22+10:00
This looks reasonable too: 10:30 UTC is 20:30 at +1000; then again, a fixed offset is just a fixed offset, so there is no ambiguity here.
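For what it's worth, a hand-written class may not be strictly necessary: pytz and dateutil, both already imported in the preamble, provide fixed-offset tzinfo types. A sketch assuming both packages are installed; note the different units (pytz.FixedOffset takes minutes, dateutil.tz.tzoffset takes a name and seconds):

```python
import datetime

import dateutil.tz
import pytz

dtstamp = datetime.datetime.fromtimestamp(1344508222, pytz.utc)
offset_minutes = 600  # what getUtcOffsetFromString("+1000") returns

via_pytz = dtstamp.astimezone(pytz.FixedOffset(offset_minutes))
via_dateutil = dtstamp.astimezone(dateutil.tz.tzoffset(None, offset_minutes * 60))

print(via_pytz.isoformat())      # 2012-08-09T20:30:22+10:00
print(via_dateutil.isoformat())  # 2012-08-09T20:30:22+10:00
```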
Now I'm trying to derive the datetime in my local timezone. First, it looks like I shouldn't use the .replace method:
print(time.tzname[0]) # CET
tzlocal = dateutil.tz.tzlocal()
print(tzlocal) # tzlocal()
dtstamplocrep = dtstamp.replace(tzinfo=tzlocal)
print(dtstamp) # 2012-08-09 10:30:22+00:00
print(dtstamplocrep) # 2012-08-09 10:30:22+02:00 # not right!
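This doesn't look right: I got the exact same "clock string", just with a different offset. As far as I can tell, .replace only attaches the new tzinfo without converting the time, so the result now denotes a different instant.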
However, .astimezone seems to work:
dtstamploc = dtstamp.astimezone(dateutil.tz.tzlocal())
print(dtstamp) # 2012-08-09 10:30:22+00:00
print(dtstamploc) # 2012-08-09 12:30:22+02:00 # was August -> summer -> CEST: UTC+2h
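One way to see why .replace is wrong and .astimezone is right is to convert each result back to epoch seconds: .astimezone keeps the instant and changes the wall clock, while .replace keeps the wall clock and silently moves the instant. A sketch assuming dateutil is installed and a Unix-like OS; it pins the process timezone to CET/CEST via a POSIX TZ string (an assumption, so the result does not depend on the machine's settings), and to_epoch is a hypothetical helper:

```python
import calendar
import datetime
import os
import time

import dateutil.tz

# Pin the local zone to CET/CEST for reproducibility (Unix-only; affects the process).
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()


def to_epoch(aware_dt):
    """Seconds since the Epoch for a timezone-aware datetime."""
    return calendar.timegm(aware_dt.utctimetuple())


dtstamp = datetime.datetime.fromtimestamp(1344508222, dateutil.tz.tzutc())
tzlocal = dateutil.tz.tzlocal()  # created after tzset(), so it sees CET/CEST

replaced = dtstamp.replace(tzinfo=tzlocal)  # same clock, different instant
converted = dtstamp.astimezone(tzlocal)     # same instant, different clock

print("%s %d" % (replaced.isoformat(), to_epoch(replaced)))
print("%s %d" % (converted.isoformat(), to_epoch(converted)))
# 2012-08-09T10:30:22+02:00 1344501022
# 2012-08-09T12:30:22+02:00 1344508222
```

The .replace result is exactly two hours (the CEST offset) earlier than the commit actually happened.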
I get the same with a named pytz.timezone:
cphtz = pytz.timezone('Europe/Copenhagen')
dtstamploc = dtstamp.astimezone(cphtz)
print(dtstamp) # 2012-08-09 10:30:22+00:00
print(dtstamploc) # 2012-08-09 12:30:22+02:00 # is August -> summer -> CEST: UTC+2h
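Regarding the DST worry raised below: as far as I can tell, .astimezone with a pytz zone does pick the offset valid at each instant, since pytz implements the UTC-to-local step (fromutc) itself. A quick check, assuming pytz is installed, with 1356998400 (2013-01-01 00:00:00 UTC) added only as a winter reference point:

```python
import datetime

import pytz

cphtz = pytz.timezone("Europe/Copenhagen")

for epoch in (1344508222, 1356998400):  # August 2012, January 2013
    dt_utc = datetime.datetime.fromtimestamp(epoch, pytz.utc)
    dt_cph = dt_utc.astimezone(cphtz)
    print("%s %s" % (dt_cph.isoformat(), dt_cph.tzname()))
# 2012-08-09T12:30:22+02:00 CEST
# 2013-01-01T01:00:00+01:00 CET
```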
... however, I cannot use .localize here, since my input dtstamp already has a timezone associated with it, and is therefore no longer "naive":
# dtstamploc = cphtz.localize(dtstamp, is_dst=True) # ValueError: Not naive datetime (tzinfo is already set)
Ultimately, this looks correct so far, but I'm really uncertain about it, especially since I came across this:
pytz.astimezone not accounting for daylight savings?
You can't assign the timezone in the datetime constructor, because it doesn't give the timezone object a chance to adjust for daylight savings - the date isn't accessible to it. This causes even more problems for certain parts of the world, where the name and offset of the timezone have changed over the years.
From the pytz documentation:
Unfortunately using the tzinfo argument of the standard datetime constructors "does not work" with pytz for many timezones.
Use the localize method with a naive datetime instead.
... which ended up confusing me. Say I want to do this, and I already have a correctly timezoned timestamp: how would I derive a "naive" datetime for it? Just strip the timezone info? Or should the "naive" datetime be derived from the UTC version of the timestamp (e.g. 2012-08-09 20:30:22+10:00 -> 2012-08-09 10:30:22+00:00, so the right "naive" datetime would be 2012-08-09 10:30:22)?
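As far as I understand pytz, localize() expects the naive wall-clock time in the target zone itself, not the UTC one; for input that is already aware (as here), .astimezone is the tool, and a naive UTC value can be made aware with pytz.utc.localize (or .replace, since UTC has no DST) before converting. A sketch assuming pytz is installed, comparing these paths and the pitfall of localizing the UTC clock as if it were local:

```python
import datetime

import pytz

cphtz = pytz.timezone("Europe/Copenhagen")
aware_utc = datetime.datetime.fromtimestamp(1344508222, pytz.utc)

# 1) Input already aware: just convert; no naive datetime or localize() needed.
via_astimezone = aware_utc.astimezone(cphtz)

# 2) Naive UTC clock (2012-08-09 10:30:22): mark it as UTC, then convert.
naive_utc = aware_utc.replace(tzinfo=None)
via_utc = pytz.utc.localize(naive_utc).astimezone(cphtz)

# 3) Naive Copenhagen clock (2012-08-09 12:30:22): this is what localize() wants.
naive_cph = via_astimezone.replace(tzinfo=None)
via_localize = cphtz.localize(naive_cph)

# Pitfall: treating the naive UTC clock as Copenhagen time shifts the instant.
wrong = cphtz.localize(naive_utc)

print(via_astimezone == via_utc == via_localize)  # True
print(wrong.isoformat())                          # 2012-08-09T10:30:22+02:00
print(via_astimezone - wrong)                     # 2:00:00
```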