tl;dr: datetime can't handle this kind of thing, so don't even try. You have strings; keep them and treat them as such.
You could simply sort them as strings, provided they're of consistent length (otherwise pad as needed) and format. This will allow for sorting of "extended" ISO 8601:2004 timestamps, i.e. timestamps that use 00 for unknown months and days, which the standard itself does not allow.
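To illustrate the point, here is a minimal sketch on hard-coded sample strings (hypothetical values, written in the same "+YYYY-MM-DDTHH:MM:SSZ" shape the API returns). As long as sign and width match and the most significant field comes first, plain string order is chronological order, including the 00 month/day values that datetime would reject:

```python
# Hypothetical sample timestamps, all with the same sign and the same width.
stamps = [
    "+2001-03-25T00:00:00Z",
    "+1977-03-20T00:00:00Z",
    "+1995-05-22T00:00:00Z",
    "+1977-00-00T00:00:00Z",  # month and day unknown
]

# Plain lexicographic sort; no date parsing involved.
ordered = sorted(stamps)
for stamp in ordered:
    print(stamp)
```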
Assuming Python 3, this code:
import json
import urllib.request
url = urllib.request.urlopen("https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q90&props=info%7Caliases%7Clabels%7Cdescriptions%7Cclaims%7Cdatatype%7Csitelinks%2Furls&languages=fr&languagefallback=1&formatversion=2")
data = json.loads(url.read().decode())
# Sort the P6 claims by the time string of their first P580 qualifier, compared as plain strings.
P6 = sorted(data['entities']['Q90']['claims']['P6'], key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'])
for x in P6:
    print(x['mainsnak']['datavalue']['value']['numeric-id'])
yields this result set:
1685301
947901
656015
2596877
3131449
1986521
1685102
1684642
601266
677730
289303
959708
2105
1685859
256294
2851133
Additionally, you'll want to separate your list into two:
- items starting with a - sign
- items starting with a + sign
Then sort the first list by month-day-time ascending, then by the unsigned integer value of the year (represented by a string) descending, since a larger negative year is an earlier date. Doing this in two passes works because sort() and sorted() are guaranteed "stable". Plainly sort the second list, and concatenate the two back again. This will allow for proper sorting of signed ISO 8601 timestamps.
neg = [x for x in P6 if x['qualifiers']['P580'][0]['datavalue']['value']['time'].startswith('-') ]
pos = [x for x in P6 if x['qualifiers']['P580'][0]['datavalue']['value']['time'].startswith('+') ]
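# First pass: month-day-time, ascending.
neg.sort(key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'][5:])
# Second pass: 4-digit year, descending (a larger negative year is earlier). The sort is stable, so the first pass survives within each year.
neg.sort(key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'][1:5], reverse=True)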
pos.sort(key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'])
P6sorted = neg+pos
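The same split-and-merge idea can be tried offline. This is only a sketch on hard-coded sample strings (hypothetical values, 4-digit years assumed), not on the live claims:

```python
# Hypothetical sample timestamps with mixed signs and 4-digit years.
stamps = [
    "+0508-00-00T00:00:00Z",
    "-0052-00-00T00:00:00Z",
    "-0300-06-01T00:00:00Z",
    "-0300-02-01T00:00:00Z",
    "+1977-03-20T00:00:00Z",
]

neg = [s for s in stamps if s.startswith("-")]
pos = [s for s in stamps if s.startswith("+")]

# Pass 1: everything after the year (month-day-time), ascending.
neg.sort(key=lambda s: s[5:])
# Pass 2: the year digits, descending; stability keeps pass 1's order within a year.
neg.sort(key=lambda s: s[1:5], reverse=True)
# Positive timestamps of equal width sort correctly as plain strings.
pos.sort()

ordered = neg + pos
for stamp in ordered:
    print(stamp)
```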
As for the padding, should it be needed, it's trivial enough using str.rjust() on the part after the sign, although you'll have to alter the sorting somewhat to reflect the "new" timestamps' length. (str.zfill() would also do: it inserts the zeros after a leading sign and does not require the string to be numeric, so the 'T', 'Z', '-' and ':' are no obstacle.)
maxlength = max( map( lambda claim: len( claim['qualifiers']['P580'][0]['datavalue']['value']['time'] ), P6 ) )
for claim in P6:
    value = claim['qualifiers']['P580'][0]['datavalue']['value']
    # Keep the sign in front and zero-pad the rest on the left, so every timestamp is maxlength long.
    value['time'] = value['time'][0] + value['time'][1:].rjust(maxlength - 1, "0")
neg = [x for x in P6 if x['qualifiers']['P580'][0]['datavalue']['value']['time'].startswith('-') ]
pos = [x for x in P6 if x['qualifiers']['P580'][0]['datavalue']['value']['time'].startswith('+') ]
# The last 16 characters are always "-MM-DDTHH:MM:SSZ"; whatever sits between the sign and that tail is the padded year.
neg.sort(key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'][maxlength-16:])
neg.sort(key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'][1:maxlength-16], reverse=True)
pos.sort(key=lambda claim: claim['qualifiers']['P580'][0]['datavalue']['value']['time'])
P6sorted = neg+pos
for claim in P6sorted:
    print([claim['mainsnak']['datavalue']['value']['id'],claim['qualifiers']['P580'][0]['datavalue']['value']['time']])
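The padding step on its own may be easier to follow on plain strings. A small sketch, where pad_time is a hypothetical helper name and the sample values are made up:

```python
def pad_time(stamp, width):
    """Keep the sign, then left-pad the rest with zeros up to `width` characters in total."""
    return stamp[0] + stamp[1:].rjust(width - 1, "0")

# Hypothetical sample timestamps with years of different lengths.
stamps = [
    "+2001-03-25T00:00:00Z",
    "+12000-01-01T00:00:00Z",
    "-0052-00-00T00:00:00Z",
]

width = max(len(s) for s in stamps)
padded = [pad_time(s, width) for s in stamps]
for stamp in padded:
    print(stamp)
```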
As an aside, you may want to "Decorate-Sort-Undecorate" (perform a Schwartzian transform), for readability.
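A possible shape for that transform, as a sketch only: start_time, time_key and make_claim are hypothetical helper names, and the claims are minimal made-up stand-ins for the API's structure. Turning the year into an integer in the key also sidesteps both the padding and the two-list split:

```python
def start_time(claim):
    """Pull the P580 time string out of a claim."""
    return claim['qualifiers']['P580'][0]['datavalue']['value']['time']

def time_key(stamp):
    """Build a sortable tuple: negative years first, then by year, then by the rest."""
    sign, rest = stamp[0], stamp[1:]
    year, _, tail = rest.partition('-')  # tail is "MM-DDTHH:MM:SSZ"
    if sign == '-':
        return (0, -int(year), tail)
    return (1, int(year), tail)

def make_claim(ident, stamp):
    """Hypothetical minimal claim, just deep enough for this example."""
    return {'mainsnak': {'datavalue': {'value': {'id': ident}}},
            'qualifiers': {'P580': [{'datavalue': {'value': {'time': stamp}}}]}}

claims = [
    make_claim('Q3', '+1977-03-20T00:00:00Z'),
    make_claim('Q1', '-0300-06-01T00:00:00Z'),
    make_claim('Q4', '+12000-01-01T00:00:00Z'),
    make_claim('Q2', '-0052-00-00T00:00:00Z'),
]

# Decorate: compute each key once; the index breaks ties so dicts are never compared.
decorated = [(time_key(start_time(c)), i, c) for i, c in enumerate(claims)]
# Sort on the decoration.
decorated.sort()
# Undecorate: keep only the claims.
P6sorted = [c for _, _, c in decorated]

ids = [c['mainsnak']['datavalue']['value']['id'] for c in P6sorted]
print(ids)
```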
Finally, if you're worried about Julian vs. Gregorian calendars, you'll have to convert the Julian dates into Gregorian dates, based on country and year, by adding the corresponding number of days, and then apply the above method. But keep in mind that a Julian date (YYYY)-(MM)-(DD) predates a Gregorian date "that seems one day ahead", so it really shouldn't be too much of a worry.
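For the "corresponding number of days", a rough sketch follows. It assumes positive (CE) years and uses the usual century rule for the gap between the two calendars; julian_gregorian_gap is a hypothetical helper name. Deciding whether a given date is Julian in the first place (the country and year part) is left out entirely:

```python
def julian_gregorian_gap(year, month):
    """Days to add to a Julian date to obtain the Gregorian one (CE years only)."""
    # The gap grows at the end of February of century years that are not
    # divisible by 400, so January and February still use the previous year's gap.
    y = year if month > 2 else year - 1
    return y // 100 - y // 400 - 2

# A few sample lookups.
for year, month in [(1582, 10), (1700, 2), (1700, 3), (1917, 10)]:
    print(year, month, julian_gregorian_gap(year, month))
```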