I'm loading all geographic entries (Q56061) from the Wikidata JSON dump. The whole dump contains about 16M entries according to the Wikidata:Statistics page.
Using Python 3.4 + ijson + libyajl2, it takes about 93 hours of CPU time (AMD Phenom II X4 945, 3 GHz) just to parse the file. Querying the 2.3M entries of interest online, one item at a time, would take about 134 hours instead.
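For reference, this is roughly what the ijson-based pass looks like — a minimal sketch, assuming the standard Wikidata dump layout (one large JSON array of entity objects) and filtering on P31 (instance of) = Q56061; the file names are placeholders:

    # Sketch of the streaming pass described above; file names are placeholders.
    import ijson.backends.yajl2 as ijson  # yajl2 backend, as in the setup above
    import json

    TARGET_CLASS = 'Q56061'

    def is_instance_of(entity, qid):
        """Check whether any P31 (instance of) claim points at the given item."""
        for claim in entity.get('claims', {}).get('P31', []):
            value = claim.get('mainsnak', {}).get('datavalue', {}).get('value', {})
            if value.get('id') == qid:
                return True
        return False

    with open('wikidata-all.json', 'rb') as dump, \
         open('geo-entities.ndjson', 'w') as out:
        # The dump is one JSON array; the 'item' prefix yields each entity dict.
        for entity in ijson.items(dump, 'item'):
            if is_instance_of(entity, TARGET_CLASS):
                out.write(json.dumps(entity) + '\n')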
Is there a more efficient way to perform this task? (Maybe something like OpenStreetMap's PBF format and the Osmosis tool.)