I am accessing the Wikipedia API to grab the text from a page, using the parse API call with the page name. The response is a JSON object containing the HTML of the whole page as a single element, along with a byte offset for each section of the wiki page so you can pull out the parts you need. Is there a better way to handle this than loading the whole response into memory? Right now, all I can think of is to use json.loads() to create a dict and then split the string at each of the byte offsets it specifies.
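For reference, the approach you describe (json.loads() followed by slicing at the byte offsets) might look roughly like this. The response layout and field names below are a guess at the parse API's usual shape, not taken from a real call, and the sample data is made up:

```python
import json

# Hypothetical, miniature stand-in for a parse-API response (real ones
# are far larger); the field names mirror the API's usual shape but are
# assumptions here.
raw = json.dumps({
    "parse": {
        "wikitext": {"*": "Intro text.== Section 1 ==Body one.== Section 2 ==Body two."},
        "sections": [
            {"line": "Section 1", "byteoffset": 11},
            {"line": "Section 2", "byteoffset": 35},
        ],
    }
})

data = json.loads(raw)                      # the whole response is now in memory
text = data["parse"]["wikitext"]["*"]
offsets = [s["byteoffset"] for s in data["parse"]["sections"]]

# Cut the text at each section's byte offset. (ASCII is assumed here;
# real byte offsets index the UTF-8 encoded text, so for non-ASCII
# pages you would encode to bytes before slicing.)
bounds = [0] + offsets + [len(text)]
sections = [text[i:j] for i, j in zip(bounds, bounds[1:])]
```

This works, but it still holds the entire response in memory at once, which is exactly the cost you're asking how to avoid.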
1 Answer
You could use ijson, an iterative JSON parser. It lets you iterate over nodes in the document without loading the whole thing, for example:
import ijson
from urllib.request import urlopen

f = urlopen('http://.../')
objects = ijson.items(f, 'earth.europe.item')
cities = (o for o in objects if o['type'] == 'city')
for city in cities:
    do_something_with(city)

Alex