I am accessing the Wikipedia API to grab the text from a page, using the parse API call with the page name. The response is a JSON object containing the HTML of the whole page as a single element, along with a byte offset for each section of the wiki page so you can pull out the parts you need. Is there a better way to handle this than loading the whole response into memory? Right now, all I can think of is to use json.loads() to create a dict and then split the string at each of the byte offsets it specifies.
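For reference, the approach you describe (json.loads() followed by slicing at the byte offsets) might look roughly like this. The response layout and field names below are a guess at the parse API's usual shape, not taken from a real call, and the sample data is made up:

```python
import json

# Hypothetical, miniature stand-in for a parse-API response (real ones
# are far larger); the field names mirror the API's usual shape but are
# assumptions here.
raw = json.dumps({
    "parse": {
        "wikitext": {"*": "Intro text.== Section 1 ==Body one.== Section 2 ==Body two."},
        "sections": [
            {"line": "Section 1", "byteoffset": 11},
            {"line": "Section 2", "byteoffset": 35},
        ],
    }
})

data = json.loads(raw)                      # the whole response is now in memory
text = data["parse"]["wikitext"]["*"]
offsets = [s["byteoffset"] for s in data["parse"]["sections"]]

# Cut the text at each section's byte offset. (ASCII is assumed here;
# real byte offsets index the UTF-8 encoded text, so for non-ASCII
# pages you would encode to bytes before slicing.)
bounds = [0] + offsets + [len(text)]
sections = [text[i:j] for i, j in zip(bounds, bounds[1:])]
```

This works, but it still holds the entire response in memory at once, which is exactly the cost you're asking how to avoid.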
1 Answer
You could use ijson, an iterative JSON parser. It lets you iterate over nodes in the document without loading the whole thing, for example:
import ijson
from urllib.request import urlopen

f = urlopen('http://.../')
objects = ijson.items(f, 'earth.europe.item')
cities = (o for o in objects if o['type'] == 'city')
for city in cities:
    do_something_with(city)

Alex