I came up with a strategy that allows to use json module to access information without opening the file:
import bz2
import json
with bz2.open(filename, "rt") as bzinput:
lines = []
for i, line in enumerate(bzinput):
if i == 10: break
tweets = json.loads(line)
lines.append(tweets)
In this way lines
will be a list of dictionaries that you can easly manipulate and, for example, reduce their size by removing keys that you don't need.
Note also that (obviously) the condition i==10
can be arbitrarly changed to fit anyone(?) needings. For example, you may parse some line at a time, analyze them and writing on a txt
file the indices of the lines you really want from the original file. Than it will be sufficient to read only those lines (using a similar condition in i
in the for
loop).