I am trying to read JSON files of about 30 GB. I can do it with ijson, but to speed up the process I am trying to add multiprocessing. However, I cannot make it work: I can see the n workers ready, but only one worker ends up doing all of the work.
Does anyone know if it is possible to combine multiprocessing with ijson?
Here is a sample of the code:
import ijson
import pandas as pd
import multiprocessing

file = 'jsonfile'
player = []

def games(record):
    # collect the player from each game in the record, or a placeholder if the key is missing
    games01 = record["games"]
    for game01 in games01:
        try:
            player.append(game01['player'])
        except KeyError:
            player.append('No_record_found')

if __name__ == '__main__':
    with open(file, "rb") as f:
        pool = multiprocessing.Pool()
        pool.map(games, ijson.items(f, "game.item"))
        pool.close()
        pool.join()
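To make the question more concrete: would something along the lines of the sketch below be the right direction? This is only a rough, untested sketch, not a working solution; the extract_players name, the use of imap, and the chunksize=100 value are my guesses. It has each worker return its results instead of appending to the global player list, since I understand each process only sees its own copy of that list. Or does the ijson parsing in the parent process still serialize everything, which would explain why only one core is busy?

import ijson
import multiprocessing

file = 'jsonfile'

def extract_players(record):
    # each worker builds and returns its own list instead of touching a shared global
    found = []
    for game in record.get("games", []):
        found.append(game.get('player', 'No_record_found'))
    return found

if __name__ == '__main__':
    with open(file, "rb") as f:
        with multiprocessing.Pool() as pool:
            # chunksize is a guess; larger chunks should mean fewer pickling round-trips
            results = pool.imap(extract_players, ijson.items(f, "game.item"), chunksize=100)
            player = [p for chunk in results for p in chunk]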