I am parsing an extremely large JSON file using ijson and then writing the contents to a temp file. Afterwards, I overwrite the original file with the contents of the temp file.

import json
import os
import tempfile

import ijson

FILE_NAME = 'file-name'
DIR_PATH = 'path'

# Generator function that yields dictionary objects.
def constructDictionary():
    data = open(os.path.join(DIR_PATH, FILE_NAME + ".json"), "rb")
    row = ijson.items(data, 'item')
    for record in row:
        yield record
    data.close()

def writeToTemp(row, temp):
    # Needs to add a comma
    json.dump(row, temp)

def writeTempToFile(temp):
    temp.seek(0)
    data = open(os.path.join(DIR_PATH, FILE_NAME + ".json"), "wb")
    data.write(b'[')
    for line in temp:
        data.write(line.encode('utf-8'))
    data.write(b']')
    data.close()

if __name__ == "__main__":
    temp = tempfile.NamedTemporaryFile(mode='r+')
    for row in constructDictionary():
        writeToTemp(row, temp)
    writeTempToFile(temp)
    temp.close()

My issue is that the JSON objects end up written without commas between them. I can't parse the file a second time to add the missing commas, as that would take far too long. Ideally, I would add a comma after each json.dump() while writing. But how would I handle the final entry?
Is there some way to determine when the generator function has reached the end of the file? Then I could use a flag or pass a variable so that the final comma isn't written.
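
One way to sketch the flag idea without detecting the end of the generator at all is to write the comma before every record except the first. This is only a minimal sketch: write_records is a hypothetical helper, and an in-memory io.StringIO stands in for the temp file.

```python
import io
import json


def write_records(records, out):
    """Write an iterable of records to `out` as a single JSON array."""
    out.write('[')
    first = True
    for record in records:
        # The separator goes *before* each record except the first,
        # so the last record never needs special handling.
        if not first:
            out.write(',')
        json.dump(record, out)
        first = False
    out.write(']')


# Hypothetical usage with a small generator in place of the ijson stream.
buf = io.StringIO()
write_records(({"id": i} for i in range(3)), buf)
result = buf.getvalue()
```

Because the flag only tracks whether anything has been written yet, an empty generator still produces a valid, empty array.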
Or I could use file.seek() to step back to the trailing comma after the final entry and remove it. But that doesn't sound like a good approach.
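
For comparison, the seek-based idea might look roughly like the sketch below. It assumes a seekable file opened in binary mode (seeking relative to the end is not supported on text-mode files), and write_with_trailing_fix is a hypothetical name; io.BytesIO stands in for the real file.

```python
import io
import json
import os


def write_with_trailing_fix(records, out):
    """Write records with a comma after each, then drop the last comma.

    `out` must be a seekable binary file-like object.
    """
    out.write(b'[')
    wrote_any = False
    for record in records:
        out.write(json.dumps(record).encode('utf-8'))
        out.write(b',')
        wrote_any = True
    if wrote_any:
        # Step back over the single trailing comma and cut it off.
        out.seek(-1, os.SEEK_END)
        out.truncate()
    out.write(b']')


# Hypothetical usage with an in-memory buffer.
buf = io.BytesIO()
write_with_trailing_fix(({"id": i} for i in range(3)), buf)
result = buf.getvalue()
```

This still needs the wrote_any check for the empty case, so it is not obviously simpler than writing the comma before each record.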
I would appreciate any suggestions, thank you.