I have a piece of code that processes thousands of files in a directory. For each file, it generates an object (a dictionary) whose key-value pairs look in part like:
{
........
'result': [...a very long list...]
}
If I process all the files, collect the results in a list, and then use the jsonlines library to write everything at once, my laptop (a Mac) runs out of memory.
So my plan is to process the files one by one: get each result, write it into the JSON Lines file, then delete the object to release the memory.
After checking the official documentation (https://jsonlines.readthedocs.io/en/latest/), I couldn't find a method that writes to a JSON Lines file without overwriting it.
So how can I handle such a big output?
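To make the intended flow concrete, here is a sketch of what I'm after (get_result and file_lst are my existing function and file list; 'output.jsonl' is just a placeholder name). My thinking is that keeping one writer open for the whole run would avoid the overwriting problem:

import jsonlines

# sketch: open the output file once, then stream one result per file
with jsonlines.open('output.jsonl', mode='w') as writer:
    for path in file_lst:
        obj = get_result(path)  # dict with the long 'result' list
        writer.write(obj)       # written out immediately as one JSON line
        del obj                 # drop the reference before the next file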
Besides, I'm using parallel threads to process the files:
from multiprocessing.dummy import Pool  # thread-based Pool with the multiprocessing API
Pool(4).map(get_result, file_lst)  # map() builds the full result list in memory
What I really want is to open the JSON Lines file once, write each result as soon as it is produced, and then release its memory.
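Ideally the threaded version would stream in the same way, roughly like this sketch (imap_unordered hands results back one at a time as workers finish, instead of building the whole list the way map does):

import jsonlines
from multiprocessing.dummy import Pool

# sketch: worker threads compute results, the main thread writes them as they arrive
with jsonlines.open('output.jsonl', mode='w') as writer, Pool(4) as pool:
    for obj in pool.imap_unordered(get_result, file_lst):
        writer.write(obj)  # only the main thread touches the file

I assume this is safe since only the main thread touches the writer, but I'm not sure it's the intended usage.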