I am parsing ~250K XML files and loading the extracted data into a SQLite database, using Node 10.15.1 with `cheerio` and `better-sqlite3`, on a Mac OS X laptop with 8 GB of memory. I `readdirSync` the entire folder of ~250K files, then parse the XML and insert the extracted data inside transactions, in batches of 10K files per transaction. I am running with `--max_old_space_size=4096`, but I still get:

    FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
Now, if I process 100K files, quit Node, then start again and process the remaining ~150K files, everything works. But I'd rather do it all in one go, since this has to run unattended. Is there anything else I can do given my constraints? I can't use a machine with more memory because I don't have access to one. I could try bumping `--max_old_space_size` up a bit more, or try smaller transaction batches, but I'm not sure that will help (I tried 8K files per transaction instead of 10K, and that also ran out of memory). The only thing that seems to help right now is quitting Node in between. Is there any way I can simulate that? That is, tell Node to release all its memory and pretend it has been restarted? Any other thoughts?