
Hi, I am trying to import a huge amount of data: around 8 GB spread across more than 53k files. What I want is to import one file at a time, finish working with it, and then delete/remove or garbage collect the cache.

I know that the delete keyword only deletes the reference to the object instance. Using delete, or setting the variable to null or undefined, doesn't work.

Here is my code.

const fs = require('fs');
const _ = require('lodash');

fs.readdir(dirname, (err, filenames) => {
    if (err) {
        onError(err);
        return;
    }
    _.forEach(filenames, (file) => {
        if (!!~file.indexOf('.json')) {
            this.synchronize(() => {
                let currentFile = require(`${dirname}${file}`);
                return new Promise((resolve, reject) => {
                    setTimeout(() => {
                        // assume I am done working with the data imported into currentFile here; now I want to delete it
                        resolve('Done');
                    }, 1)
                })
            })
        }

    });
});

I have tried every possible way to empty the cache but did not succeed. Is there any way to clear currentFile after I am done working with it?

Or maybe how to tweak my code so that it will work for any number of files in the folder.

Any help will be appreciated.

Faheem Alam
  • Try using `JSON.parse(fs.readFile(…))` instead of `require`, which uses the global module cache. Or clear that cache - see the duplicate questions. – Bergi Aug 24 '17 at 07:52
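
A minimal sketch of the approach that comment suggests: read the file with fs and parse it with JSON.parse, so nothing ends up in the module cache (the loadJson helper name is an assumption, not part of the original code):

const fs = require('fs');
const path = require('path');

// hypothetical helper: load one JSON file without touching require.cache
function loadJson(dirname, file) {
    return new Promise((resolve, reject) => {
        fs.readFile(path.join(dirname, file), 'utf8', (err, text) => {
            if (err) return reject(err);
            resolve(JSON.parse(text));
        });
    });
}

// once the caller drops its reference to the returned object,
// it becomes eligible for garbage collection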

2 Answers


I see 2 big problems with your code:

  1. require is able to load a JSON file, that's fine, but the main issue is that it does so synchronously, so you can't make proper use of Node.js's async nature. This is why it is not recommended to use require for dynamic files: require is meant for app initialisation, because internally it uses a caching mechanism that is useless in your case and that slows down your app.
  2. Loading and processing all the files in this manner is very resource-consuming, and you should refactor your architecture to use the stream API; this is the kind of use case where streams shine.

A way to solve your problem is to dive into the following two resources:

  1. the JSONStream npm module
  2. fs.createReadStream from the fs module

The idea is to iterate over all the files in the directory (which you are already doing), then create a read stream for each file and pipe it through JSONStream.parse; this way you will save an enormous amount of memory, as sketched below.
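
A rough sketch of that idea, assuming each .json file contains a top-level array (that is what the '*' pattern passed to JSONStream.parse matches; adjust it to the shape of your data):

const fs = require('fs');
const path = require('path');
const JSONStream = require('JSONStream');

// stream one file: items are emitted one by one instead of the whole
// file being parsed into memory at once
function processFile(dirname, file) {
    return new Promise((resolve, reject) => {
        fs.createReadStream(path.join(dirname, file))
            .pipe(JSONStream.parse('*'))
            .on('data', (item) => {
                // work with a single item here; it can be collected
                // by the GC as soon as this handler returns
            })
            .on('end', resolve)
            .on('error', reject);
    });
}

// process the files one after another instead of all at once
async function processAll(dirname) {
    const filenames = fs.readdirSync(dirname);
    for (const file of filenames) {
        if (file.endsWith('.json')) {
            await processFile(dirname, file);
        }
    }
}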

Alexandru Olaru

You can't explicitly tell the garbage collector to release an object. An object is destroyed as soon as the garbage collector realizes that there are no more references to the object. To achieve that, you have to make sure that all references to the object are removed. As long as there is at least one reference, the object will stay in memory.

The moment you resolve the promise, currentFile isn't referenced anymore, and will eventually be garbage collected.
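
One such reference worth checking is the module cache mentioned in the comment above: if the file was loaded with require, require.cache still points at the parsed object. A small sketch of dropping that reference (the surrounding work is elided):

// resolve the full path once so the cache key matches
let filePath = require.resolve(`${dirname}${file}`);
let currentFile = require(filePath);

// ... work with currentFile ...

delete require.cache[filePath]; // remove the module-cache reference
currentFile = null;             // drop the local reference as well
// with both references gone, the parsed JSON can be garbage collected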

PeterMader