I have an issue with a project of mine, which scans one or more directories for MP3 files and stores their metadata and paths in MongoDB.
The machine that runs the code is a Windows 10 64-bit box with 8 GB of RAM and a 3.5 GHz AMD Ryzen CPU (4 cores). Windows resides on an SSD, while the music is on a 1 TB HDD.
The Node.js app can be launched manually from the command line or through NPM, starting from here. I'm using a recursive function to scan all the directories, and we're talking about 20 thousand files, more or less.
I solved the EMFILE: too many files open issue through graceful-fs, but now I've landed on a new one: JavaScript heap out of memory.
Below is the complete output I receive:
C:\Users\User\Documents\GitHub\mp3manager>npm run scan
> experiments@1.0.0 scan C:\Users\User\Documents\GitHub\mp3manager
> cross-env NODE_ENV=production NODE_OPTIONS='--max-old-space-size=4096' node scripts/cli/mm scan D:\Musica
Scanning 1 resources in production mode
Trying to connect to mongodb://localhost:27017/music_manager
Connected to mongo...
<--- Last few GCs --->
[16744:0000024DD9FA9F40] 141399 ms: Mark-sweep 63.2 (70.7) -> 63.2 (71.2) MB, 47.8 / 0.1 ms (average mu = 0.165, current mu = 0.225) low memory notification GC in old space requested
[16744:0000024DD9FA9F40] 141438 ms: Mark-sweep 63.2 (71.2) -> 63.2 (71.2) MB, 38.9 / 0.1 ms (average mu = 0.100, current mu = 0.001) low memory notification GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x02aaa229e6e9 <JSObject>
0: builtin exit frame: new ArrayBuffer(aka ArrayBuffer)(this=0x027bb3502801 <the_hole>,0x0202be202569 <Number 8.19095e+06>,0x027bb3502801 <the_hole>)
1: ConstructFrame [pc: 000002AF8F50D385]
2: createUnsafeArrayBuffer(aka createUnsafeArrayBuffer) [00000080419526C9] [buffer.js:~115] [pc=000002AF8F8440B1](this=0x027bb35026f1 <undefined>,size=0x0202be202569 <Number 8.19095e+06>)
3:...
FATAL ERROR: Committing semi space failed. Allocation failed - JavaScript heap out of memory
1: 00007FF6E36FF04A
2: 00007FF6E36DA0C6
3: 00007FF6E36DAA30
4: 00007FF6E39620EE
5: 00007FF6E396201F
6: 00007FF6E3E82BC4
7: 00007FF6E3E79C5C
8: 00007FF6E3E7829C
9: 00007FF6E3E77765
10: 00007FF6E3989A91
11: 00007FF6E35F0E52
12: 00007FF6E3C7500F
13: 00007FF6E3BE55B4
14: 00007FF6E3BE5A5B
15: 00007FF6E3BE587B
16: 000002AF8F55C721
npm ERR! code ELIFECYCLE
npm ERR! errno 134
I've tried to use NODE_OPTIONS='--max-old-space-size=4096', but I'm not even sure Node is honoring this option on Windows. I've also tried p-limit to limit the number of promises effectively running, but honestly I'm a bit out of new ideas now, and I'm starting to think about using another language to see if it copes better with these kinds of issues.
Any advice would be appreciated.
Have a nice day.
EDIT:
I tried to substitute the processDir function with the one posted by @Terry, but the result is the same.
Update 2019-08-19: In order to avoid the heap issues, I removed the recursion and used a queue to add the directories:
const path = require('path');
const mm = require('music-metadata');
const _ = require('underscore');
const fs = require('graceful-fs');
const readline = require('readline');
const audioType = require('audio-type');
// const util = require('util');
const { promisify } = require('util');
const logger = require('../logger');
const { mp3hash } = require('../../../src/libs/utils');
const MusicFile = require('../../../src/models/db/mongo/music_files');

const getStats = promisify(fs.stat);
const readdir = promisify(fs.readdir);
const readFile = promisify(fs.readFile);

// https://github.com/winstonjs/winston#profiling
class MusicScanner {
    constructor(options) {
        const { paths, keepInMemory } = options;
        this.paths = paths;
        this.keepInMemory = keepInMemory === true;
        this.processResult = {
            totFiles: 0,
            totBytes: 0,
            dirQueue: [],
        };
    }

    async processFile(resource) {
        const buf = await readFile(resource);
        const fileRes = audioType(buf);
        if (fileRes === 'mp3') {
            this.processResult.totFiles += 1;
            // process the metadata
            this.processResult.totBytes += buf.length;
        }
    }

    async processDirectory() {
        while (this.processResult.dirQueue.length > 0) {
            const dir = this.processResult.dirQueue.shift();
            const dirents = await readdir(dir, { withFileTypes: true });
            const filesPromises = [];
            for (const dirent of dirents) {
                const resource = path.resolve(dir, dirent.name);
                if (dirent.isDirectory()) {
                    this.processResult.dirQueue.push(resource);
                } else if (dirent.isFile()) {
                    filesPromises.push(this.processFile(resource));
                }
            }
            await Promise.all(filesPromises);
        }
    }

    async scan() {
        const start = Date.now();
        const promises = [];
        for (const thePath of this.paths) {
            this.processResult.dirQueue.push(thePath);
            promises.push(this.processDirectory());
        }
        await Promise.all(promises);
        return this.processResult;
    }
}

module.exports = MusicScanner;
The problem now is that the process takes 54 minutes to read 21K files, and I'm not sure how I could speed it up in this case. Any hints on that?