
I was trying to write a program that takes every file large enough to be worth gzipping and compresses it with the gzip part of the zlib module, and I ran into the same error described in this question: Node.js ran out of file descriptors and was therefore unable to open any more files. That question fixes it by increasing the number of file descriptors. In trying to do the same, though, I've come across a few questions that I can't find the answers to.

  1. Are file descriptors shared between parent and child processes? That is, could we fix this error simply by creating a new child process for programs that use a lot of file descriptors? Does the type of child process matter?
  2. How many file descriptors do modules like zlib use? In my program I tried to zip 1695 files, but 673 failed. I know that each file needs at least 2 file descriptors (1 for the read stream and 1 for the write stream), but the limit is far above that, so how many does zlib itself create?
  3. Is there any way of changing the file descriptor limit from inside a Node.js JavaScript file, or can it only be changed externally?
  4. Can the limit be changed with command-line parameters so that it is application-specific?
  5. Is it possible to monitor how many file descriptors are currently in use? That would let you slow down the creation of new read/write streams so that older operations can complete and free up file descriptors. Preferably this would work within Node.js itself so it can be easily integrated into a Node.js JavaScript file.

Mostly for illustration, here is the code for my program:

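var fs=require('fs');
var path=require('path');
var zlib=require('zlib');
var compress='gzip'; //assumed value; `compress` is never defined in the original snippet
var errors=0;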
function compressFile(file,type){
    if(type.indexOf('gzip')>=0){
        fs.stat(file,function(err,stat){
            if(!err){
                if(stat.size>1000){
                    var gzip=zlib.createGzip();
                    var compiled=fs.createReadStream(file,{autoClose:true}).on('error',function(err){
                        console.log(file,1);
                        //console.log(err);
                    });
                    var compressed=fs.createWriteStream(file+'.gz',{autoClose:true}).on('error',function(err){
                        console.log(file,2);
                        errors++
                        //console.log(err);
                        console.log(errors);
                    });
                    compiled.pipe(gzip).pipe(compressed);
                }
            }else{
                console.log(err);
            }
        });
    }else{
        console.log('not supported compression');
    }
}
function compressAll(){
    fs.readdir('./',function(err,files){
        if(err){
            console.log(err);
        }else{
            for(var i=0;i<files.length;i++){
                var stat=fs.statSync('./'+files[i]);
                if(stat.isDirectory()){
                    var subfiles=fs.readdirSync(files[i]);
                    subfiles=subfiles.map(function(value){
                        return files[i]+'/' +value;
                    });
                    files.splice(i,1);
                    Array.prototype.push.apply(files,subfiles);
                    i--;
                }else if(stat.size<1000){
                    console.log(files[i],stat.size);
                    files.splice(i,1);
                    i--;
                }else if(path.parse(files[i]).ext==='.gz'){
                    files.splice(i,1);
                    i--;
                }else{
                    compressFile(files[i],compress);
                }
            }
            console.log(files.length);
        }
    });
}

As I said before, I ran 1695 files through this and received 673 errors, so it runs out of file descriptors somewhere around 1000 files being zipped.

Update: from my new understanding of how file descriptors relate to the OS, I see that questions 1, 3, and 4 don't apply to Node.js. However, I'm still wondering about 2 and 5: how many file descriptors does zlib use, and is there a way to monitor them?

Binvention
  • Note that this has nothing to do with NodeJS. It's an OS thing. That might help you find answers. – T.J. Crowder Jan 27 '16 at 17:05
  • It is OS-specific, but Node.js imposes its own limit to prevent file descriptor leaks, so I'm asking about Node.js's use of the OS file descriptors. – Binvention Jan 27 '16 at 17:07
  • What makes you think Node imposes its own limit? – T.J. Crowder Jan 27 '16 at 17:08
  • https://github.com/nodejs/node/blob/6fff47ffacfe663efeb0d31ebd700a65bf5521ba/doc/tsc-meetings/2015-06-10.md Look under the subject "node should not automatically change rlimits". – Binvention Jan 27 '16 at 17:11
  • @Binvention: That describes arguments for and against node ignoring OS imposed soft limits on file descriptors. The limits are still OS limits. Google ulimit and rlimit. They are not node specific. – slebetman Jan 27 '16 at 17:19
  • What OS are you running on? – slebetman Jan 27 '16 at 17:20
  • Windows 8, and some of the questions still apply to Node, such as whether there is a way to monitor how many file descriptors are left. – Binvention Jan 27 '16 at 17:24
  • Okay, so that's not Node applying its own limit, that's Node enforcing the soft limit rather than the hard limit from the OS. – T.J. Crowder Jan 27 '16 at 17:25
  • Yes, thank you both for that information; that part was confusing. My questions 2 and 5 do still apply, though. Is there any help I could get on those? – Binvention Jan 27 '16 at 17:28

1 Answer


In general, your questions seem a bit scattered, so it may help to read up on file descriptors first. Also, Windows does not have file descriptors as such, so anything that talks about file descriptors actually refers to something else on Windows.

But, to answer your questions directly:

2) If you mean the Node.js built-in zlib module, it does not use file descriptors at all. If you mean starting an external process in general, then by default Node.js creates a pipe for each of stdin, stdout, and stderr. This means it will momentarily create 6 file descriptors, but 3 of them are closed by the parent process, leaving 3 file descriptors per external process.

5) On Linux (and other Unix systems that provide /proc/self/fd), you can see all open file descriptors for a process with fs.readdirSync("/proc/self/fd"). However, since you seem to be on Windows, this will not help you, and I'm not the right person to say whether Node.js wraps some usable API on Windows.
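As a rough sketch of that idea, the listing can be wrapped in a small helper. The name countOpenFds is made up for this example, and it assumes a system where /proc/self/fd exists; elsewhere (Windows included) it simply returns null. The directory listing itself briefly holds a descriptor, so the number may be one higher than what the rest of the program is using.

```javascript
const fs = require("fs");

// Hypothetical helper: number of file descriptors currently open in this
// process, or null where /proc/self/fd is not available (e.g. Windows).
function countOpenFds() {
  try {
    // Each entry in /proc/self/fd is one open descriptor of this process.
    return fs.readdirSync("/proc/self/fd").length;
  } catch (err) {
    return null;
  }
}

console.log("open file descriptors:", countOpenFds());
```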

The example code you have written uses two file descriptors per compressed file and no more. The solution is not to gzip all the files in parallel (which is horribly inefficient anyway), but to pick a reasonable degree of parallelism and run only that many compressions at once.
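One possible way to cap the parallelism is sketched below. The helper names gzipFile and runLimited are invented for this example and are not part of Node.js; it assumes a Node.js version that provides stream.pipeline (10 or later). Each call to gzipFile holds two file descriptors (one read stream, one write stream) until it finishes, so the limit bounds the total.

```javascript
const fs = require("fs");
const zlib = require("zlib");
const { pipeline } = require("stream");

// Gzip one file to file + ".gz"; resolves once the output is fully written.
function gzipFile(file) {
  return new Promise((resolve, reject) => {
    pipeline(
      fs.createReadStream(file),
      zlib.createGzip(),
      fs.createWriteStream(file + ".gz"),
      (err) => (err ? reject(err) : resolve(file))
    );
  });
}

// Run worker(item) over all items with at most `limit` calls in flight.
// Results are returned in the same order as the input items.
async function runLimited(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed item until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

Something like `runLimited(files, 8, gzipFile)` would then keep at most 16 descriptors in use for compression at any moment. The limit of 8 is an arbitrary starting point, and the first failed file rejects the returned promise, so real code may want to catch errors per file inside the worker.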

Nakedible
  • Gzip and other zlib functions run through the stream API and thus have file descriptors as well: https://nodejs.org/dist/latest-v5.x/docs/api/stream.html#stream_class_stream_duplex – Binvention Jan 27 '16 at 21:59
  • No, the streaming API is implemented entirely inside node.js - there is no operating system file descriptor. – Nakedible Jan 27 '16 at 22:08
  • But it uses the operating systems stdio interface which does require file descriptors. – Binvention Jan 27 '16 at 22:09
  • No it does not, it is simply a readable stream inside node.js, with the implementation of the operations provided by zlib: http://www.zlib.net/manual.html – Nakedible Jan 27 '16 at 22:10
  • `$ node -e 'var fs = require("fs"); var zlib = require("zlib"); var x = JSON.stringify(fs.readdirSync("/proc/self/fd")); var gzip = zlib.createGzip(); var y = JSON.stringify(fs.readdirSync("/proc/self/fd")); console.log(x == y);'` results in `true`. – Nakedible Jan 27 '16 at 22:15
  • I believe it only creates the stream when you pipe data into the gzip – Binvention Jan 27 '16 at 22:17
  • No, it does not. This is pointless - if you don't want to believe me then don't. See the source for yourself, if you wish: https://github.com/nodejs/node/blob/master/lib/zlib.js – Nakedible Jan 27 '16 at 22:22
  • It isn't necessarily that I don't believe you. I'm just trying to see how the Node.js processes align with the system processes. All of the references you've given point to the JavaScript API, but the JavaScript API is built on system processes. So if the buffer API uses the system's stdio to work, then it uses up some of that resource. And trying to find where and whether it connects is quite hard when I don't know exactly what I'm looking for. – Binvention Jan 27 '16 at 22:51
  • Because Microsoft's stream and file descriptor API https://msdn.microsoft.com/en-us/library/k3352a4t.aspx sounds so similar to the Node.js stream API, it makes me think that somewhere the Node.js stream API depends on the system's I/O, such as streams and file descriptors (including their limits). – Binvention Jan 27 '16 at 22:58
  • Buffers are unrelated to file descriptors. File descriptors are created by system calls such as `open()`, `pipe()`, `dup()`, none of which are used by the calls in the zlib module. The Node.js `Readable` and `Writable` stream APIs do not create file descriptors, as they are just JavaScript constructs for passing data around, whereas `fs.createReadStream()` does, because it uses `open()` internally. And none of this is directly translatable to Windows, as it has different concepts for the same thing. – Nakedible Jan 27 '16 at 22:58
  • But pipe is used with the zlib module: you pipe the file to zlib, and zlib pipes the data to the write stream. – Binvention Jan 27 '16 at 23:00
  • `Readable.pipe()` has nothing to do with `pipe(2)`: http://linux.die.net/man/2/pipe – Nakedible Jan 27 '16 at 23:01
  • Your linked description refers to including stdio.h, which is how you include the operating system's stdio interface, and that is part of the stream and file descriptor API I referenced earlier. Look at the requirements listed here: https://msdn.microsoft.com/en-us/library/xt874334.aspx – Binvention Jan 27 '16 at 23:06
  • I should've stopped talking earlier. You don't seem to have any wish to accept what is being said to you; instead you come up with counterarguments, which is a circle that can go on forever. The Node.js Stream API does not use file descriptors, or `stdio.h` for that matter. The Node.js Zlib module does not use file descriptors. The Node.js FS module does use file descriptors, naturally. – Nakedible Jan 27 '16 at 23:10
  • No, I do wish to accept what's being said to me. I'm just not sure they are as unrelated as you think. After all, I imagine that in order to process large files zlib needs to create temporary files, since a raw buffer can only handle so much. So it isn't entirely impossible for zlib to require file descriptors. I don't know that it does, but it seems likely to me. And since stdio and file descriptors are both part of the system I/O and both have limited resources, it seems likely that they are more connected than you think. – Binvention Jan 27 '16 at 23:16