In a previous question it seemed that the only way to do random-access reads from a file in node.js is to use fs.createReadStream() with its optional fd, start, and end fields.

This worked fine in my simplest tests. But in my project I need to repeatedly read from different offsets of a binary file. This failed in a strange way, so I came up with a minimal test case:

var fs = require('fs');

fs.open('test.txt', 'r', function (err, fd) {
  if (err) {
    console.error('error opening file: ' + err);
  } else {
    fs.createReadStream(null, {fd: fd, start: 2, end: 5}).on('error', function (err) {
        throw err;
      }).on('close', function () {
        console.log('outer close');
      }).on('data', function (data) {
        console.log('outer data', data);
      }).on('end', function () {
        console.log('outer end');

        fs.createReadStream(null, {fd: fd, start: 0, end: 3}).on('error', function (err) {
            throw err;
          }).on('close', function () {
            console.log('inner close');
          }).on('data', function (data) {
            console.log('inner data', data);
          }).on('end', function () {
            console.log('inner end');

            // more code to execute after both reads
          });
      });
  }
});

The inner end event is never received. (The outer close is received inconsistently, but I don't need to attach code to it.)

I've implemented this project before in Perl and even in JavaScript as a Firefox extension, but it's proving difficult under node. This is also a test for whether I can start using node.js as a general purpose scripting language.

hippietrail

1 Answer


The issue is that the outer ReadStream closes the fd once it has finished reading, so reusing that fd in the second ReadStream fails. The newest unstable Node release actually has an `autoClose` option for ReadStreams, but that is not part of stable yet.
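In current Node releases `autoClose` is a documented option of `fs.createReadStream()` (default `true`). A minimal sketch of the question's two reads sharing one fd, assuming a Node version with that option: with `autoClose: false` the stream leaves the fd open when it ends, so the caller has to close it. `readRangeFromFd` is a hypothetical helper name, not a Node API.

```javascript
const fs = require('fs');

// Hypothetical helper: read the inclusive byte range [start, end] from an
// already-open fd and resolve with the bytes as one Buffer.
// autoClose: false stops the ReadStream from closing the fd when it ends,
// so the same fd can be handed to later streams; the caller must close it.
function readRangeFromFd(fd, start, end) {
  return new Promise(function (resolve, reject) {
    const chunks = [];
    fs.createReadStream(null, { fd: fd, start: start, end: end, autoClose: false })
      .on('error', reject)
      .on('data', function (chunk) { chunks.push(chunk); })
      .on('end', function () { resolve(Buffer.concat(chunks)); });
  });
}
```

Chaining `readRangeFromFd(fd, 2, 5)` and then `readRangeFromFd(fd, 0, 3)` mirrors the question's outer and inner streams; call `fs.close(fd, ...)` once both have resolved. Passing the file path instead of a shared `fd` is another way around the problem, since each stream then opens and closes its own descriptor.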

The real answer is that the information given to you in your previous question is incorrect. `createReadStream` is implemented entirely with public APIs, so there is nothing it can do that you can't do yourself. In this case, you can just use `fs.read` with its `position` argument.

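var fs = require('fs');

fs.open('test.txt', 'r', function (err, fd) {
  if (err) {
    console.error('error opening file: ' + err);
  } else {
    // Read 4 bytes at file position 2 (the same bytes as {start: 2, end: 5}).
    fs.read(fd, Buffer.alloc(4), 0, 4, 2, function (err, bytesRead, data) {
      if (err) throw err;
      // Only the first bytesRead bytes are valid (fewer near end of file).
      console.log('outer data', data.subarray(0, bytesRead));

      // Read 4 bytes at file position 0 (the same bytes as {start: 0, end: 3}).
      fs.read(fd, Buffer.alloc(4), 0, 4, 0, function (err, bytesRead, data2) {
        if (err) throw err;
        console.log('inner data', data2.subarray(0, bytesRead));

        // The fd was opened manually, so close it manually when done.
        fs.close(fd, function (err) {
          if (err) throw err;
        });

        // more code to execute after both reads
      });
    });
  }
});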
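The same positional reads can also be written with the promise-based `fs.promises` API, which arrived in Node releases later than the one discussed here. A sketch, where `readAt` and `readBoth` are hypothetical helper names: `FileHandle.read(buffer, offset, length, position)` reads from an absolute file position, so the two reads do not interfere, and the handle is closed exactly once in `finally`.

```javascript
const fsp = require('fs').promises;

// Hypothetical helper: read `length` bytes starting at absolute byte
// `position`. Returns only the bytes actually read, which may be fewer
// than `length` near end of file.
async function readAt(fileHandle, position, length) {
  const buffer = Buffer.alloc(length);
  const { bytesRead } = await fileHandle.read(buffer, 0, length, position);
  return buffer.subarray(0, bytesRead);
}

// Hypothetical helper: the two reads from the question on one FileHandle.
async function readBoth(file) {
  const fh = await fsp.open(file, 'r');
  try {
    const outer = await readAt(fh, 2, 4); // bytes 2..5
    const inner = await readAt(fh, 0, 4); // bytes 0..3
    return [outer, inner];
  } finally {
    await fh.close();
  }
}
```

Because each call passes an explicit position, the order of the reads does not matter and no shared file cursor is involved.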
loganfsmyth
  • Thanks @loganfsmyth! I expected an option like this in `fs.read()`, but when I read the docs for it I managed to conflate the `offset` and `position` parameters; only now have I noticed that `offset` is the offset within the buffer where reading will start! It would be great if you would also contribute an answer to the previous question. – hippietrail Dec 24 '12 at 04:55
  • `fs.read` works well with this problem, but my subsequent problem required both random access and reading a line at a time. Line reading is much more readily achieved with streams but seeking can only be done with `fs.read`. I can't find a solution that allows mixing the two with simple code. – hippietrail Jan 02 '13 at 14:02
  • @hippietrail How do streams make finding newlines easier? Streams are implemented with `fs.read` so you should be able to replicate any logic that it has. – loganfsmyth Jan 02 '13 at 16:20
  • Streams don't make finding newlines easier, they make keeping multibyte sequences unbroken easier. Are you saying I should replicate the logic in `Streams` for my own line-reading code with `fs.read`? Software engineers avoid reinventing the wheel. I looked and the code to handle UTF-8 sequences etc is indeed not particularly trivial. To me this is a mark against node.js at this point as a general purpose scripting language. I'm aware that wasn't the goal of node, but in so many other ways it has the potential to do what Perl/Python/Ruby have been doing, but better. – hippietrail Jan 03 '13 at 07:18
  • @hippietrail UTF8 multibyte logic is already available in the String Decoder node module. http://nodejs.org/api/string_decoder.html – loganfsmyth Jan 03 '13 at 16:09
  • Yes, the problem is that when you `fs.read` two fixed-length buffers, one may end with the first half of a UTF-8 sequence and the next one might begin with the other half. So you can't use the string decoder on each half separately. You would have to analyse the first and last several octets in each chunk to see which substring to decode without corruption, or how to splice them into new buffers containing only complete UTF-8 characters before decoding them. – hippietrail Jan 04 '13 at 01:13
  • @hippietrail The docs are not super clear on it, but that is what StringDecoder does. If you pass it a buffer that ends in a partial character, it will return all the valid characters before that, and buffer the partial character to prepend to whatever is passed to `write()` next. – loganfsmyth Jan 04 '13 at 03:41
  • Oh thanks for the tip! I'll make it my highest priority hacking task to check that out! – hippietrail Jan 04 '13 at 09:57
  • You're right, the docs and even the source for `StringDecoder` are quite opaque. I don't think I would've realized its utility without your help. I get the impression some of the English in the source was written by non-native speakers. The mention of `CESU-8` is a bit worrying too. – hippietrail Jan 05 '13 at 11:49
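The combination discussed in these comments, seeking with `fs.read` and reading one line while `StringDecoder` stitches together UTF-8 sequences split across chunks, can be sketched roughly as follows. `readLineAt` is a hypothetical helper; it uses the synchronous `fs.readSync` for brevity and assumes a UTF-8 file with `\n` line endings. Searching the decoded text for `'\n'` is safe because the byte 0x0A never occurs inside a multibyte UTF-8 sequence.

```javascript
const fs = require('fs');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: return the text from byte `position` up to (not
// including) the next '\n', or up to end of file. Reads fixed-size chunks;
// the decoder holds back any partial UTF-8 sequence at the end of one chunk
// and completes it with the bytes of the next chunk.
function readLineAt(fd, position, chunkSize = 64) {
  const decoder = new StringDecoder('utf8');
  const buffer = Buffer.alloc(chunkSize);
  let text = '';
  for (;;) {
    const bytesRead = fs.readSync(fd, buffer, 0, chunkSize, position);
    if (bytesRead === 0) {
      // End of file: flush whatever the decoder is still holding.
      return text + decoder.end();
    }
    position += bytesRead;
    text += decoder.write(buffer.subarray(0, bytesRead));
    const nl = text.indexOf('\n');
    if (nl !== -1) {
      return text.slice(0, nl);
    }
  }
}
```

For a valid UTF-8 file, the following line starts at `position + Buffer.byteLength(line, 'utf8') + 1`; that arithmetic does not hold if the bytes were invalid UTF-8, because the decoder substitutes replacement characters.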