
I'm trying to implement a routine for Node.js that would allow one to open a file that is being appended to by some other process at this very moment, and then have chunks of data returned immediately as they are appended to the file. It can be thought of as similar to the UNIX command tail -f, but acting immediately as chunks become available instead of polling for changes over time. Alternatively, one can think of it as working with a file the way you do with a socket — expecting on('data') to trigger from time to time until the file is closed explicitly.

In C land, if I were to implement this, I would just open the file, feed its file descriptor to select() (or any alternative function with a similar purpose), and then read chunks as the file descriptor is marked "readable". So, when there is nothing to read, it won't be readable, and when something is appended to the file, it becomes readable again.

I somewhat expected this kind of behavior from the following code sample in JavaScript:

const fs = require('fs');

function readThatFile(filename) {
    const stream = fs.createReadStream(filename, {
        flags: 'r',
        encoding: 'utf8',
        autoClose: false // I thought this would prevent file closing on EOF too
    });

    stream.on('error', function(err) {
        // handle error
    });

    stream.on('open', function(fd) {
        // save fd, so I can close it later
    });

    stream.on('data', function(chunk) {
        // process chunk
        // fs.close() if I no longer need this file
    });
}

However, this code sample just bails out when EOF is encountered, so I can't wait for a new chunk to arrive. Of course, I could reimplement this using fs.open and fs.read, but that somewhat defeats the purpose of Node.js. Alternatively, I could fs.watch() the file for changes, but that won't work over the network, and I don't like the idea of reopening the file all the time instead of just keeping it open.

I've tried to do this:

const fs = require('fs');
const net = require('net');

const fd = fs.openSync(filename, 'r'); // sync for readability's sake
const stream = new net.Socket({ fd: fd, readable: true, writable: false });

But I had no luck — net.Socket isn't happy and throws TypeError: Unsupported fd type: FILE.

So, any solutions?

UPD: this isn't possible; my answer below explains why.

toriningen

4 Answers


I haven't looked into the internals of the read streams for files, but it's possible that they don't support waiting for a file to have more data written to it. However, the fs package definitely supports this with its most basic functionality.

To explain how tailing would work, I've written a somewhat hacky tail function which will read an entire file and invoke a callback for every line (separated by \n only) and then wait for the file to have more lines written to it. Note that a more efficient way of doing this would be to have a fixed size line buffer and just shuffle bytes into it (with a special case for extremely long lines), rather than modifying JavaScript strings.

var fs = require('fs');

function tail(path, callback) {
  var descriptor, bytes = 0, buffer = Buffer.alloc(256), line = '';

  function parse(err, bytesRead, buffer) {
    if (err) {
      callback(err, null);
      return;
    }
    // Keep track of the bytes we have consumed already.
    bytes += bytesRead;
    // Combine the buffered line with the new string data.
    line += buffer.toString('utf-8', 0, bytesRead);
    var i = 0, j;
    while ((j = line.indexOf('\n', i)) != -1) {
      // Callback with a single line at a time.
      callback(null, line.substring(i, j));
      // Skip the newline character.
      i = j + 1;
    }
    // Only keep the unparsed string contents for next iteration.
    line = line.slice(i);
    // Keep reading in the next tick (avoids CPU hogging).
    process.nextTick(read);
  }

  function read() {
    var stat = fs.fstatSync(descriptor);
    if (stat.size <= bytes) {
      // We're currently at the end of the file. Check again in 500 ms.
      setTimeout(read, 500);
      return;
    }
    fs.read(descriptor, buffer, 0, buffer.length, bytes, parse);
  }

  fs.open(path, 'r', function (err, fd) {
    if (err) {
      callback(err, null);
    } else {
      descriptor = fd;
      read();
    }
  });

  return {close: function close(callback) {
    fs.close(descriptor, callback);
  }};
}

// This will tail the system log on a Mac.
var t = tail('/var/log/system.log', function (err, line) {
  console.log(err, line);
});

// Unceremoniously close the file handle after one minute.
setTimeout(t.close, 60000);

All that said, you should also try to leverage the NPM community. With some searching, I found the tail-stream package which might do what you want, with streams.

Blixt
  • Well, your proposed solution suffers from timer granularity — if you poll too often, you will drain too much CPU, and otherwise you lose "realtimeness", as updates arrive with lag. I mentioned `select()` in my question for that reason — it does not need to wait on a timeout, so it can block indefinitely until there is something available to read (or, for real use cases, until another event wakes it) and process that immediately. Also, your approach is what I called "reimplement using `fs.open`/`fs.read`" — an elegant solution would require dealing with the node.js reactor directly. – toriningen Apr 28 '15 at 16:18
  • And regarding the NPM community — I'm in search of *ideas* in others' solutions for now, because examining the sources shows they use either an approach similar to yours, or `fs.watch`, or just trying to `fs.read` in a `setInterval`... Maybe I haven't been thorough enough, but I have yet to see a nice solution — or I'll finally write it myself after receiving answers :) – toriningen Apr 28 '15 at 16:22
  • @modchan: `setTimeout` should be fine in this case. You can rarely get sub-ms accuracy with file system watching (and I see no reason to do so – using the file system for real-time communication is bad), and your provided example `tail -f` will actually poll every 1 second IIRC, so even 500 ms is a step up. – Blixt Apr 28 '15 at 16:27
  • `tail -f` is just an example of what I'm trying to accomplish; there is no goal to exactly mimic its behavior. I agree that the file system is bad for real-time communication, but that is only true when you actually involve the file system. In my proposed use case of some growing file, the other application would get access to new data while in fact bypassing the filesystem, much like it's done with pipes or UNIX sockets — as long as you don't `stat()` it, reopen it, or involve the filesystem by any other means. – toriningen Apr 28 '15 at 16:33
  • Or, in other words — if C can do this exactly this way, and Node.js uses `select()`, `kqueue()` and friends under the hood anyway, why wouldn't it be possible to do it the Node.js way, retaining the same concept? :) – toriningen Apr 28 '15 at 16:37
  • I think I understand what you want now, but your question isn't really phrased that way. When you use `tail -f` as an example and ask how not to stop on EOF, you seem to be asking for the solution provided by e.g. `tail-stream` (keep reading new data as it's added to the file, without closing and re-opening the file). So if I understand you correctly, you're looking for the most optimized way of monitoring file system events so that you only attempt to read when there's actually new data available. This seems to be more performance oriented. Maybe you can clarify your question? – Blixt Apr 28 '15 at 17:48
  • I've updated the question, so it should now be clearer what I'm looking for. – toriningen Apr 28 '15 at 19:04

Previous answers have mentioned tail-stream's approach which uses fs.watch, fs.read and fs.stat together to create the effect of streaming the contents of the file. You can see that code in action here.

Another, perhaps hackier, approach might be to just use tail by spawning a child process. This of course comes with the limitation that tail must exist on the target platform, but one of Node's strengths is using it for asynchronous systems development via spawn; even on Windows, you can run Node in an alternate shell like msysgit or Cygwin to get access to the tail utility.

The code for this:

var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'my.log']);

child.stdout.on('data', function (data) {
    console.log('tail output: ' + data);
});

child.stderr.on('data', function (data) {
    console.log('err data: ' + data);
});
j03m
  • Node's strength is that you have to bundle a non-portable C utility with your .js app, instead of "just solving the problem"? :) – toriningen Jul 14 '15 at 03:00
  • Haha, good point — "one of" the strengths is easily working with system utilities, but you're right. Tail-stream is a pretty clear example of how to accomplish what you're after. You could also write a native module? *ducks* – j03m Jul 14 '15 at 11:55

So, it seems people have been looking for an answer to this question for five years already, and there is still no on-topic answer.

In short: you can't. And not just in Node.js; you can't at all.

Long answer: there are a few reasons for this.

First, the POSIX standard specifies select() behavior in this regard as follows:

File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions.

So, select() can't help with detecting a write beyond the file end.

With poll() it's similar:

Regular files shall always poll TRUE for reading and writing.

I can't tell for sure about epoll(), since it's not standardized and you would have to read through a rather lengthy implementation, but I would assume it's similar.

Since libuv, which is at the core of the Node.js implementation, uses read(), pread() and preadv() in its uv__fs_read(), none of which block when invoked at the end of a file, it will always return an empty buffer when EOF is encountered. So, no luck here either.

So, summarizing: if such functionality is desired, something is probably wrong with your design, and you should revise it.

toriningen

What you're describing is a FIFO (First In, First Out) file, which, as you said, works like a socket.

There's a node.js module that allows you to work with fifo files.

I don't know what you want it for, but there are better ways to work with sockets in node.js. Try socket.io instead.

You could also have a look at this previous question: Reading a file in real-time using Node.js

Update 1

I'm not familiar with any module that would do what you want with a regular file, as opposed to a socket-like one. But as you said, you could use tail -f to do the trick:

// filename must exist at the time of running the script
var filename = 'somefile.txt';

var spawn = require('child_process').spawn;
var tail = spawn('tail', ['-f', filename]);

tail.stdout.on('data', function (data) {
    data = data.toString().trim();
    console.log(data);
});

Then from the command line try echo someline >> somefile.txt (using >> so the file is appended to rather than truncated) and watch the console.

You might also would like to have a look at this: https://github.com/layerssss/node-tailer

tin