
I have a Web server that reads and writes to a data file on disk. I'd like the file to be written to by only one Web request at a time.

Here's an example program that illustrates my problem. It keeps a state file in "/tmp/rw.txt" and increments the integer contents on each Web hit. Running this program, and then running something like ab -n 10000 -c 1000 http://localhost:3000/, shows that the same value is read from the file by multiple hits, and it's written multiple times.

NOTE: I know about flock() and fs-ext. However, flock() will lock the file to the current process; since all the access here is in the same process, flock() doesn't work (and complicates this example considerably).

Also note that I'd usually use express, async, etc. to get most of this done; just sticking to the basics for the sake of example.

var http = require("http"),
    fs = require("fs");

var stateFile = "/tmp/rw.txt";

var server = http.createServer(function(req, res) {

    var writeNum = function(num) {
        var ns = num.toString(10);
        console.log("Writing " + ns);
        fs.writeFile(stateFile, ns, function(err) {
            if (err) {
                res.writeHead(500, {"Content-Type": "text/plain"});
                res.end(err.message);
            } else {
                res.writeHead(200, {"Content-Type": "text/plain"});
                res.end(ns);
            }
        });
    };

    switch (req.url) {
    case "/reset":
        writeNum(0);
        break;
    case "/":
        fs.readFile(stateFile, function(err, data) {
            if (err && err.code == "ENOENT") {
                // First time, set it to zero
                writeNum(0);
            } else if (err) {
                res.writeHead(500, {"Content-Type": "text/plain"});
                res.end(err.message);
            } else {
                writeNum(parseInt(data, 10) + 1);
            }
        });
        break;
    default:
        res.writeHead(404, {"Content-Type": "text/plain"});
        res.end("No such resource: " + req.url);
    }
});

server.listen(3000);
Evan P.
  • This case is not related to node.js. It's a multithreading issue. You cannot guarantee that the next request will read from the file the value that the previous request stored. The second request won't be aware that a previous request is updating the file if the time between them is less than a millisecond or so (less than a disk access). You could write a queue or something similar, but it's not necessary; all the internet works this way, because accessing a disk is sloooowwww. – Gabriel Llamas Nov 03 '12 at 12:31
  • @GabrielLlamas Did you try running the example code? It's written in node.js and illustrates the problem. Most multi-threaded systems have locking mechanisms like semaphores, mutexes, and synchronized sections built-in. Right now I think the best way to do this is an in-process locking mechanism. – Evan P. Nov 03 '12 at 14:31
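The in-process locking mechanism mentioned in the comment above can be sketched in a few lines. This is illustrative only; the `Lock` name and API are made up for the example, not taken from any library:

```javascript
// A minimal in-process lock: while it's held, other acquirers queue up
// and run one at a time, in order.
function Lock() {
    this.locked = false;
    this.waiting = [];
}

Lock.prototype.acquire = function(fn) {
    if (this.locked) {
        this.waiting.push(fn); // run later, when the holder releases
    } else {
        this.locked = true;
        fn();
    }
};

Lock.prototype.release = function() {
    var next = this.waiting.shift();
    if (next) {
        next(); // hand the lock directly to the next waiter
    } else {
        this.locked = false;
    }
};
```

Each request handler would `acquire` before the read, do its read-modify-write, and `release` in the write callback, so the read and write of one request can't interleave with another's.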

5 Answers


Storing data in files is not the preferred way in a multi-user environment like a web server; databases are more suitable for this. But if you really want to stick with a file, I suggest using a buffer: an in-memory object that you read from and write to, with a separate function that periodically dumps its contents to disk, like this:

var http = require("http"),
    fs = require("fs");

var stateFile = "/tmp/rw.txt";
var value = 0;

var server = http.createServer(function(req, res) {
    switch (req.url) {
    case "/reset":
        value = 0;
        break;
    case "/":
        value++;
        break;
    default:
        res.writeHead(404, {"Content-Type": "text/plain"});
        res.end("No such resource: " + req.url);
        return;
    }
    res.writeHead(200, {"Content-Type": "text/plain"});
    res.end(value.toString(10));
});

server.listen(3000);

// Periodically dump the in-memory value to disk.
(function flush() {
    fs.writeFile(stateFile, value.toString(10), function(err) {
        // handle error
        setTimeout(flush, 100); // schedule the next write
    });
})();
Dmitry
  • 289
  • 1
  • 4
  • The example code is an example that shows the problem -- non-atomic access to a resource in async code. That resource could be a database record, ("SELECT" then "UPDATE"), a file, or even an in-memory resource. – Evan P. Nov 05 '12 at 15:56
  • Before voting my answer down, have you tried re-reading your question to find where it mentions anything but an abstract file read-write issue? You need to learn how to express a problem in a clear way. – Dmitry Nov 05 '12 at 19:56
  • I did express the problem in a clear way. You made the typical mistake of trying to re-implement my example code in some smarter way. I don't need an application to increment a number. I need to synchronize access to a file across multiple asynchronous requests, which is exactly what the question here says. Answer the question as it's given, rather than trying to outsmart the questioner. – Evan P. Nov 06 '12 at 15:35

And another way to do this (in case there is something behind your question, so you don't accept simple solutions ;)) is to create a processing queue. Below is a simple queue pattern: it executes requests in the order they were submitted and returns the error (if any) to the provided callback once the function has executed.

var Queue = function(fn) {
    var queue = [];
    var processingFn = fn;

    var iterator = function(callback) {
        return function(err) {
            queue.shift();  // remove the processed value
            callback(err);

            var next = queue[0];
            if (next)
                processingFn(next.arg, iterator(next.cb));
        };
    };

    this.emit = function(obj, callback) {
        var empty = !queue.length;
        queue.push({ arg: obj, cb: callback });

        if (empty) { // start processing
            processingFn(obj, iterator(callback));
        }
    };
};



function writeNum(inc, continueCb) {
  fs.readFile(stateFile, function(err, data) {
    if (err && err.code !== "ENOENT")
      return continueCb(err);

    // First time through, the file doesn't exist yet.
    var value = err ? 0 : parseInt(data, 10);
    fs.writeFile(stateFile, (value + inc).toString(10), function(err) {
      continueCb(err);
    });
  });
}


var writer = new Queue(writeNum);

// on each request
writer.emit(1, function(err) {
  if(err) {
    // respond with error
  }
  else {
    // value written
  }
});
Dmitry
  • I don't think using a queue works, since you'll queue up a bunch of reads and writes like: read, read, read, read, write, write, write, write, read, write, ... ...and really what you want is: read, write, read, write, read, write, read, write, ... – Evan P. Nov 05 '12 at 15:58
  • Have you tried it? It solves the problem in your example the way you wanted: read, then write, then move to the next queue item. – Dmitry Nov 05 '12 at 19:58
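The ordering dispute in these comments can be checked without touching the disk. The sketch below copies the `Queue` from this answer and runs it against an in-memory stand-in for the state file, splitting each job's read and write across event-loop ticks the way `fs.readFile`/`fs.writeFile` would be:

```javascript
// The Queue from the answer above, copied so this sketch is self-contained.
var Queue = function(fn) {
    var queue = [];
    var processingFn = fn;

    var iterator = function(callback) {
        return function(err) {
            queue.shift();
            callback(err);

            var next = queue[0];
            if (next)
                processingFn(next.arg, iterator(next.cb));
        };
    };

    this.emit = function(obj, callback) {
        var empty = !queue.length;
        queue.push({ arg: obj, cb: callback });

        if (empty)
            processingFn(obj, iterator(callback));
    };
};

// In-memory "file": the read and the write of each job happen on
// separate ticks, so unserialized jobs could interleave and lose updates.
var state = 0, trace = [];

function incr(inc, done) {
    process.nextTick(function() {       // "read"
        var value = state;
        trace.push("read");
        process.nextTick(function() {   // "write"
            state = value + inc;
            trace.push("write");
            done(null);
        });
    });
}

var q = new Queue(incr);
q.emit(1, function() {});
q.emit(1, function() {});
q.emit(1, function() {});
```

With the queue in place the trace alternates read, write, read, write, read, write and no increment is lost; without it, all three reads would run before any write.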

I wasn't able to find a library that did what I want, so I created one here:

https://npmjs.org/package/schlock

Here's the above example program using the read/write locking. I also used "Step" to make the whole thing more readable.

var http = require("http"),
    fs = require("fs"),
    Schlock = require("schlock"),
    Step = require("step");

var stateFile = "/tmp/rw.txt";

var schlock = new Schlock();

var server = http.createServer(function(req, res) {

    var num;

    Step(
        function() {
            schlock.writeLock(stateFile, this);
        },
        function(err) {
            if (err) throw err;
            fs.readFile(stateFile, this);
        },
        function(err, data) {
            if (err && err.code == "ENOENT") {
                num = 0;
            } else if (err) {
                throw err;
            } else {
                num = parseInt(data, 10) + 1;
            }
            fs.writeFile(stateFile, num.toString(10), this);
        },
        function(err) {
            if (err) throw err;
            schlock.writeUnlock(stateFile, this);
        },
        function(err) {
            if (err) {
                res.writeHead(500, {"Content-Type": "text/plain"});
                res.end(err.message);
            } else {
                res.writeHead(200, {"Content-Type": "text/plain"});
                res.end(num.toString(10));
            }
        }
    );
});

server.listen(3000);
Evan P.

You can use fs.writeFileSync and fs.readFileSync.

OneOfOne
  • The synchronous functions should be removed from node.js, because they encourage using node.js incorrectly. If you write a node.js app, the whole app needs to be asynchronous; otherwise you're losing all the benefits of using node... – Gabriel Llamas Nov 03 '12 at 12:18
  • @OneOfOne That's definitely one solution. The problem is that it locks up the entire app. Although in this example every hit writes the resource, you could imagine a program with hundreds of thousands of files (say, a wiki which stores each page in a markdown file). That makes all requests wait for a resource that they (probably) are not using. – Evan P. Nov 03 '12 at 14:33
  • A web app should never block and should never use any lock system. Especially in node.js, blocking the event loop is forbidden. – Gabriel Llamas Nov 03 '12 at 15:11
  • The only other solution is to handle the writing in a different process that can block; there aren't that many options here to do what you want. – OneOfOne Nov 03 '12 at 17:57

One solution I just found via npm search is the locker server: https://github.com/bobrik/locker .

I think it's a good solution to the problem, and it's about how I'd design it.

The big problem is that it handles the general case (multiple processes using a resource) which requires a separate server.

I'm still looking for an in-process solution that does about the same thing.

Evan P.