I've got two Node.js threads running: one watches a directory and consumes the files that appear there, and another is responsible for writing files to given directories.

Typically they won't be operating on the same directory, but for an edge case I'm working on they will be.

It appears that the consuming app is grabbing the files before they are fully written, resulting in corrupt files.

Is there a way I can lock the file until the writing is complete? I've looked into the lockfile module, but unfortunately I don't believe it will work for this particular application.

=====

The full code is far more than makes sense to put here, but the gist of it is this:

  1. App spins off the watchers and listener

Listener:

  • listen for a file being added to the db, then export it using fs.writeFile

Watcher:

  • the watcher uses chokidar to track added files in each watched directory
  • when a file is found, fs.access is called to ensure we have access to it
    • fs.access seems to be unfazed by the file still being written
  • the file is consumed via fs.createReadStream and then sent to the server
    • a read stream is necessary because we need the file hash

In this case the file is exported to the watched directory and then reimported by the watch process.
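
A stripped-down sketch of the shape of those two pieces (the directory path, event wiring, and hash algorithm here are illustrative, not the real code):

const chokidar = require('chokidar');
const crypto = require('crypto');
const fs = require('fs');

// Listener: export a record to the watched directory.
function exportRecord(filePath, contents) {
    fs.writeFile(filePath, contents, err => {
        if (err) console.error('export failed', err);
    });
}

// Watcher: consume files added to the directory.
chokidar.watch('/data/exports').on('add', filePath => {
    fs.access(filePath, fs.constants.R_OK, err => {
        if (err) return; // not readable (yet)
        // fs.access succeeds even while the file is still being written,
        // so this stream can end up reading a partial file.
        const hash = crypto.createHash('sha256');
        fs.createReadStream(filePath)
            .on('data', chunk => hash.update(chunk))
            .on('end', () => {
                // send the file and hash.digest('hex') to the server
            });
    });
});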

andrew.carpenter
  • Are these 2 separate programs, or is it just 2 different functions? – Datsik Feb 25 '16 at 01:07
  • While it is one application, the watching class has no knowledge of the other. Also, they will be moved into spawned threads once it is fully implemented. – andrew.carpenter Feb 25 '16 at 01:11
  • If you could provide some code that would be great as well. People tend to downvote things with no code. – Datsik Feb 25 '16 at 01:12

3 Answers

I'd use proper-lockfile for this. You can specify a number of retries, or pass a retry config object (handed through to the retry module) to get an exponential backoff strategy. That way you can handle situations where two processes need to modify the same file at the same time.

Here's a simple example with some retry options:

const lockfile = require('proper-lockfile');
const Promise = require('bluebird');
const fs = require('fs-extra');
const crypto = require('crypto'); // random buffer contents

const retryOptions = {
    // retry config object (as understood by the retry module) for exponential backoff
    retries: {
        retries: 5,
        factor: 3,
        minTimeout: 1 * 1000,
        maxTimeout: 60 * 1000,
        randomize: true,
    }
};

let file;
let cleanup;
Promise.try(() => {
    file = '/var/tmp/file.txt';
    return fs.ensureFile(file); // fs-extra creates file if needed
}).then(() => {
    return lockfile.lock(file, retryOptions);
}).then(release => {
    cleanup = release;

    let buffer = crypto.randomBytes(4);
    let stream = fs.createWriteStream(file, {flags: 'a', encoding: 'binary'});
    stream.write(buffer);
    stream.end();

    return new Promise(function (resolve, reject) {
        stream.on('finish', () => resolve());
        stream.on('error', (err) => reject(err));
    });
}).then(() => {
    console.log('Finished!');
}).catch((err) => {
    console.error(err);
}).finally(() => {
    cleanup && cleanup();
});
DJDaveMark

Writing a lock-state system is actually pretty simple. I can't find where I did this, but the idea is to:

  1. create lock files whenever you acquire a lock,
  2. delete them when releasing a lock,
  3. delete them after a timeout has occurred,
  4. throw whenever requesting a lock for a file whose lock file already exists.

A lock file is simply an empty file in a single directory. Each lock file gets its name from the hash of the full path of the file it represents. I used MD5 (which is relatively slow), but any hashing algo should be fine as long as you are confident there will be no collisions for path strings.

This isn't 100% thread-safe, since (unless I've missed something stupid) you can't atomically check if a file exists and create it in Node, but in my use case, I was holding locks for 10 seconds or more, so microsecond race conditions didn't seem that much of a threat. If you are holding and releasing thousands of locks per second on the same files, then this race condition might apply to you.

These will be advisory locks only, clearly, so it is up to you to ensure your code requests locks and catches the expected exceptions.
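
A minimal sketch of that scheme (the directory, timeout, and function names are mine, and it uses the `wx` open flag discussed in the comments below to make the create-if-absent step atomic):

const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const LOCK_DIR = '/var/tmp/locks';  // single directory holding every lock file
const LOCK_TIMEOUT_MS = 10 * 1000;  // locks older than this are considered stale

// Each lock file is named after a hash of the full path it represents.
function lockPathFor(file) {
    const hash = crypto.createHash('md5').update(path.resolve(file)).digest('hex');
    return path.join(LOCK_DIR, hash + '.lock');
}

function acquireLock(file) {
    const lockPath = lockPathFor(file);
    try {
        // 'wx' creates the file, failing with EEXIST if it already exists.
        fs.writeFileSync(lockPath, '', { flag: 'wx' });
    } catch (err) {
        if (err.code !== 'EEXIST') throw err;
        // Reap a stale lock left behind by a crashed holder, then retry once.
        const age = Date.now() - fs.statSync(lockPath).mtimeMs;
        if (age < LOCK_TIMEOUT_MS) throw new Error('File is locked: ' + file);
        fs.unlinkSync(lockPath);
        fs.writeFileSync(lockPath, '', { flag: 'wx' });
    }
}

function releaseLock(file) {
    fs.unlinkSync(lockPathFor(file));
}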

Andrew
  • This is almost correct, but as it stands it is dead wrong. A lock file is not an empty file in a directory; it is a symlink in a directory (at least on unixen like Linux, Mac, etc.). This is because, as you stated, there are no atomic guarantees for creating, checking, reading and deleting files. There are, however, atomic guarantees for creating symlinks. So both the writer AND the reader create the lockfile: the writer creates it both to check that the file is not being read and to lock it, and the reader creates it to check and to lock it (a sketch of this follows below). – slebetman Feb 25 '16 at 02:07
  • I'm looking at the `lockfile` module but I'm unclear how I would write to the file after locking it first. Seems like something `fs.writeFile` should have built in. – chovy Sep 11 '16 at 18:57
  • @Andrew, but what about other processes that don't play nice with your process and simply ignore the whole "locking procedure" before they start writing to the directory? – Pacerier Feb 28 '17 at 12:03
  • @Pacerier, your code must play nice. For me, that means assigning each fs-writing service in my application a specific directory or path, and always implementing path validation any time a service needs to write. `LogService` can only overwrite `AvatarService`'s locked files if it can get to `~/avatars/`, so don't let it. If you do this consistently everywhere, then you only need to implement the locking behaviour in services that need it and never worry about overwrites. If you have two services that can write to the same path and only one locks, you should anticipate trouble. – Andrew Feb 28 '17 at 14:22
  • You can write to a file with `{ flag: 'wx' }`, and if the file already exists, you will get an Error with `{ code: 'EEXIST' }` and you know that the lock could not be acquired. If I am not mistaken, this operation _is_ atomic. – Thai Aug 13 '21 at 04:39
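
For completeness, a minimal sketch of the symlink variant slebetman describes (POSIX systems only; the pid payload and function names are illustrative):

const fs = require('fs');

function acquireSymlinkLock(lockPath) {
    try {
        // Creating a symlink is atomic: it either succeeds or fails with EEXIST.
        // The target doesn't have to exist; here it just records the holder's pid.
        fs.symlinkSync('pid:' + process.pid, lockPath);
        return true;
    } catch (err) {
        if (err.code === 'EEXIST') return false; // someone else holds the lock
        throw err;
    }
}

function releaseSymlinkLock(lockPath) {
    fs.unlinkSync(lockPath);
}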

Renaming a file is atomic. Write the file under some specific temporary name (e.g. a distinct extension); when the write completes and the file is closed, rename it to another specific name, and have the watcher look only for files with that second name. Alternatively, rename finished files into another (sub)directory. The only problem would be an underlying OS that exposes partially flushed closed files, which is unlikely.
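
A minimal sketch of that approach, assuming the watcher is set up to match only the final name (the `.tmp` suffix and function name are illustrative):

const fs = require('fs');
const path = require('path');

function exportFile(dir, name, contents, callback) {
    const tmpPath = path.join(dir, name + '.tmp'); // not matched by the watcher
    const finalPath = path.join(dir, name);        // picked up by the watcher
    fs.writeFile(tmpPath, contents, err => {
        if (err) return callback(err);
        // rename(2) is atomic on the same filesystem, so the watcher never
        // sees a partially written file under the final name.
        fs.rename(tmpPath, finalPath, callback);
    });
}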

Murphy