
I would like to prevent a script from launching twice by using a PID file. There are many ways to implement exclusivity, but since my script will always run on a Linux machine and I want stale PID files to be detected automatically, I would like to use flock(2) for this.

I was told long ago by a colleague that the following pseudocode is the right way to do this (open(..., 'w') means "open in write mode with O_CREAT"):

fd = open(lockfile, 'w');
write(fd, pid);
close(fd);
fd = open(lockfile);
flock(fd)
file_pid = read(fd)
if file_pid != pid:
    exit(1)
// do things

I am curious why he suggested the above instead of:

fd = open(lockfile, 'w')
flock(fd)
// do things
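
For concreteness, here is a minimal Python sketch of this second variant. It makes a few assumptions that are not in the pseudocode above: the helper name `try_lock` is hypothetical, the file is opened without truncation so an existing holder's content is left alone, `LOCK_NB` is used so a second instance fails immediately instead of blocking, and the PID is written only after the lock is held.

```python
import fcntl
import os


def try_lock(path):
    """Return a locked fd, or None if the lock is already held elsewhere."""
    # O_CREAT without O_EXCL: create the file if missing, else open the existing one.
    # No O_TRUNC, so nothing is overwritten before we own the lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB (an addition to the pseudocode) makes flock fail instead of block.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # The lock is ours: only now record our PID in the file.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd
```

The caller has to keep the returned descriptor open for as long as it wants the lock; closing it (or exiting) releases the lock.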

Presumably he suggested this because he thought the "create file if it doesn't exist" behavior of open(2) with O_CREAT is not atomic, that is, two processes that call open(2) at exactly the same time might get handles to two different files because the file creation is not exclusive.

My question is: is the latter code always correct on a Linux system, and if not, when does it fail?

  • I'm moderately sure the pseudo-code is wrong. All else apart, `flock()` locking is normally (almost always) supervisory, not mandatory. This code appears to write the new process's PID in the file even though the old process could still be using it. This alone causes grief. I've not found a good 'how to do locking' Q&A here on SO, which is a bit surprising, though I only looked for 'lockfile' so I can't be said to have searched thoroughly yet. – Jonathan Leffler May 01 '15 at 22:03
  • As long as you don't expect the file to contain the exact pid of the process that currently owns the lock, the code should be right. I don't know what you mean by "supervisory lock", did you mean "advisory lock?" If that's what you meant, it still shouldn't matter as long as both processes flock(). What is the scenario where the wrong thing will happen (two processes will get to line 9 of the pseudo-code concurrently)? – Patrick Krecker Jul 30 '15 at 21:44
  • For my 'supervisory', use 'advisory' — sorry about the wrong terminology. – Jonathan Leffler Jul 30 '15 at 21:46
  • The whole point of the lock file is to contain the PID of the process that has the file locked, so that readers of the lock file can check whether the process still exists, and take over the lock if it does not. So, having the wrong PID in the file is traumatically wrong. – Jonathan Leffler Jul 30 '15 at 22:00
  • It seems that Wikipedia on [file locking with lock files](https://en.wikipedia.org/wiki/File_locking#Lock_files) would claim I'm over-stating my case. However, that section is a little on the flimsy side (it doesn't give pseudo-code algorithms, for example), and there are bound to be many ways of doing locking with lock files — not necessarily all as effective or resilient as each other. – Jonathan Leffler Jul 30 '15 at 22:14

1 Answer


flock is not 100% reliable, notably on network filesystems such as NFS: http://en.wikipedia.org/wiki/File_locking#Problems

The 1st recipe is rather intrusive: a subsequent invocation can blindly overwrite the PID written by the previous invocation, so the 1st process reads back a PID that is not its own and exits. At high repeated invocation rates it is thus possible for none of the processes to run.

To ensure file creation exclusivity use O_CREAT | O_EXCL. You'd need to handle untimely process death leaving the file behind, though.
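
A minimal Python sketch of exclusive creation, assuming a hypothetical helper name `create_exclusive`; it only shows the O_CREAT | O_EXCL call and does nothing about stale files left behind by a dead process:

```python
import os


def create_exclusive(path):
    """Create path atomically; return an fd, or None if it already exists."""
    try:
        # With O_EXCL the open fails if the file exists, so only one caller wins.
        return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return None
```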

I'd suggest 2 files:

  • a lock file opened with O_CREAT | O_EXCL, used only to protect operations on the actual PID file. It should exist for very short periods of time, so it is easy to decide whether it is stale based on its creation time.
  • the actual PID file

Each process waits for the lock file to disappear (cleaning it up when it becomes stale), then attempts to create the lock file; only one instance succeeds and the others wait. The winner checks the PID file's existence and content (cleaning up and deleting it if stale) and creates a new PID file if it decides to run. It then deletes the lock file and runs or exits as decided.
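
The steps above can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not a hardened implementation: the names `GUARD_MAX_AGE`, `pid_alive`, `acquire_guard` and `claim_pid_file` are hypothetical, the 5-second staleness threshold is arbitrary, modification time stands in for creation time, and liveness is checked with signal 0, which cannot detect PID reuse. The stale-guard cleanup is itself racy if two waiters decide to delete at the same moment.

```python
import os
import time

GUARD_MAX_AGE = 5.0  # seconds; hypothetical staleness threshold for the lock file


def pid_alive(pid):
    """Best-effort check that a process with this PID exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_guard(guard_path):
    """Wait until we manage to create the short-lived lock file."""
    while True:
        try:
            os.close(os.open(guard_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644))
            return
        except FileExistsError:
            try:
                if time.time() - os.stat(guard_path).st_mtime > GUARD_MAX_AGE:
                    os.unlink(guard_path)  # assumed stale: its owner probably died
                    continue
            except FileNotFoundError:
                continue  # it vanished in the meantime, retry at once
            time.sleep(0.05)


def claim_pid_file(pid_path, guard_path):
    """Return True if this process may run, False if a live instance exists."""
    acquire_guard(guard_path)
    try:
        try:
            with open(pid_path) as f:
                if pid_alive(int(f.read().strip())):
                    return False
        except (FileNotFoundError, ValueError):
            pass  # missing or garbled PID file is treated as stale
        with open(pid_path, "w") as f:
            f.write(f"{os.getpid()}\n")
        return True
    finally:
        os.unlink(guard_path)  # the lock file only protects the check above
```

A process that gets `True` should remove the PID file on clean exit; a crash leaves it behind, which the next invocation treats as stale once the recorded PID no longer exists.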

Jonathan Leffler
Dan Cornilescu
  • Thanks for the answer, this is a great solution. Do you know the answer to the part about open with O_CREAT but without O_EXCL? As in, assuming flock is reliable, does the second pseudocode snippet work? – Patrick Krecker May 01 '15 at 21:35
  • At best it might just serialize the execution of multiple instances, but it would not prevent subsequent executions: without O_EXCL they would block at flock and continue whenever the 1st process exits for whatever reason. IMHO it is very difficult to implement a reliable piece of logic this way. – Dan Cornilescu May 03 '15 at 19:29
  • Your solution is similar to Perl's `File::NFSLock`, which I sometimes use for writing. Does it invalidate the NFS cache at the same time? – Gang Jan 29 '16 at 23:50
  • I'm not 100% sure if attempting to read some other file than the one being created won't still use the cache. The cached directory entry *might* be refreshed by creating a file in that directory (at least my observations seem to support that). It may also depend on the actual client and/or server NFS implementations. – Dan Cornilescu Jan 30 '16 at 03:52