
I have an array of filenames, and each process needs to create and write to only a single file.

This is what I came to:

foreach ($filenames as $VMidFile) {
    if (file_exists($VMidFile)) { // A
        continue;
    }

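    $fp = fopen($VMidFile, 'c'); // B
    if ($fp === false) {
        continue;
    }

    if (!flock($fp, LOCK_EX | LOCK_NB)) { // C
        fclose($fp); // don't leak the handle when another process holds the lock
        continue;
    }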

    if (!filesize($VMidFile)) { // D
        // write to the file;

        flock($fp, LOCK_UN);
        fclose($fp);
        break;
    }

    flock($fp, LOCK_UN);
    fclose($fp); // E
}

But I don't like that I'm relying on the filesize.

Any proposals to do it in another (better) way?

UPD: added labels to the code to make discussion easier

UPD 2: I'm using filesize because I don't see any other reliable way to check whether the current thread created the file (in which case it's still empty).

UPD 3: the solution should be free of race conditions.

zerkms

4 Answers


A possible, slightly ugly solution would be to lock on a separate lock file and then test whether the file exists:

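// Runs inside the foreach loop from the question, so `continue` moves on to the next file.
$lock = fopen("/tmp/".$filename."LOCK", "w"); // A

if ($lock === false || !flock($lock, LOCK_EX)) { // B
    continue;
}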
if(!file_exists($filename)){ // C
    //File doesn't exist so we know that this thread will create it
    //Do stuff to $filename
    flock($lock, LOCK_UN); // D
    fclose($lock);
}else{
    //File exists. This thread didn't create it (at least in this iteration).
    flock($lock, LOCK_UN);
    fclose($lock);
}

This gives exclusive access to the file and also lets you decide whether the call to fopen($VMidFile, 'c') will create the file.
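
A self-contained sketch of the lock-file idea might look like the following. It assumes a writable directory for the lock files and uses a hypothetical claimFile() helper. Each filename is hashed with md5 so that paths containing slashes still map to a valid lock filename, and an empty placeholder is created while the lock is held, so any process that takes the lock later sees the file as already taken.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: claims the first filename that does not exist yet.
 * For each candidate, take an exclusive flock() on a companion lock file,
 * then check whether the target exists. Because every process checks and
 * creates the target only while holding that lock, "does not exist" means
 * "this process is the one creating it".
 * Returns the claimed filename, or null if every file already exists.
 */
function claimFile(array $filenames, string $lockDir): ?string
{
    foreach ($filenames as $filename) {
        // Hash the path so names containing directories map to a flat lock name.
        $lockPath = $lockDir . '/' . md5($filename) . '.lock';
        $lock = fopen($lockPath, 'c');
        if ($lock === false) {
            continue;
        }
        // Blocks only while another process is checking this same file.
        if (!flock($lock, LOCK_EX)) {
            fclose($lock);
            continue;
        }

        $created = false;
        clearstatcache(true, $filename); // avoid a stale cached stat result
        if (!file_exists($filename)) {
            // Still holding the lock: create the placeholder so others skip it.
            $created = file_put_contents($filename, '') !== false;
        }

        flock($lock, LOCK_UN);
        fclose($lock);

        if ($created) {
            return $filename;
        }
    }
    return null;
}
```

Once claimFile() returns a name, the caller can write to that file without holding the lock, because every other process following the same protocol already sees it as existing.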

Jim

Rather than creating a file and hoping that it's not interfered with:

  1. create a temporary file
  2. do all necessary file operations on it
  3. rename it to the new location if that location doesn't exist yet.

Technically, since rename will overwrite the destination, there is a chance that concurrent threads will still clash. That's very unlikely if you have:

if (!file_exists($location)) { rename(...

You could use md5_file to verify that the file contents are correct after this block.
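
The three steps above could be sketched roughly as follows, using a hypothetical publishFile() helper. The temporary file is created in the target's own directory so that rename() stays on one filesystem. As the answer itself admits, the file_exists() check and the rename() are two separate calls, so this sketch narrows the race window but does not close it.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build the complete file under a temporary name,
 * then move it into place if the target does not exist yet.
 * Returns true if this call published the file.
 */
function publishFile(string $target, string $contents): bool
{
    // Same directory as the target, so rename() does not cross filesystems.
    $tmp = tempnam(dirname($target), 'tmp');
    if ($tmp === false) {
        return false;
    }
    // Step 2: do all the writing on the temporary file.
    if (file_put_contents($tmp, $contents) === false) {
        unlink($tmp);
        return false;
    }
    // Step 3: only move it into place if nothing is there yet.
    clearstatcache(true, $target);
    if (file_exists($target)) { // NOT atomic with the rename() below
        unlink($tmp);
        return false;
    }
    return rename($tmp, $target);
}
```

Readers never see a half-written target file, which is the main benefit here; guaranteeing a single writer still needs one of the locking approaches.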

Hamish
  • "rename it to the new location if location doesn't exist." --- how would you check it in a thread-safe manner? "there is a chance" -- I don't want to rely on "chance". "That's very unlikely" --- I don't want to rely on "likely or not", I want the solution that **guarantees** that it will always work as expected. – zerkms Jan 24 '13 at 20:53
  • So do you have a proposal that is free of race conditions? As it stands, it's not an answer, sorry. – zerkms Jan 24 '13 at 21:03
  • I like this proposal; it makes the critical section tiny. – Dmytro Dec 17 '16 at 01:28

You can secure exclusive access using semaphores (UNIX only, and provided the sysvsem extension is installed):

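// ftok() takes a one-character project id and derives the key from the
// file's inode, so $filename must already exist.
$s = sem_get(ftok($filename, 'f'));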
sem_acquire($s);

// Do some critical work...

sem_release($s);

Otherwise, you can also use flock. It does not require any special extensions, but according to comments on PHP.net it is a bit slower than semaphores:

$a = fopen($file, 'c'); // 'c' creates the file if needed but, unlike 'w', does not truncate it before the lock is held
flock($a, LOCK_EX);

// Critical stuff, again

flock($a, LOCK_UN);
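
The flock variant can be wrapped so the lock is always released, even if the critical section throws. The following is a sketch with a hypothetical withFileLock() helper, not an API that PHP provides; it opens the lock file with mode 'c' so existing content is never truncated.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run $critical while holding an exclusive flock()
 * on $lockPath and return its result. The lock is released in `finally`,
 * so an exception inside $critical cannot leave the file locked.
 */
function withFileLock(string $lockPath, callable $critical)
{
    $fp = fopen($lockPath, 'c');
    if ($fp === false) {
        throw new RuntimeException("Cannot open lock file $lockPath");
    }
    try {
        if (!flock($fp, LOCK_EX)) { // blocks until the lock is available
            throw new RuntimeException("Cannot lock $lockPath");
        }
        return $critical();
    } finally {
        flock($fp, LOCK_UN);
        fclose($fp);
    }
}
```

Inside the callable you would check file_exists() and create the file only if it is missing, which matches the acquire-then-check order described in the comments.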

helmbert
  • If you check my question, you'll see that I'm already using `flock` – zerkms Jan 24 '13 at 21:04
  • they will lead to the same question: how to check whether the file was created by the current process. I'm not sure I understand how I can synchronize `file_exists` with a semaphore – zerkms Jan 24 '13 at 21:11
  • I'd first acquire a semaphore, and then -- when exclusive access is already ensured -- check whether the file exists; if not, the current thread/process will definitely create it. – helmbert Jan 24 '13 at 22:09
  • it's a blocking solution. Indeed it will work, but it blocks everything – zerkms Jan 24 '13 at 22:23

Use mode 'x' instead of 'c' in your fopen call, and check the resulting $fp: if it's false, the file wasn't created by the current thread, and you should continue to the next filename.

Also, depending on your PHP installation's settings, you may want to put an @ in front of the fopen call to suppress the warning that fopen($VMidFile, 'x') emits when it cannot create the file because it already exists.

This should work even without flock.
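
A minimal sketch of the 'x' approach, using a hypothetical claimWithExclusiveCreate() helper. The PHP manual documents mode 'x' as equivalent to O_EXCL|O_CREAT on the underlying open(2) call, so checking for existence and creating the file happen in one atomic step and at most one process can succeed for each filename.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: try to create each file exclusively and return
 * [filename, open handle] for the first one this process created,
 * or null if every file already exists.
 */
function claimWithExclusiveCreate(array $filenames): ?array
{
    foreach ($filenames as $filename) {
        // '@' silences the E_WARNING fopen() emits when the file already exists.
        $fp = @fopen($filename, 'x');
        if ($fp !== false) {
            return [$filename, $fp];
        }
    }
    return null;
}
```

Usage: `[$name, $fp] = claimWithExclusiveCreate($filenames);` followed by fwrite() and fclose() on `$fp`. Note that a file left behind by a process that crashed stays claimed, which is the trade-off discussed in the comments.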

Rogier
  • What if a script dies and doesn't clean up the file? It would require human intervention to remove the file before the script could run again, wouldn't it? – zerkms Feb 06 '14 at 10:41
  • Not sure how this relates to the original problem, what do you mean 'clean up the file'? Delete it afterwards, or..? It can be taken care of automatically, as long as you can define the exact conditions that determine whether another thread may still be running and working on the file, or if it crashed and the file has become orphaned. – Rogier Feb 07 '14 at 03:30
  • so that's the original question: "as long as you can define the exact conditions that determine whether another thread may still be running" --- that's what I asked about a locking mechanism for. And the mechanism should be reliable, so that it doesn't require any additional heuristics. With your proposal, the algorithm will get stuck if a process dies and doesn't remove the file. – zerkms Feb 07 '14 at 03:57
  • Well the 'x' mode takes care of all the thread-safety and race conditions, it's basically a lock mechanism by itself. When or where would it get stuck, exactly? – Rogier Feb 07 '14 at 10:30
  • the process dies and the file is left there. No process can run again, because `x` will never "obtain the lock" – zerkms Feb 07 '14 at 10:31
  • To avoid partially processed files, you may want to introduce a 'thread status' or something similar, where each thread regularly records whether it's still working on some file or has successfully finished it. Then other threads can check whether a particular file is still being worked on, and if it has been for an unusually long time, consider the thread dead and reprocess the file. But this is a different problem than the thread-exclusive creation of a new file. – Rogier Feb 07 '14 at 10:32
  • So what is the reason to use this proposal, which requires another mechanism to maintain keepalive state and to wait for the keepalive to expire in case a process dies, while the other solutions are free of these issues? "But this is a different problem than the thread-exclusive creation of a new file" --- the original problem wasn't about thread-safe creation of a new file, it was about thread-safe processing of something. – zerkms Feb 07 '14 at 10:34
  • The initial post was all about creating a file in a thread-safe manner, which you pulled off with locks and a filesize check. I guess that works, but I think the 'x' mode is a simpler and better alternative. Yet neither solution takes care of reprocessing files orphaned by threads that died (the filesize check doesn't either; a thread may have died halfway through writing the file). – Rogier Feb 07 '14 at 10:47
  • By the way, as far as I can see, none of the other solutions are free of these issues? – Rogier Feb 07 '14 at 10:50
  • The accepted (checked) one is, isn't it? – zerkms Feb 07 '14 at 11:43
  • No, how does that deal with (or even just detect) a thread that died prematurely, halfway through writing the file, as opposed to a thread that is still working on it? – Rogier Feb 07 '14 at 17:25