
I'm writing a very simple pipeline for processing uploaded files, and I've opted to keep the pipeline's state on the file system instead of in a database. The pipeline itself is simple: files are processed in temporary locations and then atomically moved into a folder that indicates a specific stage of the pipeline is complete. Once a file is moved into the folder for the next stage, the copy from the previous stage is deleted. The pipeline is a single process and only one instance is active at a time, so there are no race conditions or crash scenarios to worry about there. At worst we repeat work that was done previously and waste some disk space.
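
For concreteness, here is a minimal sketch of that stage-transition scheme in Python. The stage directory names and the `process_file()` body are placeholders I've invented, since the post doesn't show its code; the key point is that the temporary file lives on the same filesystem as the stage folder, so the final rename is atomic:

    import os
    import shutil
    import tempfile

    STAGE_A = "/var/pipeline/stage-a"   # input to this stage (assumed layout)
    STAGE_B = "/var/pipeline/stage-b"   # output of this stage (assumed layout)

    def process_file(src: str, dst: str) -> None:
        # Stand-in for the real per-stage work; here it just copies the bytes.
        shutil.copyfile(src, dst)

    def advance(name: str) -> None:
        src = os.path.join(STAGE_A, name)
        # Work in a temporary location on the same filesystem so the final
        # rename into the stage directory is atomic.
        fd, tmp = tempfile.mkstemp(dir=STAGE_B, prefix=".tmp-")
        os.close(fd)
        try:
            process_file(src, tmp)
            os.rename(tmp, os.path.join(STAGE_B, name))  # atomic publish
        except BaseException:
            os.unlink(tmp)  # never leave a half-written file behind
            raise
        os.unlink(src)  # drop the previous stage's copy once published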

The problems are with the very first step of the pipeline. The server handling the file uploads moves each file into a specific folder to make it available to clients, and at the same time creates a symlink in another folder to signal to the pipeline process that there is work to be done. Since there is a race condition here, I'm using file locks around creating and deleting the symlinks: the pipeline process takes the lock, processes the file, deletes the symlink, and releases the lock, and the upload process does the same thing around creating the symlink. This takes care of the race between deleting the symlink and creating it, but there is still a bug here.
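
Roughly, the handshake looks like the sketch below (paths invented for illustration). One detail worth noting: the lock is taken on a separate shared lock file rather than on the symlink itself, since opening the link's path would follow the link:

    import fcntl
    import os

    UPLOADS = "/var/uploads/public"      # where uploads land (assumed)
    QUEUE   = "/var/uploads/queue"       # symlinks marking pending work (assumed)
    LOCK    = "/var/uploads/queue.lock"  # shared lock file (assumed)

    def enqueue(name: str) -> None:
        """Upload side: publish the symlink under the shared lock."""
        with open(LOCK, "a") as lockf:
            fcntl.flock(lockf, fcntl.LOCK_EX)
            link = os.path.join(QUEUE, name)
            if not os.path.lexists(link):
                os.symlink(os.path.join(UPLOADS, name), link)
            # lock released when lockf is closed

    def dequeue(name: str) -> None:
        """Pipeline side: remove the symlink under the same lock."""
        with open(LOCK, "a") as lockf:
            fcntl.flock(lockf, fcntl.LOCK_EX)
            link = os.path.join(QUEUE, name)
            if os.path.lexists(link):
                os.unlink(link)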

It is possible for the upload process to crash after moving the file into place but before creating the symlink that indicates there is a file to be processed. I would like to know the best way to handle this case. In theory I could create a marker file at the beginning of the upload process and delete it once the symlink has been created successfully, but this leads to the same locking problems as before, because there are multiple upload processes that need to coordinate with each other.
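
To illustrate, the marker-file idea would look something like this sketch; a marker that survives its upload is evidence of a crash between steps 2 and 3. The directory names are hypothetical, and the lock handshake from the previous sketch is omitted for brevity:

    import os

    UPLOADS = "/var/uploads/public"       # assumed layout, as above
    QUEUE   = "/var/uploads/queue"
    MARKERS = "/var/uploads/in-progress"  # hypothetical marker directory

    def upload(name: str, tmp_path: str) -> None:
        marker = os.path.join(MARKERS, name)
        open(marker, "w").close()                         # 1. record an upload in flight
        os.rename(tmp_path, os.path.join(UPLOADS, name))  # 2. publish the file (atomic)
        # 3. signal the pipeline (locking omitted here)
        os.symlink(os.path.join(UPLOADS, name), os.path.join(QUEUE, name))
        os.unlink(marker)                                 # 4. success: remove the marker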

Is there a simpler way to handle the case of the crashed upload process?

David K.
  • Hmmm.... in unix a pipeline is always more than one process connected by pipes, so perhaps you are using the term in some other way? Also, if there's only ever one instance (running process) of the program processing the files then you don't need any locking or other special magic. Also, locking a symlink file is not the same as locking a normal file. Also, why would your upload process crash between two simple file operations (rename and symlink)? – Greg A. Woods May 28 '15 at 22:02
  • Because systems crash, and fault tolerance means accounting for those crashes. – David K. May 29 '15 at 01:55

1 Answer


The usual approach to this type of problem is to check for "stale" files.

If a lock file has a modification date more than X seconds old, you assume that the process that created it has died and delete it.

The same goes for data files: if one has a modification date more than X seconds old, you assume the process that created it has died and delete it.

If you want to be really safe, and the files are not particularly large, you can make X be something ridiculous like a day (60*60*24 seconds).
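
As a sketch of that check, assuming the symlink queue from the question (the paths and threshold are illustrative):

    import os
    import time

    STALE_AFTER = 60 * 60 * 24  # one day, as suggested above

    def sweep(directory: str) -> None:
        now = time.time()
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            # lstat so a dangling symlink's own age is used, not its target's
            if now - os.lstat(path).st_mtime > STALE_AFTER:
                os.unlink(path)  # creator presumed dead; reclaim the entry

Run periodically (say, from cron or at pipeline startup), this reclaims both abandoned lock files and orphaned queue entries without any extra coordination between the upload processes.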

Andru Luvisi