
Follow up to: How to safely update a file that has many readers and one writer?

In my previous questions, I figured out that you can use FileChannel's lock to ensure an ordering on reads and writes.

But how do you handle the case where the writer fails mid-write (say the JVM crashes)? The basic algorithm looks like this:

WRITER:
  lock file
  write file
  release file

READER:
  lock file
  read file
  release file
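In Java, this basic pattern maps onto FileChannel locks roughly as follows. This is only a sketch: the decision to truncate under the lock (rather than at open time) is mine, so the old contents are replaced only after the lock is held.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BasicLocking {
    public static void write(Path file, byte[] content) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {        // lock file (exclusive)
            ch.truncate(0);                      // write file
            ch.write(ByteBuffer.wrap(content));
        }                                        // release file (lock closed first)
    }

    public static byte[] read(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ);
             FileLock lock = ch.lock(0, Long.MAX_VALUE, true)) { // lock file (shared)
            ByteBuffer buf = ByteBuffer.allocate((int) ch.size());
            while (buf.hasRemaining() && ch.read(buf) > 0) { }   // read file
            return buf.array();
        }                                        // release file
    }
}
```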

If the JVM crashes during write file, sure the lock would be released, but now I have an incomplete file. I want something complete to always be readable: either the old content or the new content, and nothing in between.

My first strategy was to write to a temporary file, and then copy its contents into the "live" file (while ensuring proper locking). The algorithm for this is:

WRITER:
  lock temp file
  write temp file
  lock file
  copy temp to file
  release file
  release temp
  delete temp

READER:
  lock file
  read file
  release file

One nice thing is that delete temp won't remove the temp file if another writer has already locked it.

But that algorithm doesn't handle if the JVM crashes during copy temp to file. So then I added a copying flag,

WRITER:
  lock temp file
  write temp file
  lock file
  create copying flag
  copy temp to file
  delete copying flag
  release file
  release temp
  delete temp

READER:
  lock file
  if copying flag exists
    copy temp to file
    delete copying flag
    delete temp 
  end
  read file
  release file

There won't ever be two things performing copy temp to file at once, as it is guarded by the file lock.

Now, is this the way to do it? It seems very complicated for something so simple. Is there some Java library that handles this for me?

EDIT

Well, I managed to make a mistake in my third attempt. The reader doesn't hold the lock on temp when it does copy temp to file. And it's not a simple fix to just lock the temp file! That would cause the writer and reader to acquire locks in different orders, which can lead to deadlock. This is getting more complicated all the time. Here's my fourth attempt:

WRITER:
  lock file
  write temp file
  create copying flag
  copy temp to file
  delete copying flag
  delete temp
  release file

READER:
  lock file
  if copying flag exists
    copy temp to file
    delete copying flag
    delete temp 
  end
  read file
  release file

This time the temp file is guarded by the main lock, so it doesn't even need its own lock.

EDIT 2

When I say JVM crash, I actually mean say the power went out and you didn't have a UPS.

EDIT 3

I still managed to make another mistake. You shouldn't lock the file you are writing to or reading from. This causes problems, since you can't get both read and write access unless you use RandomAccessFile in Java, which does not implement InputStream/OutputStream.

What you want to do instead is lock a separate lock file that guards the file you are reading or writing. Here's the updated algorithm:

WRITER:
  lock
  write temp file
  create copying flag
  copy temp to file
  delete copying flag
  delete temp
  release

READER:
  lock
  if copying flag exists
    copy temp to file
    delete copying flag
    delete temp 
  end
  read file
  release

lock and release guard the file, the temp file and the copying flag. The only problem is that the reader lock can no longer be shared, but it never really could be: the reader always had a chance to modify the file, so it would have been wrong to make it a shared lock in the first place.
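The final algorithm above can be sketched in Java like this. Everything here is illustrative, not a vetted implementation: the file-name conventions (".tmp", ".copying", ".lck"), the use of an empty lock file for FileChannel locking, and an empty file as the copying flag are all assumptions.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class SafeFile {
    private final Path file, temp, flag, lockFile;

    public SafeFile(Path file) {
        this.file = file;
        this.temp = file.resolveSibling(file.getFileName() + ".tmp");
        this.flag = file.resolveSibling(file.getFileName() + ".copying");
        this.lockFile = file.resolveSibling(file.getFileName() + ".lck");
    }

    public void write(byte[] content) throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {                     // lock
            Files.write(temp, content);                       // write temp file
            Files.write(flag, new byte[0]);                   // create copying flag
            Files.copy(temp, file,
                    StandardCopyOption.REPLACE_EXISTING);     // copy temp to file
            Files.delete(flag);                               // delete copying flag
            Files.delete(temp);                               // delete temp
        }                                                     // release
    }

    public byte[] read() throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = ch.lock()) {                     // lock
            if (Files.exists(flag)) {    // a writer died mid-copy: finish its work
                Files.copy(temp, file, StandardCopyOption.REPLACE_EXISTING);
                Files.delete(flag);
                Files.delete(temp);
            }
            return Files.readAllBytes(file);                  // read file
        }                                                     // release
    }
}
```

Note that both reader and writer lock only the separate .lck file, never the data file itself, matching EDIT 3.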

Pyrolistical

5 Answers


despite the fact that there is no bullet-proof, cross-OS, cross-FS solution, the "write to unique temp file and rename" strategy is still your best option. most platforms/filesystems attempt to make file renaming (effectively) atomic. note, you want to use a separate lock file for locking.

so, assuming you want to update "myfile.txt":

  • lock "myfile.txt.lck" (this is a separate, empty file)
  • write changes to a unique temp file, e.g. "myfile.txt.13424.tmp" (use File.createTempFile())
  • for extra protection, but possibly slower, fsync the temp file before proceeding (FileChannel.force(true)).
  • rename "myfile.txt.13424.tmp" to "myfile.txt"
  • unlock "myfile.txt.lck"

on certain platforms (windows), you need to do a little more dancing due to restrictions on file ops (you can move "myfile.txt" to "myfile.txt.old" before renaming, and use the ".old" file to recover from if you need to when reading).
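a sketch of the steps above in Java NIO.2 (names are illustrative). Files.move with ATOMIC_MOVE asks for an atomic rename; on POSIX this is rename(2), which replaces the target, but atomicity remains platform-dependent:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicUpdate {
    public static void update(Path file, byte[] content) throws IOException {
        Path lck = file.resolveSibling(file.getFileName() + ".lck");
        try (FileChannel lockCh = FileChannel.open(lck,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = lockCh.lock()) {            // lock "myfile.txt.lck"
            // write changes to a unique temp file in the same directory
            Path tmp = Files.createTempFile(file.getParent(),
                    file.getFileName().toString(), ".tmp");
            try (FileChannel out = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                out.write(ByteBuffer.wrap(content));
                out.force(true);  // fsync the temp file before proceeding
            }
            // rename the temp file onto the real file
            Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
        }                                                // unlock
    }
}
```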

jtahlborn

I don't think there is a perfect answer. I don't know exactly what you need to do, but can you write to a new file and then, on success, rename the files rather than copying? Renaming is quick, and hence should be less prone to a crash. This still won't help if it fails at the rename stage, but you've minimized the window of risk.

Again, I'm not sure if it is applicable or relevant to your needs, but could you write a marker block at the end of the file to show that all the data has been written?

Miles D
  • From my understanding rename isn't safe. And on Windows the dest file can't exist for renameTo to work. Therefore if you delete the dest file and then the JVM fails, you don't have a readable file. – Pyrolistical Jan 20 '09 at 19:46
  • @Pyrolistical: If the destination file is deleted and then the JVM fails, you do have a readable file. It won't have the right name, but that can easily enough be corrected by the code that opens the file. – supercat Apr 04 '11 at 16:08

I assume you have a large file which you are continuously appending to. Crashes of the VM do not happen very often, but if they occur you need a way to roll back the failed changes. You just need a way to know how far to roll back, for example by writing the last file length to a new file:

WRITER:
  lock file
  write file position to pos-file
  write file
  remove pos-file
  unlock file

If the writer crashes, one of your readers will get the read lock. They have to check for the pos-file; if they find one, a crash occurred. Looking inside the pos-file tells them how far to roll back the changes to get a consistent file again. Of course the rollback procedure has to happen in a similar way to the write procedure.
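The append case could be sketched like this (locking is elided to keep the focus on the pos-file; the ".pos" naming is an assumption):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendLog {
    private final Path file, posFile;

    public AppendLog(Path file) {
        this.file = file;
        this.posFile = file.resolveSibling(file.getFileName() + ".pos");
    }

    public void append(byte[] data) throws IOException {
        long pos = Files.exists(file) ? Files.size(file) : 0;
        Files.write(posFile, Long.toString(pos).getBytes()); // write file position to pos-file
        Files.write(file, data,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND); // write file
        Files.delete(posFile);                               // remove pos-file
    }

    public byte[] read() throws IOException {
        if (Files.exists(posFile)) { // a crash occurred: roll back the partial append
            long pos = Long.parseLong(new String(Files.readAllBytes(posFile)));
            try (FileChannel ch = FileChannel.open(file,
                    StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
                ch.truncate(pos);
            }
            Files.delete(posFile);
        }
        return Files.readAllBytes(file);
    }
}
```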

When you are not appending but replacing the file, you can use the same method:

WRITER:
  lock file
  write writing-in-progress-file
  write file
  remove writing-in-progress-file
  unlock file

The same rules as before apply for the reader: when the writing-in-progress-file exists and the reader has acquired the read lock, the written file is in an inconsistent state.

Eduard Wirch
  • Yes this is good for appending. While I am doing appending most of the time, sometimes I would replace the contents of the entire file. – Pyrolistical Jan 20 '09 at 22:26
  • What if the computer fails half way through write writing-in-progress-file? How does the reader know if it should read writing-in-progress-file or not? – Pyrolistical Jan 21 '09 at 18:50
  • Writing-in-progress-file is just a flag. It is an empty file. There is nothing to read. It just tells the reader that a write process has begun. – Eduard Wirch Jan 22 '09 at 13:47
  • Yes, so if your computer fails during the write, the next reader that comes along will fail. I want to ensure readability. – Pyrolistical Jan 27 '09 at 00:44
  • The next reader will not fail, because it knows if there is a writing-in-progress-file it has to recover from the last failed write. – Eduard Wirch Jan 27 '09 at 11:36
  • @Eduard - you've just moved the problem. what if the crash happens while writing the "writing-in-progress-file"? – jtahlborn Apr 04 '11 at 17:12
  • @jtahlborn: It can not since you are not writing any contents. You're just creating a file. Since the file system will take care of the consistency of itself there are only two cases: file exists or file does not exist. – Eduard Wirch Apr 11 '11 at 09:26

Without some operating-system support, it's not going to be possible to have arbitrarily-complex file operations be atomic without requiring programs which open a file to either complete or roll back operations that may have been in progress. If that caveat is acceptable, one could do something like the following:

  1. Near the start of a file, not straddling a 512-byte boundary, include two 4- or 8-byte numbers (depending upon maximum file size), indicating the logical length of the file and the location of an update record (if any). Most of the time the update-record value will be zero; the act of writing a non-zero value will commit an update sequence; it will be rewritten with zero when the update sequence is complete.
  2. Before an update sequence is begun, determine an upper bound for the logical length of the file when the update sequence is complete (there are ways of getting around this limitation, but they add complexity)
  3. To start an update sequence, seek into the file a distance which is the larger of its present logical length or the (upper bound of the) future logical length, and then write update records, each consisting of a file offset, a number of bytes to write, and the data to be written. Write as many such records as are required to perform all updates, and end with a record that has 0 offset and 0 length.
  4. To commit an update sequence, flush all pending writes and wait for their completion, and then write the new file length and the location of the first update record.
  5. Finally, update the file by processing all the update records in sequence, flushing all writes and awaiting their completion, and setting the update-record location to zero, and (optionally) truncating the file to its logical length.
  6. If an attempt is made to open a file where the update-sequence location is non-zero, finish performing any pending writes (using the last step described above) before doing anything else with it.

If the original operation that writes the file fails before the update-record location is written, all of the write operations will be effectively ignored. If it fails after the update-record location is written but before it is cleared, the next time the file is opened all of the write operations will be committed (some of them may have already been performed, but performing them again should be harmless). If it fails after the update-record location is cleared, the file update will be complete and the failure won't affect it at all.

Some other approaches use a separate file to hold pending writes. In some cases, that may be a good idea, but it has the disadvantage of splitting a file into two parts, both of which must be kept together. Copying just one file, or accidentally pairing copies of the two files which were made at different times, could result in data loss or corruption.
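A much-simplified sketch of the scheme described above, using a fixed 16-byte header (logical length plus update-record offset) and batch updates. All names are mine, and the future-length upper bound is simply computed up front rather than estimated:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class JournaledFile {
    private static final int HEADER = 16; // two 8-byte longs at offset 0

    // Create a fresh file: logical length covers only the header, no pending update.
    public static void init(RandomAccessFile f) throws IOException {
        f.seek(0);
        f.writeLong(HEADER); // logical length
        f.writeLong(0);      // update-record location (0 = none)
    }

    // Apply a batch of (offset, data) writes atomically (steps 2-5).
    public static void update(RandomAccessFile f, long[] offsets, byte[][] data)
            throws IOException {
        recover(f); // finish any interrupted update first (step 6)
        f.seek(0);
        long logicalLen = f.readLong();
        long newLen = logicalLen;
        for (int i = 0; i < offsets.length; i++)
            newLen = Math.max(newLen, offsets[i] + data[i].length);
        // step 3: write update records past both old and new logical length
        long journalStart = Math.max(logicalLen, newLen);
        f.seek(journalStart);
        for (int i = 0; i < offsets.length; i++) {
            f.writeLong(offsets[i]);
            f.writeInt(data[i].length);
            f.write(data[i]);
        }
        f.writeLong(0); f.writeInt(0); // terminator record (0 offset, 0 length)
        f.getFD().sync();              // step 4: flush pending writes...
        f.seek(0);
        f.writeLong(newLen);
        f.writeLong(journalStart);     // ...then this single write commits
        f.getFD().sync();
        recover(f);                    // step 5: apply records and clear
    }

    // Replay pending update records, clear the location, truncate (steps 5-6).
    public static void recover(RandomAccessFile f) throws IOException {
        f.seek(0);
        long logicalLen = f.readLong();
        long journal = f.readLong();
        if (journal == 0) return;      // nothing pending
        long pos = journal;
        while (true) {
            f.seek(pos);
            long off = f.readLong();
            int len = f.readInt();
            if (off == 0 && len == 0) break;
            byte[] buf = new byte[len];
            f.readFully(buf);
            pos = f.getFilePointer();
            f.seek(off);
            f.write(buf);              // replaying twice is harmless
        }
        f.getFD().sync();
        f.seek(8);
        f.writeLong(0);                // clear the update-record location
        f.getFD().sync();
        f.setLength(logicalLen);       // truncate to the logical length
    }
}
```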

supercat

Solution: use 2 files and write to them in consecutive order.
Only 1 write can fail.

Vali
  • Please provide an example of this solution. There is not enough information in this solution to understand it. – Jay Elston May 19 '11 at 08:09