
This question arises because whenever someone wants to use a flat file as a database, the usual response is "is a database not an option?" or something similar. That makes me think most people believe that popular database software is reliable at handling data storage.

However, since database engines also write their data stores to files (or, if you will, "flat files"), I am confused as to why most people believe that database engines all but guarantee protection from data loss.

I suppose that database software uses features like Windows' CreateFile() function with the FILE_FLAG_WRITE_THROUGH flag set; yet Microsoft states in its documentation that "Not all hard disk hardware supports this write-through capability."

Why, then, would a database engine be more reliable than my C# code, which also uses the unmanaged CreateFile() function to write directly to disk, with some algorithm (like this SO way) to prevent damage to the data? This matters especially when writing small files and appending a few bytes to them at some later time. (Note: I am not comparing robustness, features, etc., just the reliability of data integrity.)
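For concreteness, here is a stripped-down sketch of the kind of unmanaged write-through call I mean (constants, file name, and error handling are simplified for illustration, not production code):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using Microsoft.Win32.SafeHandles;

static class WriteThroughDemo
{
    const uint GENERIC_WRITE           = 0x40000000;
    const uint OPEN_ALWAYS             = 4;
    const uint FILE_FLAG_WRITE_THROUGH = 0x80000000;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode,
        IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    static void Main()
    {
        // Ask the OS to push each write through its cache to the device.
        using (SafeFileHandle handle = CreateFile(
                   "data.bin", GENERIC_WRITE, 0, IntPtr.Zero,
                   OPEN_ALWAYS, FILE_FLAG_WRITE_THROUGH, IntPtr.Zero))
        {
            if (handle.IsInvalid)
                throw new IOException("CreateFile failed",
                                      Marshal.GetLastWin32Error());

            using (var stream = new FileStream(handle, FileAccess.Write))
            {
                byte[] payload = Encoding.UTF8.GetBytes("small record\n");
                stream.Seek(0, SeekOrigin.End);           // append
                stream.Write(payload, 0, payload.Length);
                stream.Flush();                           // drain the managed buffer
            }
        }
    }
}
```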

Damilola Olowookere
  • I think the difference is having teams of experts dedicated to the task of creating a database system (spending millions of hours and dollars to make it work the very best it can), as opposed to individual developers who think they can mimic the same functionality in 10 lines of code. Just my thoughts – Brendan Dec 18 '14 at 08:22
  • You can flush data from buffers on demand with `FlushFileBuffers` – ta.speot.is Dec 18 '14 at 08:34

2 Answers


The key to most database systems' integrity is the log file.

As well as updating the various tables/data stores/documents, they also write all operations and their associated data to a log file.

In most cases, when the program "commits()", it waits until all operations are written (really written!) to the log file. If anything happens after that, the database can be rebuilt from the log file's data.

Note -- you could get something similar using standard disk I/O and calling flush() at the appropriate points. However, you could never guarantee the status of the file (many I/Os could have taken place before you called flush()), and you could never recover to a point in time, as you have no copy of deleted records or of the previous version of an updated record.
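To make that concrete, here is a toy sketch of the pattern (not how any real engine is implemented; on .NET, FileStream.Flush(true) is what forces the OS buffers to disk via FlushFileBuffers):

```csharp
using System.IO;
using System.Text;

class ToyWriteAheadLog
{
    readonly string logPath;
    readonly string dataPath;

    public ToyWriteAheadLog(string logPath, string dataPath)
    {
        this.logPath = logPath;
        this.dataPath = dataPath;
    }

    // A commit is durable once the log record has really reached the disk.
    public void Commit(string record)
    {
        // 1. Append the operation to the log and force it to disk.
        using (var log = new FileStream(logPath, FileMode.Append, FileAccess.Write))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(record + "\n");
            log.Write(bytes, 0, bytes.Length);
            log.Flush(true);   // true => FlushFileBuffers: past the OS cache
        }

        // 2. Only now update the main data store. If we crash between
        //    these two steps, recovery replays the log entry.
        using (var data = new FileStream(dataPath, FileMode.Append, FileAccess.Write))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(record + "\n");
            data.Write(bytes, 0, bytes.Length);
            data.Flush(true);
        }
    }

    // After a crash the log is the source of truth: a real engine would
    // replay any entries that never made it into the data store.
    public string[] ReadLogForRecovery() => File.ReadAllLines(logPath);
}
```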

James Anderson
  • It should also be noted that this level of integrity can be applied to files if you have a transactional framework such as CICS or Tuxedo. – James Anderson Dec 18 '14 at 08:35
  • The log files of most database servers (at least SQL Server, as far as I know: https://disruptivesql.wordpress.com/2012/05/08/sata-and-write-through/) use the WRITE_THROUGH flag. So would you say that FlushFileBuffers is the only guaranteed way? – Damilola Olowookere Dec 18 '14 at 12:39
  • My current implementation is that I simply create myFileHandle and then create myFileHandleCopy (if myFileHandle is not empty, I read its contents into myFileHandleCopy and FlushFileBuffers() it; otherwise myFileHandleCopy stays empty). Then I write whatever I want to write to myFileHandle and FlushFileBuffers() it. I complete the operation by deleting myFileHandleCopy. The presence of myFileHandleCopy therefore indicates a previous incomplete I/O operation (while also doubling as a backup file!) – Damilola Olowookere Dec 20 '14 at 19:17
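In code, the scheme described in the comment above looks roughly like this (a sketch based only on the comment's description; the file naming, the SafeAppend/RecoverIfNeeded helpers, and the flush details are assumptions for illustration):

```csharp
using System.IO;

static class BackupCopyWrite
{
    // Append to 'path' so that a leftover 'path + ".bak"' marker signals
    // an interrupted operation and also holds the previous contents.
    public static void SafeAppend(string path, byte[] payload)
    {
        string backup = path + ".bak";

        // 1. Snapshot the current contents (empty if the file is new)
        //    and force the snapshot to disk before touching the original.
        using (var b = new FileStream(backup, FileMode.Create, FileAccess.Write))
        {
            if (File.Exists(path))
            {
                byte[] current = File.ReadAllBytes(path);
                b.Write(current, 0, current.Length);
            }
            b.Flush(true);   // true => FlushFileBuffers
        }

        // 2. Do the real write and flush it the same way.
        using (var f = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            f.Write(payload, 0, payload.Length);
            f.Flush(true);
        }

        // 3. Deleting the marker completes the operation.
        File.Delete(backup);
    }

    // On startup, a surviving marker means the last write may be torn:
    // restore the previous contents from the backup.
    public static void RecoverIfNeeded(string path)
    {
        string backup = path + ".bak";
        if (File.Exists(backup))
        {
            File.Copy(backup, path, overwrite: true);
            File.Delete(backup);
        }
    }
}
```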

Of course you can write a very secure piece of C# code that handles all possible exceptions and faults, that calculates hash codes and checks them for everything it is going to write to the disk, and that manages all the quirks of every operating system it's deployed on with respect to file caching, disk write buffering, and so forth and so on.

The question is: why should you?

Admittedly, a DB is not always the right choice if you just want to write data to the disk. But if you want to store data consistently, safely, and, most importantly, without losing too much of your time in the nitty-gritty details of I/O operations, then you should use some well-established and tested piece of code that someone else wrote and took the time to debug (hint: a database is a good choice).

See?

Also, there are databases, like SQLite, that are perfect for fast, installation-less use inside a program. Use them or not, it's your choice, but if I were you I wouldn't spend my time reinventing the wheel.
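For example, with the Microsoft.Data.Sqlite package (one of several SQLite bindings for C#, picked here just for illustration), a durable, transactional write is only a few lines:

```csharp
using Microsoft.Data.Sqlite;   // NuGet package: Microsoft.Data.Sqlite

static class SqliteDemo
{
    static void Main()
    {
        using (var connection = new SqliteConnection("Data Source=app.db"))
        {
            connection.Open();

            var create = connection.CreateCommand();
            create.CommandText =
                "CREATE TABLE IF NOT EXISTS records (body TEXT NOT NULL)";
            create.ExecuteNonQuery();

            // The transaction commits completely or not at all; SQLite's
            // own journal handles crash recovery, not our code.
            using (var tx = connection.BeginTransaction())
            {
                var insert = connection.CreateCommand();
                insert.Transaction = tx;
                insert.CommandText =
                    "INSERT INTO records (body) VALUES ($body)";
                insert.Parameters.AddWithValue("$body", "small bit of data");
                insert.ExecuteNonQuery();

                tx.Commit();
            }
        }
    }
}
```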

A. Rama
  • I would have marked this as the answer because it provides some very basic but pertinent information on the subject matter, but you have concentrated on why a DB should be preferred to generically storing data in files. It is always better to focus on the question than to give counter-suggestions – Damilola Olowookere Dec 18 '14 at 12:36