Are transactions on top of "normal file system" possible?

Question

It seems possible to implement transactions on top of normal file systems using techniques such as write-ahead logging, two-phase commit, and shadow paging.
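For a single file, the shadow-paging idea can be sketched roughly like this. This is only an illustration under assumptions: `atomic_write` is a made-up helper name, and the atomicity of the final rename relies on POSIX `rename` semantics (which Python exposes as `os.replace`), not on anything a particular library does.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so that readers see either the old bytes or the new bytes."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a shadow file in the same directory, so the
    # final rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".shadow-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the shadow copy to stable storage
        os.replace(tmp_path, path)  # commit point: atomic rename on POSIX
    except BaseException:
        # Roll back: the original file was never touched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Note that this covers only one file at a time; it says nothing about making changes to several files atomic together.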

Indeed, it must be possible, because a transactional database engine like InnoDB can be deployed on top of a normal file system. There are also libraries like XADisk.

However, the Apache Commons Transaction project states:

...we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. ...

Why does Apache Commons Transaction claim that implementing transactions on top of normal file systems is impossible?

Is it impossible to do transactions on top of normal file systems?

score 0 · Answer 1 · edited Nov 25 '14 at 06:56

This answer is pure speculation, but you may be comparing apples and oranges. Or perhaps more accurately, milk and dairy products.

When a database uses a file system, it uses only a handful of predefined files per database: data files and log files. The one operation that is absolutely necessary for ACID-compliant transactions is the ability to force a write to durable storage (disk or other non-volatile memory), and I think most file systems provide this capability.
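As a rough illustration of "forcing a write", here is a minimal sketch in Python. The helper name `append_durably` is made up for this example, and the guarantee is only as strong as the OS and the device: `os.fsync` asks the OS to flush its cache, but hardware that ignores flush requests can still lose data.

```python
import os


def append_durably(log_path: str, record: bytes) -> None:
    """Append one record; return only after the OS reports it was flushed."""
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        log.flush()             # move data from Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to flush its cache to the device
```

A write-ahead log is essentially a sequence of such appends, each one completed before the change it describes is applied to the data files.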

With this mechanism, the database can maintain locks on objects in the database as well as control access to all objects. Happily, the database has layers of memory/page management built on top of the file system. The "database" itself is written in terms of things like pages, tables, and indexes, not files, directories, and disk blocks.

A more generic transactional system has other challenges. It would need, for instance, atomic actions for more things. E.g. if you "transactionally" delete 10 files, all these would have to disappear at the same time. I don't think "traditional" file systems have this capability.

In the database world, the equivalent would be deleting 10 tables. Well, you essentially create new versions of the system tables without the tables — within a transaction, while the old tables are being used. Then you put a full lock on the system tables (preventing reads and writes), waiting until they are available. Then you swap in the new table definitions (i.e. without the tables), unlock the tables, and clean up the data. (This is intended as an intuitive view of the locking mechanism in this case, not a 100% accurate description.)

So, notice that locking and transactions are deeply embedded in the actions the database is doing. I suspect that the authors of this module came to realize that they would basically have to re-implement all existing file system functionality to support their transactions, and that was a bit too much scope to take on.

The project stated clearly that it's impossible, not because of scope or project size, but simply because there is **no way** to do it. As for your example, since the requirement is for all calls to go through the API, the 10 files do not need to be deleted at the exact same time. The library only needs to ensure that calls through the API see the action as an atomic unit. E.g. if the OS crashes in the middle of the operation, then when the program restarts, the library would either run code to finish the 10-file deletion or restore the half-deleted files from its log. – Pacerier, Nov 23 '14 at 17:44
@Pacerier ... The OP's question is why ACID-compliant databases can do this when this project says that it is impossible. – Gordon Linoff, Nov 23 '14 at 20:31
Yes, my comment is trying to say that it **is** possible, so why does the project state the opposite? – Pacerier, Nov 23 '14 at 21:17
And my answer is trying to say that databases are doing something simpler than a general system, requiring only limited functionality from the underlying file system to build the constructs needed for the database. – Gordon Linoff, Nov 23 '14 at 21:22

score -1 · Answer 2 · answered Nov 24 '14 at 06:26

Windows offers transactions on top of NTFS. See the description here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb968806%28v=vs.85%29.aspx

It's not recommended for use at the moment, and there's an extensive discussion of alternative scenarios right in MSDN: http://msdn.microsoft.com/en-us/library/windows/desktop/hh802690%28v=vs.85%29.aspx

Also, by a broad enough definition of a filesystem, a DBMS is itself a kind of filesystem, and a filesystem (like NTFS or ext3) can in turn be implemented on top of (or inside) a DBMS. So Apache's statement is a bit, hmm, incorrect.

Eugene Mayevski 'Callback


Is it possible then to implement transactions on top of normal file systems? – Pacerier Nov 25 '14 at 06:50
@Pacerier What do you call "normal"? Taking a generic filesystem and using it in a transacted way? On Windows I can imagine a filesystem filter driver that would be able to do this for most filesystem accesses (though filters can be bypassed, so such a scheme wouldn't be very reliable). On other platforms, kernel patching would most likely be required. – Eugene Mayevski 'Callback Nov 25 '14 at 16:23
Yes. Is it possible, then, to implement transactions on top of a generic file system that doesn't have its own built-in mechanism? – Pacerier Nov 27 '14 at 20:26
@Pacerier For transactions to work, there must be a single point of entry for all filesystem requests, and you have to control that point. For a filesystem, the driver code is that point of entry. So you would have to patch the filesystem driver. – Eugene Mayevski 'Callback Nov 28 '14 at 03:27
