Document DB and simulating ACID

Question

See results at the end

I want to use a document DB (for various reasons) - probably CouchDB or MongoDB. However, I also need ACID on my multiple-document transactions.

However, I do plan on working with an "add-only" model: changes are added as new documents (an add is an add, an update adds a transformed copy of the data, a delete adds an empty document with the same ID and a delete flag). Periodically, I'll run compaction on the database to remove non-current documents.

With that in mind, are there any holes in the following idea:

Maintain a collection for current transactions in progress. This collection will hold documents with transaction IDs (GUIDs + timestamp) of transactions in progress.

Atomicity:
    On a transaction:
        Add a document to the transactions in progress collection.
        Add the new documents (add is add, update is copy+add, delete is add with ID and “deleted” flag).
            Each added document will have the following management fields:
            Transaction ID.
            Previous document ID (linked list).
        Remove the document added to the transactions in progress collection.
    On transaction fail:
        Remove all added documents
        Remove the document from the transactions in progress collection.
    Periodically:
        Go over all transactions in progress, find the ones that have been abandoned (>10 minutes?), remove their associated documents from the DB (index on transaction ID) and then remove the transaction-in-progress document.
Read transaction consistency (read only committed transactions):
    On data retrieval:
        Load transactions in progress set.
        Load needed documents.
        For all documents, if the document transaction ID is in “transactions in progress” or later (using timestamp), load the previous document in the linked list (recursive).
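
The read rule above can be sketched in memory as follows. This is only an illustration, not CouchDB or MongoDB code: `Version`, `resolve_visible`, `store` and `read_started_at` are hypothetical names, and "later (using timestamp)" is assumed to mean "written by a transaction whose timestamp is at or after the moment the read started".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    doc_id: str                 # DB id of this version
    tx_id: str                  # transaction that wrote this version
    tx_timestamp: float         # timestamp part of the transaction ID
    previous_id: Optional[str]  # previous version in the linked list
    data: Optional[dict]        # None would mark a delete


def resolve_visible(store, doc_id, in_progress, read_started_at):
    """Walk the linked list back to the newest version the reader may see."""
    current = store.get(doc_id)
    while current is not None:
        uncommitted = current.tx_id in in_progress
        too_new = current.tx_timestamp >= read_started_at
        if not (uncommitted or too_new):
            return current
        # Skip this version and fall back to the one it replaced.
        current = store.get(current.previous_id) if current.previous_id else None
    return None


store = {
    "v1": Version("v1", "tx-a", 1.0, None, {"balance": 10}),
    "v2": Version("v2", "tx-b", 2.0, "v1", {"balance": 20}),
    "v3": Version("v3", "tx-c", 5.0, "v2", {"balance": 30}),
}

# tx-b is still in progress and tx-c started after the read began,
# so the reader falls back to v1.
visible = resolve_visible(store, "v3", in_progress={"tx-b"}, read_started_at=4.0)
print(visible.doc_id)
```

Each skipped version costs another lookup, so long chains of uncommitted or too-new versions make reads proportionally more expensive.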

It’s a bit like MVCC, a bit like Git. I set the retrieval context by the transactions I know managed to finish before I started. I avoid a single sequence (hence single execution) by keeping a list of “ongoing transactions” and not a “transaction revision”. And, of course, I avoid reading uncommitted transactions and provide rollback on conflict.

So - are there any holes in this? Will my performance suffer horribly?

Edit1: Please please please - don't hammer the "don't use document database if you need multi-document transactions". I know, I need a document database anyway for other reasons.

Edit2: added timestamp to avoid data from transactions that start after retrieval transaction has started. Possibly could change timestamp to sequence ID.

Edit3: Here's another algorithm I thought about - it may be better than the one above:

New algorithm - easier to understand (and possibly correct this time :) )

Support structures:
transaction_support_template {
    _created-by-transaction: <txid>
    _made-obsolete-by-transaction: <txid>
}

transaction_record {
    transaction_id: <txid>
    timestamp: <tx timestamp>
    updated_documents: {
        [doc1_id, doc2_id...]
    }   
}

transaction_number { //atomic counter - used for ordering transactions.
    _id: "transaction_number"
    next_transaction_id: 0 //initial.
}

Note: all IDs are model object IDs, not DB ids (don't confuse with logical IDs which are different).
DB ID - different for each document - but multiple DB documents are revisions of one model object.
Model object ID - same for all revisions of the model object.
Logical ID - client-facing ID.


First time setup:
1. Create the transaction_number document (next_transaction_id starts at 0).

Commit process:
1. Get new transaction ID by atomic increment on the transaction number counter.
2. Insert a new transaction record with the transaction id, the timestamp and the updated documents.
3. Create the new version for each document. Make sure the _created-by-transaction is set.
4. Mark the old version of each updated or deleted document by setting
   "_made-obsolete-by-transaction" to the transaction id.
   This is the time to detect conflicts! If a conflict is seen, roll back.
   Note - this can be done as find-and-modify rather than by serializing the entire document again.
5. Remove the transaction record.
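
A minimal in-memory sketch of the commit steps above, in Python. It is single-threaded and is not MongoDB or CouchDB code: `commit`, `Conflict`, `versions` and `transactions` are hypothetical names, and the check-then-set in step 4 would have to be one atomic find-and-modify in a real database. A conflict is assumed to mean "the old version was already sealed by another transaction".

```python
import itertools
import time


class Conflict(Exception):
    """Raised when an old version was already sealed by another transaction."""


counter = itertools.count(1)   # stands in for the atomic counter document
transactions = {}              # txid -> transaction record (in-flight only)
versions = []                  # every version of every model object


def commit(changes):
    """changes maps model_id -> (old_version_or_None, new_data)."""
    txid = next(counter)                                   # step 1
    transactions[txid] = {                                 # step 2
        "transaction_id": txid,
        "timestamp": time.time(),
        "updated_documents": list(changes),
    }
    sealed = []
    try:
        for model_id, (_, data) in changes.items():        # step 3
            versions.append({
                "model_id": model_id,
                "data": data,
                "_created-by-transaction": txid,
                "_made-obsolete-by-transaction": None,
            })
        for old, _ in changes.values():                    # step 4
            if old is None:
                continue                                   # plain insert, nothing to seal
            if old["_made-obsolete-by-transaction"] is not None:
                raise Conflict(old["model_id"])
            old["_made-obsolete-by-transaction"] = txid
            sealed.append(old)
    except Conflict:
        # Roll back: drop our new versions, unseal what we sealed.
        versions[:] = [v for v in versions if v["_created-by-transaction"] != txid]
        for old in sealed:
            old["_made-obsolete-by-transaction"] = None
        del transactions[txid]
        raise
    del transactions[txid]                                 # step 5
    return txid


t1 = commit({"acct": (None, {"balance": 10})})
v1 = versions[0]
t2 = commit({"acct": (v1, {"balance": 20})})
try:
    commit({"acct": (v1, {"balance": 99})})   # v1 was already sealed by t2
    conflicted = False
except Conflict:
    conflicted = True
print(t1, t2, conflicted, len(versions))
```

Even in this toy form, every commit touches the counter, the transactions collection and two versions per changed object, which is where the extra write load comes from.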

Cleanup process:
1. Go over the transaction records, sorted by id, ascending (oldest transaction first).
2. For each transaction, if it expired (by timestamp), do rollback(txid).

Rollback(txid) process:
1. Get the transaction record for the given transaction id.
2. For each document id in the "updated documents":
    2.1 If the document exists and has "_made-obsolete-by-transaction" with 
        the correct transaction id, remove the _made-obsolete-by-transaction data.
3. For each document whose _created-by-transaction is the given transaction id:
    3.1 remove the document.
4. Remove the transaction record document.
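
The cleanup and rollback processes could look like this as an in-memory Python sketch. `rollback`, `cleanup`, `max_age_seconds` and the sample data are hypothetical; plain lists and dicts stand in for the collections, so nothing here is atomic the way the real operations would need to be.

```python
def rollback(txid, transactions, versions):
    """Undo everything transaction `txid` did; returns False if it is unknown."""
    record = transactions.get(txid)                        # step 1
    if record is None:
        return False
    for v in versions:                                     # step 2
        if (v["model_id"] in record["updated_documents"]
                and v["_made-obsolete-by-transaction"] == txid):
            v["_made-obsolete-by-transaction"] = None      # unseal the old version
    # step 3: remove the versions this transaction created
    versions[:] = [v for v in versions if v["_created-by-transaction"] != txid]
    del transactions[txid]                                 # step 4
    return True


def cleanup(transactions, versions, now, max_age_seconds):
    """Roll back expired transactions, oldest id first."""
    expired = [txid for txid in sorted(transactions)
               if now - transactions[txid]["timestamp"] > max_age_seconds]
    for txid in expired:
        rollback(txid, transactions, versions)
    return expired


transactions = {
    7: {"transaction_id": 7, "timestamp": 100.0, "updated_documents": ["acct"]},
    8: {"transaction_id": 8, "timestamp": 950.0, "updated_documents": ["cart"]},
}
versions = [
    {"model_id": "acct", "data": {"balance": 10},
     "_created-by-transaction": 3, "_made-obsolete-by-transaction": 7},
    {"model_id": "acct", "data": {"balance": 20},
     "_created-by-transaction": 7, "_made-obsolete-by-transaction": None},
    {"model_id": "cart", "data": {"items": 1},
     "_created-by-transaction": 8, "_made-obsolete-by-transaction": None},
]

# Transaction 7 is 900 seconds old and gets rolled back; transaction 8 is still fresh.
expired = cleanup(transactions, versions, now=1000.0, max_age_seconds=600)
print(expired, len(versions))
```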

Retrieval process:
1. Top-transaction-id = transaction ID counter.
2. Read all transactions from the transactions collection. 
   Current-transaction-ids[] = Get all transaction IDs.
3. Retrieve documents as needed. Always use "sort by transaction_id, desc" as last sort clause.
    3.1 If a document "_created-by-transaction" is in the Current-transaction-ids[] 
        or is >= Top-transaction-id - ignore it (not yet committed).
    3.2 If a document "_made-obsolete-by-transaction" is not in the Current-transaction-ids[] 
        and is < Top-transaction-id - ignore it (a newer version was committed).
4. We may have to retrieve more chunks to satisfy original requests if documents were ignored.

Was the document committed when we started?
If we see a document with transaction ID in the current executing transactions - it's a transaction that started before we started the retrieval but was not yet committed at that time - so we don't want it. If we see a document with transaction ID >= top transaction ID - it's a transaction that started after we started the retrieval - so we don't want it.

Is the document up-to-date (latest version)?
If we see a document with made-obsolete that is not in the current transaction IDs (transactions started before we started) and is < top transaction ID (transactions started after we started) - then there was a transaction that finished commit in our past that made this document obsolete - so we don't want it.
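
Both checks reduce to one question asked twice: "was this transaction committed before my read started?" A small Python sketch of that predicate, where `is_visible`, `committed_before_read` and the sample documents are hypothetical names for illustration:

```python
def is_visible(doc, current_tx_ids, top_tx_id):
    """Apply rules 3.1 and 3.2 to a single document version."""

    def committed_before_read(txid):
        # Not in flight when the read started, and started before the read.
        return txid not in current_tx_ids and txid < top_tx_id

    created = doc["_created-by-transaction"]
    obsoleted = doc.get("_made-obsolete-by-transaction")
    if not committed_before_read(created):
        return False   # rule 3.1: the creating transaction is not committed for us
    if obsoleted is not None and committed_before_read(obsoleted):
        return False   # rule 3.2: a committed transaction replaced this version
    return True


# Snapshot taken at the start of the retrieval.
snapshot = {"current_tx_ids": {8}, "top_tx_id": 10}

docs = {
    "a": {"_created-by-transaction": 5, "_made-obsolete-by-transaction": 8},
    "b": {"_created-by-transaction": 8, "_made-obsolete-by-transaction": None},
    "c": {"_created-by-transaction": 4, "_made-obsolete-by-transaction": 6},
    "d": {"_created-by-transaction": 12, "_made-obsolete-by-transaction": None},
    "e": {"_created-by-transaction": 6, "_made-obsolete-by-transaction": 11},
}

visible = sorted(name for name, d in docs.items() if is_visible(d, **snapshot))
print(visible)
```

Here "a" and "e" stay visible because the transactions that sealed them (8 and 11) were not committed when the read started, while "b" and "d" are too new and "c" was already replaced.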

Why is sorting not harmed?
Because we add the sort as a last clause, we'll always see the real sorting work first. For each real sorting "bucket" we might get multiple documents that represent the model object at different versions. However, the sort order between model objects remains.
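
A tiny Python illustration of the sort argument, assuming the "real" sort key is a `name` field and the trailing clause is the creating transaction ID, descending. The field names and data are made up, and it also assumes the sort field does not change between versions of the same model object.

```python
docs = [
    {"model_id": "b", "name": "Bob", "_created-by-transaction": 2},
    {"model_id": "a", "name": "Alice", "_created-by-transaction": 1},
    {"model_id": "a", "name": "Alice", "_created-by-transaction": 5},
    {"model_id": "b", "name": "Bob", "_created-by-transaction": 4},
]

# Real sort key first, transaction ID (descending) as the last clause.
ordered = sorted(docs, key=lambda d: (d["name"], -d["_created-by-transaction"]))

for d in ordered:
    print(d["name"], d["_created-by-transaction"])
```

Versions of the same model object end up adjacent, newest first, and the order between model objects is unchanged.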

Why doesn't the counter make the transactions execute serially (one at a time)?
Because this is not RDBMS - we don't really have transactions so we don't wait for the transaction to commit as we do with "select for update". Another transaction can make the atomic change as soon as we're done with it.
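
To illustrate the point, here is a thread-based Python sketch in which a lock stands in for the database's atomic increment. `TransactionCounter` and `worker` are hypothetical names. The lock is held only while the ID is allocated, so the rest of each transaction can overlap with others.

```python
import threading


class TransactionCounter:
    """In-memory stand-in for the atomic counter document."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0   # matches next_transaction_id: 0

    def next_id(self):
        with self._lock:   # held only for the increment itself
            value = self._next
            self._next += 1
            return value


counter = TransactionCounter()
allocated = []
allocated_lock = threading.Lock()


def worker():
    txid = counter.next_id()
    # The rest of the commit would run here, outside the counter's lock,
    # so other transactions can take their IDs in the meantime.
    with allocated_lock:
        allocated.append(txid)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(allocated))
```

Every worker gets a distinct ID, but nothing forces them to finish in ID order, which is why readers also need the list of in-flight transactions.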

Compaction:
Once in a while a compaction will have to take place - get all really old documents and move them to another data store. This shouldn't affect any running retrieval or transaction.

Optimization:

Put the conditions into the query itself.
Add transaction ID to all indexes.
Make sure documents with the same model object ID don't get sharded into different nodes.

What's the cost?
Assuming we want multiple document versions for history and audit anyway, the extra cost is atomically updating the counter, creating the transaction record, "sealing" the previous version of each model object (mark obsolete) and removing the transaction document. This shouldn't be too big. Note that if the above assumption is not valid, the extra cost is quite high, especially for retrieval.

Results:

I've implemented the above algorithm (the revised one, with minor changes). Functionally, it works. However, the performance (at least over MongoDB with 3 nodes in a master-slave replication topology, no fsync but replication required before "commit" ends) is atrocious. I'm constantly reading, from different threads, documents I've just written. I'm getting constant collection locks on the transactions collection and my indexes can't keep up with the constant rollover. Performance is capped at 20 TPS for tiny, tiny transactions with 10 feeder threads.

In short - not a good general purpose solution.

btw, marking each transaction record as "finished" instead of deleting it would create a transaction log. — Ran Biron, Sep 16 '12 at 10:45
Have you gone on working on this? I'd be interested in any follow-up you might have. — Jean-Philippe Pellet, Sep 23 '12 at 17:46
@Jean-PhilippePellet - will start working on this in the next few weeks (hopefully). I'll try to remember to update with results. — Ran Biron, Sep 24 '12 at 12:21
@Jean-PhilippePellet added results (bad ones). I've given up on this approach. — Ran Biron, Nov 23 '12 at 19:31
@RanBiron have you checked this algo: https://github.com/rystsov/mongodb-transaction-example — assylias, May 06 '13 at 23:15
This looks very promising too: http://www.tokutek.com/2013/04/mongodb-transactions-yes/ — assylias, May 06 '13 at 23:36

score 2 · Answer 1 · edited May 23 '17 at 12:27

Without going into the specifics of your plan, I thought it might first be useful to go over MongoDB's support for the ACID requirements.

Atomicity: Mongo supports atomic changes for individual documents. Typically, the most significant atomic operations are "$set" and findAndModify. Some documentation on these operations and on atomicity in MongoDB in general:

http://www.mongodb.org/display/DOCS/Atomic+Operations
http://www.mongodb.org/display/DOCS/Updating#Updating-%24set
http://www.mongodb.org/display/DOCS/findAndModify+Command

Consistency: Difficult to achieve and quite complex. I won't try to summarize in this post, but there is a great series of posts on the subject:

http://blog.mongodb.org/post/475279604/on-distributed-consistency-part-1
http://blog.mongodb.org/post/498145601/on-distributed-consistency-part-2-some-eventual

Isolation: Isolation in mongoDB does exist for documents, but not for any higher levels. Again, this is a complicated subject; besides the Atomic Operations link above, the best resource I have found is the following stack overflow thread:

Why doesn't MongoDB use fsync()? (the top answer is a bit of a goldmine for this subject in general, though some of the information regarding durability is out of date)

Durability: The main way that users ensure data durability is by using the getLastError command (see link below for more info) to confirm that a majority of nodes in a replica set have written the data before the call returns.

http://www.mongodb.org/display/DOCS/getLastError+Command#getLastErrorCommand-majority 
http://docs.mongodb.org/manual/core/replication-internals/ (linked to in the above document)

Knowing all this about ACID in Mongo, it would be very useful to look over some examples of similar problems that have already been worked out in Mongo. I expect the following two links will be really useful to you, as they are very complete and right on subject.

Two-Phase Commits: http://cookbook.mongodb.org/patterns/perform-two-phase-commits/

Transactions for e-commerce work: http://www.slideshare.net/spf13/mongodb-ecommerce-and-transactions-10524960

Finally, I have to ask: Why do you want to have transactions? It is rare that users of mongoDB find they truly need ACID to achieve their goals. It might be worthwhile stepping back and trying to approach the problem from another perspective before you go ahead and implement a whole layer on top of mongo just to get transactions.

Nice research work. I read most of these before, but still, very nice work. As to why transactions? The business model can't really tolerate data loss but, maybe worse, committed-read consistency is important, as sometimes multiple related documents are written at once and it's important not to read a partial commit. So why not a graph database? Scale. I might still go there if I see that my "cleverness" causes the document store to behave even worse, but graph databases just don't cut it today. I really hope they will in the future. — Ran Biron, Sep 14 '12 at 13:55
