See results at the end
I want to use a document DB (for various reasons) - probably CouchDB or MongoDB. However, I also need ACID on my multiple-document transactions.
That said, I do plan on working with an "add-only" model: changes are added as new documents (add is add, update is add a copy + transformed data, delete is add an empty document with the same ID + a delete flag). Periodically, I'll run compaction on the database to remove non-current documents.
With that in mind, are there any holes in the following idea:
Maintain a collection for current transactions in progress. This collection will hold documents with transaction IDs (GUIDs + timestamp) of transactions in progress.
Atomicity:
On a transaction:
Add a document to the transactions in progress collection.
Add the new documents (add is add, update is copy+add, delete is add with ID and “deleted” flag).
Each added document will have the following management fields:
Transaction ID.
Previous document ID (linked list).
Remove the document added to the transactions in progress collection.
On transaction fail:
Remove all added documents
Remove the document from the transactions in progress collection.
Periodically:
Go over all transactions in progress, find the ones that have been abandoned (>10 minutes?), remove the associated documents from the DB (index on transaction ID) and then remove the transaction-in-progress document.
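To make the write path above concrete, here is a minimal in-memory sketch. It is not tied to CouchDB or MongoDB: `Store`, `run_transaction`, `reap_abandoned` and the `txid` field are hypothetical names, and plain Python containers stand in for the two collections.

```python
import time
import uuid


class Store:
    """Toy in-memory stand-in for the two collections (hypothetical names)."""

    def __init__(self):
        self.in_progress = {}  # transaction ID -> start timestamp
        self.documents = []    # every document version ever added


def run_transaction(store, new_docs, fail=False):
    """Add new_docs under one transaction ID; remove them all on failure."""
    txid = str(uuid.uuid4())
    store.in_progress[txid] = time.time()      # register the transaction
    try:
        for doc in new_docs:                   # add the new versions
            store.documents.append({**doc, "txid": txid})
        if fail:
            raise RuntimeError("simulated failure")
    except RuntimeError:
        # On failure: remove everything this transaction added.
        store.documents = [d for d in store.documents if d["txid"] != txid]
        raise
    finally:
        # Success or failure: the transaction is no longer in progress.
        store.in_progress.pop(txid, None)
    return txid


def reap_abandoned(store, max_age_seconds=600, now=None):
    """Periodic job: undo transactions that have been open for too long."""
    now = time.time() if now is None else now
    stale = [t for t, started in store.in_progress.items()
             if now - started > max_age_seconds]
    for txid in stale:
        store.documents = [d for d in store.documents if d["txid"] != txid]
        del store.in_progress[txid]
    return stale
```

In a real document store each of these steps is a separate network call, so a crash between any two of them is exactly what the periodic job has to clean up.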
Read transaction consistency (read only committed transactions):
On data retrieval:
Load transactions in progress set.
Load needed documents.
For all documents, if the document transaction ID is in “transactions in progress” or later (using timestamp), load the previous document in the linked list (recursive).
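The read rule can be sketched as a walk back along the linked list. This is only an illustration under assumptions: `resolve_committed`, the `tx_time` and `prev` fields, and the dict keyed by DB ID are all hypothetical, and "later" is read as a strictly greater timestamp.

```python
def resolve_committed(doc_id, docs, in_progress, read_started_at):
    """Return the newest version the reader may see, or None.

    docs maps DB ID -> document; each document carries the ID of the
    transaction that wrote it ("txid"), that transaction's timestamp
    ("tx_time") and the DB ID of the previous version ("prev").
    """
    doc = docs.get(doc_id)
    while doc is not None and (
        doc["txid"] in in_progress             # writer had not finished
        or doc["tx_time"] > read_started_at    # writer started after us
    ):
        doc = docs.get(doc["prev"])            # step back one version
    return doc


# Usage: v3 is too new, v2 is still in progress, so the reader gets v1.
docs = {
    "v1": {"txid": "t1", "tx_time": 1, "prev": None},
    "v2": {"txid": "t2", "tx_time": 2, "prev": "v1"},
    "v3": {"txid": "t3", "tx_time": 5, "prev": "v2"},
}
visible = resolve_committed("v3", docs, in_progress={"t2"}, read_started_at=3)
```

Note that every step back is another lookup, so long chains of uncommitted or too-new versions translate directly into extra round trips.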
It’s a bit like MVCC, a bit like Git. I set the retrieval context by the transactions I know managed to finish before I started. I avoid a single sequence (and hence serial execution) by keeping a list of “ongoing transactions” rather than a “transaction revision”. And, of course, I avoid reading non-committed transactions and provide rollback on conflict.
So - are there any holes in this? Will my performance suffer horribly?
Edit1: Please please please - don't hammer the "don't use document database if you need multi-document transactions". I know, I need a document database anyway for other reasons.
Edit2: added timestamp to avoid data from transactions that start after retrieval transaction has started. Possibly could change timestamp to sequence ID.
Edit3: Here's another algorithm I thought about - it may be better than the one above:
New algorithm - easier to understand (and possibly correct this time :) )
Support structures:
transaction_support_template {
_created-by-transaction: <txid>
_made-obsolete-by-transaction: <txid>
}
transaction_record {
transaction_id: <txid>
timestamp: <tx timestamp>
updated_documents: [doc1_id, doc2_id...]
}
transaction_number { //atomic counter - used for ordering transactions.
_id: "transaction_number"
next_transaction_id: 0 //initial.
}
Note: all IDs are model object IDs, not DB ids (don't confuse with logical IDs which are different).
DB ID - different for each document - but multiple DB documents are revisions of one model object.
Model object ID - same for all revisions of the model object.
Logical ID - client-facing ID.
First time setup:
1. Create the transaction_number document (as defined above, with next_transaction_id set to 0).
Commit process:
1. Get new transaction ID by atomic increment on the transaction number counter.
2. Insert a new transaction record with the transaction id, the timestamp and the updated documents.
3. Create the new version for each document. Make sure the _created-by-transaction is set.
4. Mark the old version of each updated or deleted document by setting
"_made-obsolete-by-transaction" to the transaction id.
This is the time to detect conflicts! If a conflict is seen, roll back.
Note - this can be done as find-and-modify rather than by serializing the entire document again.
5. Remove the transaction record.
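The commit steps can be sketched in memory as follows. Everything here is an assumption made for illustration: `Db`, `commit` and `ConflictError` are hypothetical names, `created_by` / `obsoleted_by` stand in for the `_created-by-transaction` / `_made-obsolete-by-transaction` fields, and the caller is assumed to pass in the old version it read. In a real store the check-and-set in step 4 would have to be a single conditional update (the find-and-modify mentioned above); here it is trivially atomic because the sketch is single-threaded.

```python
import itertools
import time


class ConflictError(Exception):
    """Raised when the old version was already sealed by someone else."""


class Db:
    def __init__(self):
        self.counter = itertools.count()  # stands in for the atomic counter
        self.transactions = {}            # txid -> transaction record
        self.versions = []                # all versions of all model objects


def commit(db, changes):
    """changes: list of (model_id, base_version_or_None, new_data)."""
    txid = next(db.counter)                                     # step 1
    db.transactions[txid] = {                                   # step 2
        "timestamp": time.time(),
        "updated_documents": [model_id for model_id, _, _ in changes],
    }
    for model_id, _, data in changes:                           # step 3
        db.versions.append({"model_id": model_id, "data": data,
                            "created_by": txid, "obsoleted_by": None})
    try:
        for _, base, _ in changes:                              # step 4
            if base is None:
                continue  # brand-new model object, nothing to seal
            if base["obsoleted_by"] is not None:
                raise ConflictError("old version already made obsolete")
            base["obsoleted_by"] = txid
    except ConflictError:
        # Roll back: unseal what we sealed, drop what we created.
        for v in db.versions:
            if v["obsoleted_by"] == txid:
                v["obsoleted_by"] = None
        db.versions = [v for v in db.versions if v["created_by"] != txid]
        del db.transactions[txid]
        raise
    del db.transactions[txid]                                   # step 5
    return txid
```

A second writer that starts from the same old version hits the conflict check in step 4, and its new versions are removed again.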
Cleanup process:
1. Go over the transaction records, sorted by id, ascending (oldest transaction first).
2. For each transaction, if it expired (by timestamp), do rollback(txid).
Rollback(txid) process:
1. Get the transaction record for the given transaction id.
2. For each document id in the "updated documents":
2.1 If the document exists and has "_made-obsolete-by-transaction" with
the correct transaction id, remove the _made-obsolete-by-transaction data.
3. For each document whose _created-by-transaction is the given transaction id:
3.1 remove the document.
4. Remove the transaction record document.
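A rough in-memory sketch of the cleanup and rollback processes, under the same kind of assumptions as before: a plain dict stands in for the database, and `rollback`, `cleanup`, `created_by` and `obsoleted_by` are hypothetical names for illustration only.

```python
def rollback(db, txid):
    """Undo everything transaction txid did; return False if it is unknown."""
    record = db["transactions"].get(txid)                  # step 1
    if record is None:
        return False
    for v in db["versions"]:                               # step 2
        if (v["model_id"] in record["updated_documents"]
                and v["obsoleted_by"] == txid):
            v["obsoleted_by"] = None                       # unseal old version
    db["versions"] = [v for v in db["versions"]            # step 3
                      if v["created_by"] != txid]
    del db["transactions"][txid]                           # step 4
    return True


def cleanup(db, max_age_seconds, now):
    """Roll back expired transactions, oldest transaction ID first."""
    rolled_back = []
    for txid in sorted(db["transactions"]):   # sorted() copies the keys
        age = now - db["transactions"][txid]["timestamp"]
        if age > max_age_seconds:
            rollback(db, txid)
            rolled_back.append(txid)
    return rolled_back
```

Removing the transaction record last means a crashed rollback can simply be run again: steps 2 and 3 do nothing for documents that were already restored or removed.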
Retrieval process:
1. Top-transaction-id = transaction ID counter.
2. Read all transactions from the transactions collection.
Current-transaction-ids[] = Get all transaction IDs.
3. Retrieve documents as needed. Always use "sort by transaction_id, desc" as last sort clause.
3.1 If a document's "_created-by-transaction" is in the Current-transaction-ids[]
or is >= Top-transaction-id - ignore it (not yet committed).
3.2 If a document "_made-obsolete-by-transaction" is not in the Current-transaction-ids[]
and is < Top-transaction-id - ignore it (a newer version was committed).
4. We may have to retrieve more chunks to satisfy original requests if documents were ignored.
Was the document committed when we started?
If we see a document with transaction ID in the current executing transactions - it's a transaction that
started before we started the retrieval but was not yet committed at that time - so we don't want it.
If we see a document with transaction ID >= top transaction ID - it's a transaction that started after
we started the retrieval - so we don't want it.
Is the document up-to-date (latest version)?
If we see a document with made-obsolete that is not in the current transaction IDs (so not a transaction that was
still running when we started) and is < top transaction ID (so not a transaction that started after we started) - then
there was a transaction that finished its commit in our past and made this document obsolete - so we don't want it.
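The two questions above boil down to a single predicate per document. A minimal sketch, assuming `created_by` and `obsoleted_by` as hypothetical stand-ins for the two management fields and `is_visible` as an illustrative name:

```python
def is_visible(doc, current_txids, top_txid):
    """Apply rules 3.1 and 3.2 of the retrieval process to one document."""
    created = doc["created_by"]
    obsoleted = doc.get("obsoleted_by")  # None if never made obsolete

    # Was the document committed when we started? (rule 3.1)
    committed = created not in current_txids and created < top_txid

    # Was it made obsolete by a transaction committed in our past? (rule 3.2)
    superseded = (
        obsoleted is not None
        and obsoleted not in current_txids
        and obsoleted < top_txid
    )
    return committed and not superseded


# Usage: transactions 5 and 6 were running when the read began, counter at 10.
current, top = {5, 6}, 10
readable = is_visible({"created_by": 2, "obsoleted_by": 6}, current, top)
```

In the usage line the document stays readable because the transaction that wants to replace it (6) had not committed when the read began.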
Why is sorting not harmed?
Because we add the sort as a last clause, we'll always see the real sorting work first. For each real
sorting "bucket" we might get multiple documents that represent the model object at different versions.
However, the sort order between model objects remains.
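As a small illustration of the sorting argument, here is a sketch with the transaction ID as the last (descending) sort key. `sort_versions`, the `name` field and `created_by` are hypothetical; the point is only that versions of one model object end up next to each other, newest first, without disturbing the order between model objects.

```python
def sort_versions(versions, sort_field):
    """Sort by the caller's field first, then by transaction ID, newest first."""
    return sorted(versions, key=lambda v: (v[sort_field], -v["created_by"]))


versions = [
    {"model_id": "b", "name": "beta", "created_by": 1},
    {"model_id": "a", "name": "alpha", "created_by": 2},
    {"model_id": "a", "name": "alpha", "created_by": 5},
]
ordered = sort_versions(versions, "name")
```

This only groups versions whose sort field is unchanged; a version whose `name` was edited lands in a different bucket, and the visibility rules are what filter out the stale one.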
Why doesn't the counter make the transactions execute serially (one at a time)?
Because this is not RDBMS - we don't really have transactions so we don't wait for the transaction
to commit as we do with "select for update".
Another transaction can make the atomic change as soon as we're done with it.
Compaction:
Once in a while a compaction will have to take place - get all really old documents and move them to another data store.
This shouldn't affect any running retrieval or transaction.
Optimization:
- Put the conditions into the query itself.
- Add transaction ID to all indexes.
- Make sure documents with the same model object ID don't get sharded into different nodes.
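For the first optimization, the visibility conditions could be expressed as a MongoDB-style filter document instead of being checked client-side. The sketch below only builds the filter as a plain dict; `visible_filter` is a hypothetical helper, and it assumes the management fields are stored under exactly the names used in the support structures. It relies on the standard query operators `$lt`, `$nin`, `$in`, `$gte` and `$or`, and on the fact that matching a field against null also matches documents where the field is missing.

```python
def visible_filter(current_txids, top_txid):
    """Build a filter matching only committed, non-obsolete versions."""
    created = "_created-by-transaction"
    obsoleted = "_made-obsolete-by-transaction"
    return {
        # Rule 3.1: created by a transaction that committed before we started.
        created: {"$lt": top_txid, "$nin": list(current_txids)},
        # Rule 3.2: not made obsolete by a transaction committed in our past.
        "$or": [
            {obsoleted: None},                         # never made obsolete
            {obsoleted: {"$in": list(current_txids)}},  # sealer still running
            {obsoleted: {"$gte": top_txid}},           # sealer started later
        ],
    }


query = visible_filter([3, 4], 10)
```

If the application's own query already uses a top-level `$or`, the two would need to be combined under `$and` rather than merged into one dict.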
What's the cost?
Assuming we want multiple document versions for history and audit anyway, the extra cost is
atomically updating the counter, creating the transaction record, "sealing" the previous version of each model object
(mark obsolete) and removing the transaction document. This shouldn't be too big.
Note that if the above assumption is not valid, the extra cost is quite high, especially for retrieval.
Results:
I've implemented the above algorithm (the revised one, with minor changes). Functionally, it's working. However, the performance (at least over MongoDB with 3 nodes in a master-slave replication topology, no fsync but replication required before "commit" ends) is atrocious. I'm constantly reading things I've just written from different threads. I'm getting constant collection locks on the transactions collection, and my indexes can't keep up with the constant rollover. Performance is capped at 20 TPS for tiny tiny transactions with 10 feeder threads.
In short - not a good general purpose solution.