To provide a concrete example (because there is a surprising lack of correct examples online): here's how to implement an "atomic bank balance transfer" in CouchDB (largely copied from my blog post on the same subject: http://blog.codekills.net/2014/03/13/atomic-bank-balance-transfer-with-couchdb/)
First, a brief recap of the problem: how can a banking system which allows
money to be transfered between accounts be designed so that there are no race
conditions which might leave invalid or nonsensical balances?
There are a few parts to this problem:
First: the transaction log. Instead of storing an account's balance in a single
record or document — {"account": "Dave", "balance": 100}
— the account's
balance is calculated by summing up all the credits and debits to that account.
These credits and debits are stored in a transaction log, which might look
something like this:
{"from": "Dave", "to": "Alex", "amount": 50}
{"from": "Alex", "to": "Jane", "amount": 25}
And the CouchDB map-reduce functions to calculate the balance could look
something like this:
POST /transactions/balances
{
"map": function(txn) {
emit(txn.from, txn.amount * -1);
emit(txn.to, txn.amount);
},
"reduce": function(keys, values) {
return sum(values);
}
}
For completeness, here is the list of balances:
GET /transactions/balances
{
"rows": [
{
"key" : "Alex",
"value" : 25
},
{
"key" : "Dave",
"value" : -50
},
{
"key" : "Jane",
"value" : 25
}
],
...
}
But this leaves the obvious question: how are errors handled? What happens if
someone tries to make a transfer larger than their balance?
With CouchDB (and similar databases) this sort of business logic and error
handling must be implemented at the application level. Naively, such a function
might look like this:
def transfer(from_acct, to_acct, amount):
txn_id = db.post("transactions", {"from": from_acct, "to": to_acct, "amount": amount})
if db.get("transactions/balances") < 0:
db.delete("transactions/" + txn_id)
raise InsufficientFunds()
But notice that if the application crashes between inserting the transaction
and checking the updated balances the database will be left in an inconsistent
state: the sender may be left with a negative balance, and the recipient with
money that didn't previously exist:
// Initial balances: Alex: 25, Jane: 25
db.post("transactions", {"from": "Alex", "To": "Jane", "amount": 50}
// Current balances: Alex: -25, Jane: 75
How can this be fixed?
To make sure the system is never in an inconsistent state, two pieces of
information need to be added to each transaction:
The time the transaction was created (to ensure that there is a strict
total ordering of transactions), and
A status — whether or not the transaction was successful.
There will also need to be two views — one which returns an account's available
balance (ie, the sum of all the "successful" transactions), and another which
returns the oldest "pending" transaction:
POST /transactions/balance-available
{
"map": function(txn) {
if (txn.status == "successful") {
emit(txn.from, txn.amount * -1);
emit(txn.to, txn.amount);
}
},
"reduce": function(keys, values) {
return sum(values);
}
}
POST /transactions/oldest-pending
{
"map": function(txn) {
if (txn.status == "pending") {
emit(txn._id, txn);
}
},
"reduce": function(keys, values) {
var oldest = values[0];
values.forEach(function(txn) {
if (txn.timestamp < oldest) {
oldest = txn;
}
});
return oldest;
}
}
List of transfers might now look something like this:
{"from": "Alex", "to": "Dave", "amount": 100, "timestamp": 50, "status": "successful"}
{"from": "Dave", "to": "Jane", "amount": 200, "timestamp": 60, "status": "pending"}
Next, the application will need to have a function which can resolve
transactions by checking each pending transaction in order to verify that it is
valid, then updating its status from "pending" to either "successful" or
"rejected":
def resolve_transactions(target_timestamp):
""" Resolves all transactions up to and including the transaction
with timestamp `target_timestamp`. """
while True:
# Get the oldest transaction which is still pending
txn = db.get("transactions/oldest-pending")
if txn.timestamp > target_timestamp:
# Stop once all of the transactions up until the one we're
# interested in have been resolved.
break
# Then check to see if that transaction is valid
if db.get("transactions/available-balance", id=txn.from) >= txn.amount:
status = "successful"
else:
status = "rejected"
# Then update the status of that transaction. Note that CouchDB
# will check the "_rev" field, only performing the update if the
# transaction hasn't already been updated.
txn.status = status
couch.put(txn)
Finally, the application code for correctly performing a transfer:
def transfer(from_acct, to_acct, amount):
timestamp = time.time()
txn = db.post("transactions", {
"from": from_acct,
"to": to_acct,
"amount": amount,
"status": "pending",
"timestamp": timestamp,
})
resolve_transactions(timestamp)
txn = couch.get("transactions/" + txn._id)
if txn_status == "rejected":
raise InsufficientFunds()
A couple of notes:
For the sake of brevity, this specific implementation assumes some amount of
atomicity in CouchDB's map-reduce. Updating the code so it does not rely on
that assumption is left as an exercise to the reader.
Master/master replication or CouchDB's document sync have not been taken into
consideration. Master/master replication and sync make this problem
significantly more difficult.
In a real system, using time()
might result in collisions, so using
something with a bit more entropy might be a good idea; maybe "%s-%s"
%(time(), uuid())
, or using the document's _id
in the ordering.
Including the time is not strictly necessary, but it helps maintain a logical
if multiple requests come in at about the same time.