16

In general, if I want to be sure what happens when several threads make concurrent updates to the same item in DynamoDB, I should use conditional updates (i.e., "optimistic locking"). I know that. But I was wondering whether there is any other case in which I can be sure that concurrent updates to the same item survive.
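
To make "conditional update" concrete, here is roughly what I mean, as a Node.js sketch (the table and attribute names are made up for illustration):

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// The write is rejected unless the item still looks the way we read it,
// so a concurrent writer cannot be silently overwritten.
ddb.update({
    TableName: 'my-table',
    Key: {id: 'item-1'},
    UpdateExpression: 'SET #s = :new',
    ConditionExpression: '#s = :expected',
    ExpressionAttributeNames: {'#s': 'status'},
    ExpressionAttributeValues: {':new': 'shipped', ':expected': 'pending'}
}).promise().catch(err => {
    if (err.code === 'ConditionalCheckFailedException') {
        // Someone else updated the item first; re-read and retry.
    }
});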

For example, in Cassandra, making concurrent updates to different attributes of the same item is fine, and both updates will eventually be available to read. Is the same true in DynamoDB? Or is it possible that only one of these updates survives?

A very similar question is what happens if I add, concurrently, two different values to a set or list in the same item. Am I guaranteed that I'll eventually see both values when I read this set or list, or is it possible that one of the additions will mask out the other during some sort of DynamoDB "conflict resolution" protocol?
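
(Concretely, by concurrent additions I mean something like the following Node.js sketch, with made-up names, issued by two clients at the same time:)

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Client A adds 'red' to the set while client B concurrently adds 'blue'.
// The question is whether the final set is guaranteed to contain both.
ddb.update({
    TableName: 'my-table',
    Key: {id: 'item-1'},
    UpdateExpression: 'ADD #t :v',
    ExpressionAttributeNames: {'#t': 'tags'},
    ExpressionAttributeValues: {':v': ddb.createSet(['red'])}
}).promise();

ddb.update({
    TableName: 'my-table',
    Key: {id: 'item-1'},
    UpdateExpression: 'ADD #t :v',
    ExpressionAttributeNames: {'#t': 'tags'},
    ExpressionAttributeValues: {':v': ddb.createSet(['blue'])}
}).promise();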

I see a version of my second question was already asked here in the past (Are DynamoDB "set" values CDRTs?), but the answer referred to a not-very-clear FAQ entry which no longer exists. What I would most like to see as an answer to my question is official DynamoDB documentation that says how DynamoDB handles concurrent updates when neither "conditional updates" nor "transactions" are involved, and in particular what happens in the above two examples. Absent such official documentation, does anyone have real-world experience with such concurrent updates?

Nadav Har'El
  • While waiting for an answer, you might consider firing up several instances of a script that continually mutates the same item with new properties, recording what *should* be set. When the dust settles, pull the document and ensure all the properties are set correctly. – Xeoncross Apr 09 '19 at 15:34
  • Thanks, this is a good idea, and I'll probably end up doing this. If I'm lucky enough (?) to notice missing updates, I'll know the answer to my question is "no". But if I don't see a problem, I'll still not be convinced the answer is "yes", because it will always be possible that DynamoDB's intra-region network was just quick enough to do all my updates sequentially. – Nadav Har'El Apr 10 '19 at 07:33

2 Answers

13

I just had the same question and came across this thread. Given that there was no answer, I decided to test it myself.

The answer, as far as I can observe, is that as long as you are updating different attributes, all the updates eventually succeed. It does take a little longer the more updates I push to the item, so they appear to be written in sequence rather than in parallel.

I also tried updating a single List attribute in parallel, and this, as expected, failed: once all the queries had completed, the resulting list was broken and contained only some of the entries pushed to it.
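
(For illustration, here is one way such a parallel list append can be written - a sketch only, not necessarily the exact code used in the test:)

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// N parallel appends to a single list attribute on the same item.
// if_not_exists seeds an empty list on the first write.
async function runListTest(key, num) {
    const p = [];
    for (let i = 0; i < num; i++) {
        p.push(ddb.update({
            TableName: 'concurrency-test',
            Key: {x: key},
            UpdateExpression: 'SET #l = list_append(if_not_exists(#l, :empty), :v)',
            ExpressionAttributeNames: {'#l': 'items'},
            ExpressionAttributeValues: {':v': [`test-${i}`], ':empty': []}
        }).promise());
    }
    await Promise.all(p);
}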

The test I ran was pretty rudimentary and I might be missing something, but I believe the conclusion to be correct.

For completeness, here is the script I used (Node.js):

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Item key and number of parallel updates, taken from the command line.
const key = process.argv[2];
const num = parseInt(process.argv[3], 10);

run().then(() => {
    console.log('Done');
});

async function run() {
    // Fire off `num` updates in parallel, each setting a different
    // attribute (k0, k1, ...) on the same item.
    const p = [];
    for (let i = 0; i < num; i++) {
        p.push(ddb.update({
            TableName: 'concurrency-test',
            Key: {x: key},
            UpdateExpression: 'SET #k = :v',
            ExpressionAttributeValues: {
                ':v': `test-${i}`
            },
            ExpressionAttributeNames: {
                '#k': `k${i}`
            }
        }).promise());
    }

    await Promise.all(p);

    // Read the item back and count its attributes: if no update was lost,
    // this prints num + 1 (the key attribute x plus k0..k(num-1)).
    const response = await ddb.get({TableName: 'concurrency-test', Key: {x: key}}).promise();
    const item = response.Item;

    console.log('keys', Object.keys(item).length);
}

Run like so:

node index.js {key} {number}
node index.js myKey 10

Timings:

  • 10 updates: ~1.5s
  • 100 updates: ~2s
  • 1000 updates: ~10-20s (fluctuated a lot)

It's worth noting that the metrics show a lot of throttled events, but these are handled internally by the Node.js SDK using exponential backoff, so once the dust settled everything was written as expected.

Carl
  • After asking my question, I found an official talk by the DynamoDB developers (https://www.youtube.com/watch?v=yvBR71D0nAQ) explaining how DynamoDB works internally. It explains that DynamoDB uses a "leader model", meaning that a single node does all the updates to a given item. It can therefore trivially serialize updates to the same item, so it's not surprising that concurrent updates to different attributes survive. But I don't understand why, in such an implementation, concurrent list appends don't work. How did you do them? – Nadav Har'El Sep 16 '19 at 07:27
3

Your post contains quite a lot of questions.

There's a note in DynamoDB's manual:

All write requests are applied in the order in which they were received.

I assume that the client sends the requests in the order in which the calls were made.

That should answer the question of whether there are any guarantees: if you update different properties of an item in several requests, each updating only those properties, the item should end up in the expected state (the 'sum' of the distinct changes).

If, on the other hand, you update the whole object each time, the last write will win.

The Java SDK also provides a version-attribute annotation (@DynamoDbVersionAttribute) which you can use for optimistic locking to manage concurrent writes of whole objects.
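
(In plain update-expression terms, the version-attribute pattern is a condition on the current version plus an increment, roughly like this Node.js sketch with made-up names:)

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Succeeds only if nobody bumped the version since we read version 7;
// otherwise it throws ConditionalCheckFailedException and the caller
// must re-read the item and retry.
ddb.update({
    TableName: 'my-table',
    Key: {id: 'item-1'},
    UpdateExpression: 'SET #p = :p, #v = #v + :one',
    ConditionExpression: '#v = :expected',
    ExpressionAttributeNames: {'#p': 'payload', '#v': 'version'},
    ExpressionAttributeValues: {':p': 'new-value', ':one': 1, ':expected': 7}
}).promise();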

For scenarios like auctions or parallel tick counts (such as "likes"), DynamoDB offers atomic counters.
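
(An atomic counter is just an ADD update - the increment is applied server-side, so parallel increments are not lost. A sketch with made-up names:)

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

// Each call atomically increments the 'likes' attribute by 1.
ddb.update({
    TableName: 'my-table',
    Key: {id: 'post-1'},
    UpdateExpression: 'ADD #c :inc',
    ExpressionAttributeNames: {'#c': 'likes'},
    ExpressionAttributeValues: {':inc': 1}
}).promise();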

If you update a list, it depends on whether you use DynamoDB's list type (L), or just a property into which the client serializes the list as a string (S). If you read a property, change it, and write it back, and do that in parallel, the result is subject to eventual consistency - what you read may not be the latest write. Applied to lists, several times over, you'll end up with some of the elements added and some not (or, more precisely, added but then overwritten).
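
(For example, the lost-update pattern described above looks like this sketch, with made-up names - two clients running it in parallel can both read the same old list, and the second write-back silently drops the first one's element:)

const aws = require('aws-sdk');
const ddb = new aws.DynamoDB.DocumentClient();

async function unsafeAppend(value) {
    // Read the current list...
    const {Item} = await ddb.get({
        TableName: 'my-table',
        Key: {id: 'item-1'}
    }).promise();
    const list = (Item && Item.items) || [];
    list.push(value);
    // ...then write the whole list back unconditionally: any append that
    // landed between the read and this write is overwritten.
    await ddb.update({
        TableName: 'my-table',
        Key: {id: 'item-1'},
        UpdateExpression: 'SET #l = :l',
        ExpressionAttributeNames: {'#l': 'items'},
        ExpressionAttributeValues: {':l': list}
    }).promise();
}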

Ondra Žižka