If I store an array-like structure in Couchbase like this one:

mykey = 3
key_1 = 47
key_2 = 11
key_3 = 17

and my update procedure is something like this:

a = increment(mykey)
set key_a = 42

will this work on a bucket replicated across multiple datacenters? Is there a better way of doing this?

I'm thinking that two clients in different data centers might call increment at the same time, get the same value, and then set the same key to different values.
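The race described above can be reproduced without a real cluster. This is only a minimal sketch: two plain Python dicts stand in for two data centers that have not yet replicated to each other, and `increment` and `append_item` are hypothetical helpers, not Couchbase SDK calls.

```python
# Two in-memory dicts stand in for two clusters linked by XDCR
# (a hypothetical model, not the Couchbase SDK).
def increment(cluster, key):
    """Increment a counter; atomic only within this one cluster."""
    cluster[key] = cluster.get(key, 0) + 1
    return cluster[key]

def append_item(cluster, value):
    """The update procedure from the question: bump mykey, then set key_a."""
    a = increment(cluster, "mykey")
    item_key = f"key_{a}"
    cluster[item_key] = value
    return item_key

initial = {"mykey": 3, "key_1": 47, "key_2": 11, "key_3": 17}
east = dict(initial)  # copy held by the first data center
west = dict(initial)  # copy held by the second data center

# Both clients write before replication has carried the counter across.
east_key = append_item(east, 42)
west_key = append_item(west, 99)

print(east_key, west_key)              # key_4 key_4
print(east[east_key], west[west_key])  # 42 99
```

Both clients end up with `key_4`, so one of the two values has to be discarded once the clusters replicate.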

Matthew Groves
Filip Haglund
  • possible duplicate of [Using an Increment counter for unique key generation in a Couchbase cluster](http://stackoverflow.com/questions/18675527/using-an-increment-counter-for-unique-key-generation-in-a-couchbase-cluster) – Filip Haglund Apr 13 '14 at 20:15

1 Answer

If you're using XDCR, duplicate keys are possible. Even if you check that the key does not exist before setting it (using the couchbase.add operation), two clusters can still produce identical keys.

Within a cluster, Couchbase Server provides strong consistency at the document level. On the other hand, XDCR also provides eventual consistency across clusters. Built-in conflict resolution will pick the same “winner” on both the clusters if the same document was mutated on both the clusters. If a conflict occurs, the document with the most updates will be considered the “winner.” If the same document is updated the same number of times on the source and destination, additional metadata such as numerical sequence, CAS value, document flags and expiration TTL value are used to pick the “winner.” XDCR applies the same rule across clusters to make sure document consistency is maintained.
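The rule quoted above can be illustrated roughly as follows. This is a sketch, not Couchbase's implementation: the `Revision` class, its field names, and the exact tie-break order after the update count are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Revision:
    """One cluster's copy of a document plus the metadata used to rank it.

    Field names are hypothetical; they mirror the metadata the quote lists.
    """
    value: int
    rev: int         # number of updates applied to this copy
    cas: int         # CAS value
    expiry: int = 0  # expiration TTL
    flags: int = 0   # document flags

def pick_winner(a: Revision, b: Revision) -> Revision:
    """Return the same winner regardless of argument order.

    Most updates wins; ties fall through to the remaining metadata
    (this tie-break order is an assumption).
    """
    def rank(r: Revision):
        return (r.rev, r.cas, r.expiry, r.flags)
    return a if rank(a) >= rank(b) else b

# key_4 was written once in one cluster and updated twice in the other.
east = Revision(value=42, rev=1, cas=1001)
west = Revision(value=99, rev=2, cas=1000)

winner = pick_winner(east, west)
print(winner.value)  # 99
```

Both clusters converge on the same document, but the losing write (42 here) is dropped without any error reaching the client that made it.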

To avoid this, Couchbase recommends either storing some information about the datacenter/cluster in the key or using unique keys such as GUIDs. I think the second way is not preferred here, so you can implement the first one by adding the datacenter location as a key prefix and handling the prefixes on the application side:

US-east.mykey_1
US-west.mykey_1
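A minimal sketch of the prefix scheme, again with in-memory dicts standing in for the clusters. The `make_appender` helper is hypothetical, and keeping a separate counter per datacenter under the same prefix is an assumption about how the application would apply the idea.

```python
def make_appender(datacenter, cluster):
    """Build an append function whose counter and item keys carry the datacenter prefix."""
    counter_key = f"{datacenter}.mykey"

    def append_item(value):
        # Each datacenter increments only its own counter, so the
        # resulting item keys can never collide across datacenters.
        cluster[counter_key] = cluster.get(counter_key, 0) + 1
        item_key = f"{datacenter}.mykey_{cluster[counter_key]}"
        cluster[item_key] = value
        return item_key

    return append_item

east, west = {}, {}
append_east = make_appender("US-east", east)
append_west = make_appender("US-west", west)

print(append_east(42))  # US-east.mykey_1
print(append_west(99))  # US-west.mykey_1

# After replication each cluster holds the union; no key was written twice.
merged = {**east, **west}
print(len(merged))  # 4
```

To read the whole structure back, the application has to walk every datacenter's counter and merge the items itself, which is the "handle them on the application side" part.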
m03geek
  • so using a datacenter-specific prefix is the preferred way of doing this? – Filip Haglund Apr 14 '14 at 19:34
  • Yes. I've read this somewhere in the manuals, but I can't remember where, so I can't give you a link. Generally you can provide unique keys (with XDCR) in two ways: use GUIDs or use datacenter prefixes. There is also a third way: you can provide one centralized source of sequential IDs, but this will kill the idea of high availability. – m03geek Apr 14 '14 at 20:57
  • I thought of that too, having just one cluster do all the increment() calls. However, I talked to a guy responsible for Spotify's Cassandra setup, and their datacenters disconnect from each other "a couple of times a day". The datacenter prefix solution is probably the best, since I don't have any sensible GUID for this data structure. When GUIDs make sense, they should definitely be used! – Filip Haglund Apr 15 '14 at 06:03