
Are there any techniques or proposals to enforce unique constraints? Yes, we can create a key that's unique, but then we cannot change that key later, and this approach is not suitable for more complicated validation (a separately unique login, a separately unique email, etc.).

For example, an Account should have a unique login and a unique email. Deriving a key from these fields will result in inconsistency:

key1: "Account-john@example.net-john", { email: "john@example.net", login: "john"}
key2: "Account-mary@example.net-mary", { email: "mary@example.net", login: "mary"}

Looks good so far, but:

key1: "Account-john@example.net-mary", { email: "john@example.net", login: "mary"}
key2: "Account-mary@example.net-mary", { email: "mary@example.net", login: "mary"}

Oops: now we have two accounts with login: "mary".

Barry Wark
Sam

4 Answers


Core Answer

Structure your POSTs/PUTs for the document that has the field you want to keep unique as follows:

  1. Create a view. In the map function, emit the field you want to enforce as unique as the key; the value can be anything (even null). Use a reduce function to get a count for each key; for performance, the built-in _count reduce function is best.

  2. Immediately after you PUT/POST a new document into the database, grab the returned id and rev and GET /yourdb/_design/yourapp/_view/viewname?group=true&key="value-of-your-unique-field-from-step-1".

  3. If that GET returns a count value other than 1, you just inserted a duplicate. Immediately DELETE /yourdb/id-from-step-2?rev=rev-from-step-2.

  4. Relax.
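
The four steps above can be sketched as one helper function. This is a minimal sketch under stated assumptions: `db` is a hypothetical client object (not a real CouchDB library) whose post, viewCount, and del methods wrap the HTTP calls from steps 2 and 3.

```javascript
// Minimal sketch of the insert-then-verify flow described in steps 1-4.
// `db` is a hypothetical client: post(doc) -> {id, rev},
// viewCount(key) -> number, del(id, rev) -> deletes the document.
async function insertUnique(db, doc, uniqueValue) {
  const { id, rev } = await db.post(doc);        // step 2: save first, then check
  const count = await db.viewCount(uniqueValue); // step 2: query the count view
  if (count > 1) {                               // step 3: someone else got there first
    await db.del(id, rev);                       // roll back our own write
    throw new Error('duplicate value: ' + uniqueValue);
  }
  return { id, rev };                            // step 4: relax
}
```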


Rough Example

Let's say you are storing user accounts and you want to make sure the email address is unique, but you don't want to make it the document id (for whatever reason). Build a view to quickly check uniqueness on the email address, as described above. Let's call it emails. Its map function might look like this...

function(doc) {
  if (doc.type === 'account') {
    emit(doc.email, 1);
  }
}

And use just _count as the reduce function. Since we emit a 1 as above, _sum would also work. Having and checking a type field on your docs, as shown above, is just a convention, but I find it helpful sometimes. If all you are storing is user accounts, you probably don't need it.
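
Put together, the view and its reduce live in a design document. A minimal sketch (the ddoc name myapp and view name emails are chosen to match the URLs used below):

```javascript
// Minimal design document carrying the "emails" view.
// The map function is stored as a string; '_count' is CouchDB's built-in reduce.
const designDoc = {
  _id: '_design/myapp',
  views: {
    emails: {
      map: "function(doc) { if (doc.type === 'account') { emit(doc.email, 1); } }",
      reduce: '_count' // '_sum' would also work, since we emit a 1
    }
  }
};
```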

Now let's say we insert a document like so ...

POST /mydb/
{
  "name": "Joe",
  "email": "joe@example.org"
}

And CouchDB will respond with something like ...

{
  "ok": true,
  "id": "7c5c249481f311e3ad9ae13f952f2d55",
  "rev": "1-442a0ec9af691a6b1690379996ccaba6"
}

Check to see if we now have more than one joe@example.org in the database ...

GET /mydb/_design/myapp/_view/emails/?group=true&key="joe@example.org"

And CouchDB will respond with something like...

{
  "rows": [
    {
      "key": "joe@example.org",
      "value": 1
    }
  ]
}

If value is anything other than 1 in that reply, you probably just inserted a duplicate email. So delete the document and return an error along the lines of "Email Address must be Unique", similar to a typical SQL database response, or whatever you want.

The delete would go something like this ...

DELETE /mydb/7c5c249481f311e3ad9ae13f952f2d55?rev=1-442a0ec9af691a6b1690379996ccaba6

Short Discussion (if you care)

If you are coming from a good old *SQL background, this is going to seem wrong and weird. Before you flip out, consider these points. A constraint on a field, such as uniqueness or anything else, implies a schema; CouchDB is schemaless. Coming from *SQL means you will have to change the way you think: ACID and locks are not the only way. CouchDB comes with a lot of flexibility and power. How you use that to get what you want will depend on the details of your use case and on how well you can escape the confines of traditional relational database thinking.

The reason I sometimes implement uniqueness this way is that it feels very Couchy to me. If you think about the way CouchDB handles update conflicts, this approach follows the same flow: we store the doc we are given, then check whether our field is unique. If not, we grab the doc we just inserted by the convenient handle we still have on it and do something to resolve the need for uniqueness, such as deleting it.

You may be tempted in the example above to check the uniqueness of the email address before you POST the doc. Be careful! If there are multiple things going on, it's possible that another doc with the same email address is inserted into the database in the instant after you check whether the email exists, but before you do your POST. That will leave you with a duplicate email.

There is nothing wrong with also checking for the email address before you POST. For example, this is a good idea if your user is filling out a form and you can send the email field value out to the database via Ajax: you can pre-warn the user that the email address exists, prevent submission, etc. However, in all cases you should also check for uniqueness after you POST the doc, then react as needed. From the UI's perspective, the steps above look no different from a traditional *SQL database rejecting a uniqueness constraint violation.

Can anything go wrong? Yes. Consider this: assume joe@example.org does not already exist in the database, and two docs carrying it arrive almost simultaneously. Doc A is POSTed, then Doc B is POSTed before we are able to run the uniqueness check for Doc A. When we do run that check, we see joe@example.org in the database twice, so we delete Doc A and report the problem. But let's say that before we can delete Doc A, the uniqueness check for Doc B also runs and gets the same count for joe@example.org. Now both POSTs will be rejected, even though joe@example.org was originally not in the database at all! In other words, if two docs with a matching value hit your app at almost the same time, each can see the other's POST and mistakenly conclude that the value it carries is already taken. We can't really prevent this, because there is no traditional RDB-style locking in CouchDB. But in exchange for that small price we get master-master replication and a ton of other cool stuff. I'll take it! If this is a serious problem for your implementation, you can try to address it by implementing a retry mechanism of some sort.

Finally, CouchDB really is a gem of a database, but don't take this as a bulletproof approach. A lot depends on the details of your specific application.

Raffi M.
    In reference to this: _"but you don't want to make it the document id (for whatever reason)"_ I'd just like to point out that using the email and/or username as the _id is undesirable. If a user changes their email or username it would break any external references to that doc id (and there will probably be a bunch). – s.co.tt Apr 23 '14 at 16:42
    True. That falls under "for whatever reason" I guess. :) – Raffi M. Apr 24 '14 at 18:45
    Well, It's a good reason :) BTW - I did appreciate your Short Discussion a lot. Very good info. – s.co.tt Apr 24 '14 at 20:45
  • You left out one very important detail: you should create a **placeholder** document during registration, which you then update after registration to a valid document. Scenario: user A is registering his new account with the same username as user B, user B logs in at the time when user A submitted his registration. User B's login could fail due to his username fetching placeholder document for user A's registration. While this example is harmless, something could go terribly wrong somewhere else in your application if you do not account for this in advance. – user1973386 Jul 11 '14 at 10:48
    Can't edit my previous comment. The point is, you should be able to check if the document that you fetched is a placeholder, then you can recover properly. For instance, re-fetching the document if you get a placeholder instead of the real document during a login. – user1973386 Jul 11 '14 at 11:01
    The use of the word "immediately" is supposed to guarantee the absence of race conditions :) – Artem Oboturov Dec 03 '16 at 13:44
  • Your discussion is good. I'd like to emphasize the tradeoff more. Sometimes people tend to pick nosql solutions because they 'solve' more stuff or make things 'easier'. While, in reality, it solves *different* stuff. They (nosql) traded data normalisation for stuff like easier multi-master replication and more. This whole thing about uniqueness is one of the pains gained, but it *is* part of the design philosophy of couchdb/nosql. Your solution to the problem is fine, if the race condition you mention is a real problem, you probably should wonder if nosql is the right solution to your problem. – Alex Nov 21 '19 at 10:59
  • Will this solution still be at risk of creating duplicates in a master-master replication setup? In the case of simultaneous POST calls to insert the same currently unique value but both calls targeting a different master, wouldn't the replication delay between the masters introduce a risk of duplicates here that will be happily synced to both masters afterwards? – KrekkieD Jan 02 '22 at 08:32

This is one of the less fun bits of CouchDB. The best way I've found for handling unique fields that can change (like in your user example) is to create "pointer" documents with the unique values as a component of the key, and then use those to let you claim unique values. The important part of doing this is having a predictable key for the primary document, then saving the unique field claim documents prior to saving the primary (and letting key conflicts prevent the primary document from saving).

Given a user with a unique username and unique email, your primary documents might look like this:

user-1234: { username: "kurt", email: "kurt@localhost" }
user-9876: { username: "petunia", email: "pie@crust.com" }

The unique field pointers would look something like this:

user-username-kurt: { primary_doc: "user-1234" }
user-email-kurt@localhost: { primary_doc: "user-1234" }
user-username-petunia: { primary_doc: "user-9876" }
user-email-pie@crust.com: { primary_doc: "user-9876" }

Creating or updating a user would take these steps:

  1. Prep your user document, generate a key for it if necessary
  2. Save a "pointer" document for each changed unique field
  3. If saving any of those fails, stop and fix errors
  4. Save primary user document
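
The steps above can be sketched for the create case. This is a sketch, not a real client library: `db.put(id, doc)` is assumed to reject when the id already exists, which is how a CouchDB PUT without a _rev behaves (a 409 conflict).

```javascript
// Sketch of creating a user with claim ("pointer") documents.
// `db.put(id, doc)` is a hypothetical wrapper that throws on a conflict.
async function createUser(db, userId, user, uniqueFields) {
  // step 2: claim each unique value first; a conflict means it's already taken
  for (const field of uniqueFields) {
    const claimId = 'user-' + field + '-' + user[field];
    await db.put(claimId, { primary_doc: userId }); // step 3: throws if claimed
  }
  // step 4: all claims succeeded, save the primary document
  await db.put(userId, user);
}
```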

Step 3 will take some thought. For instance, you won't want to try to claim unique values for fields that haven't changed. You could, but then you'd have to put in some additional logic to handle the case where you're claiming a value for a user that already owns it.

Step 3 would be a good place to let people take old claimed values as well. If one user has "released" the username kurt, for instance, I can just update that particular document to point to my new key after verifying that it's no longer in use. The alternative is to clear out claimed unique values when they change. I'm not sure which would be less work, really. Leaving stale claimed values in makes the most sense to me.

The interesting thing about this solution is that you don't need to use those pointer documents for anything once they're created. You can create views as normal on your user docs, and use them to query by email or username.

This setup also allows you to do relationships without having to worry about cascading user keys. I don't know about you, but my user documents are referenced by just about every other document in the system. Changing a user key would be a massive pain in the ass.

MrKurt
    Yeah, this is a solution, but personally I prefer locking with Redis. Such solutions look weird when developers come from the RDBMS world. CouchDB is different from an RDBMS, but should we reject all existing practices? There is no ACID and there are no locks, but without all that stuff it's impossible to relax and just write applications. CouchDB is missing key features, and this is sad. – Sam Dec 19 '09 at 22:37
  • Hashing the unique value before adding it to the document ID makes it more convenient to add values that may have special characters. – Petri Lehtinen Sep 30 '11 at 06:18
    Do note that if you plan to keep things multi-master friendly, point 2. will **not** fail unless the key already exists on the *current* node. Keys are only unique on the node that they exist on. Check out Raffi M.'s answer for a simple cluster friendly solution to this problem. – user1973386 Jul 11 '14 at 11:30
  • This feels like the way you should **not** use CouchDB. Of course I can be wrong, but this feels like a hack. – Alex Nov 21 '19 at 10:28

It depends. Consider the multi-master replicated case: conflicting entries could be added that are consistent within each master but not consistent once they replicate. You might only be using one CouchDB server, but in general it is designed with the multi-master case in mind, and no features are included that would only work correctly on a single unreplicated server.

If you only care about the single-server case, then conceivably you could rebuild your couchjs with networking support and perform an HTTP query in your validate_doc_update() function that checks whether the email address is already used, failing the update if so. Check here for more details about the validation mechanism. I don't recommend doing that; instead I would embed all the uniqueness in the id field (either directly or via hashing) and just deal with moving the doc if the user changes anything that affects that id.

August Lilleaas
Louis Gerbarg
  • this is a wicked smart way to do things – Tegra Detra Jun 20 '13 at 23:03
    This is a good approach unless the unique field value (an email address, for example) can change. Then you would need to rehash for a new id, and since you cannot change the id of an existing record, you would have to create a new record and delete the old one, which makes this a circular problem. – Raffi M. Dec 04 '16 at 18:43

CouchDB has a "changes feed": http://docs.couchdb.org/en/2.3.1/api/database/changes.html#changes

In the changes feed, an update looks the same as a create, differing only by the document's _rev.

Filter functions work like list or show functions, but are oriented to filtering the changes queue: http://docs.couchdb.org/en/2.3.1/ddocs/ddocs.html#filterfun
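
As a concrete illustration (the function body below is an assumption for the example, not taken from the linked docs), a filter function receives the changed document and the request, and returns whether the change should appear in the feed:

```javascript
// Minimal changes-feed filter: only pass account documents through.
// In CouchDB this would live under `filters` in a design document.
function accountsOnly(doc, req) {
  return doc.type === 'account';
}
```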

Design documents provide the ability to filter anything added to the changes queue.

https://docs.couchdb.org/en/master/ddocs/index.html