Is keeping a log of all document relationships an anti-pattern in CouchDB?

Question

When we return each document in our database to be consumed by the client we also must to add a property "isInUse" to that document's response payload to indicate if a given documented is referenced by other documents .

This is needed because referenced documents cannot be deleted and so a trash bin button should not be displayed next to it's listing entry in the client-side app.

So basically we have relationships where a document can reference another link this:

{
  "_id": "factor:1I9JTM97D",
  "someProp": 1,
  "otherProp": 2,
  "defaultBank": <id of some bank document>
}

Previously we have used views and selectors to query for each documents references in other documents, however this proved to be non-trivial.

So here's how someone in our team has implemented this now: We register all relationships in dedicated "relationship" documents like the one below and update them every time a document created/updated/deleted by the server, to reflect anything new references or de-references:

{
  "_id": "docInUse:bank",
  "_rev": "7-f30ffb403549a00f63c6425376c99427",
  "items": [
    {
      "id": "bank:1S36U3FDD",
      "usedBy": [
        "factor:1I9JTM97D"
      ]
    },
    {
      "id": "bank:M6FXX6UA5",
      "usedBy": [
        "salesCharge:VDHV2M9I1",
        "salesCharge:7GA3BH32K"
      ]
    }
  ]
}

The question is whether this solution is an anti-pattern and what are the potential drawbacks.

I don't understand the question. I don't see how the title relates to the body. — Jonathan Hall, Feb 24 '18 at 10:04
Seconded. It's not clear what the question is. Try re-asking but with a clear question in the question body as well as the setup / explanation. — SCdF, Feb 26 '18 at 08:46
Sorry guys, I was out for the weekend but now I edited the post for clarity, let me know if there's room for improvement. — J. Araujo, Feb 26 '18 at 22:19
You say "Previously we have used views and selectors to query for each documents references in other documents, however this proved to be non-trivial." but don't explain why. My first impression is that hitting a view would be the right way to determine if the doc is referenced somewhere else, but your use case might have complications. Can you edit the question to better explain why that approach didn't work for you? — Eli Stevens, Feb 27 '18 at 00:13
Team decided to move away from views whenever we can use selectors (reasons were maintainability of code, etc) .With selectors we ended up using complex indexes to get this to work with good performance, but if a single doc structure changes and an index is not updated accordingly, that's all it takes for query performance to go south in a big way. Of course we've put solutions in place to auto-update our indexes based on when model changes etc, so that's part of why it's non-trivial. That's just our experience so far thou, not claiming at all that selectors+indexes aren't the way to go here. — J. Araujo, Feb 28 '18 at 13:33

score 1 · Accepted Answer · answered Feb 28 '18 at 08:01

I would say using a single document to record the relationships between all other documents could be problematic because

the document "docInUse:bank" could end up being updated frequently. Cloudant allows you to update documents but when you get to many thousands of revisions, then the document size becomes none trivial, because all the previous revision tokens are retained
updating a central document invites the problem of document conflicts if two processes attempt to update the document at the same time. You are allowed to have have conflicts, but it is your app's responsibility to manage them see here
if you have lots of relationships, this document could get very large (I don't know enough about your app to judge)

Another solution is to keep your bank:*, factor:* & salesCharge:* documents the same and create a document per relationship e.g.

{
  "_id": "1251251921251251",
  "type": "relationship",
  "doc": "bank:1S36U3FDD",
  "usedby": "factor:1I9JTM97D"
}

You can then find out documents on either side of the "join" by querying documents by the value of doc or usedby with a suitable index.

I've also seen implementations, where the document's _id field contains all of the information:

{
  "_id": "bank:1S36U3FDD:factor:1I9JTM97D"
  "added": "2018-02-28 10:24:22"
}

and the primary key helpfully sorts the document ids for you allowing you to use judicious use of GET /db/_all_docs?startkey=x&endkey=y to fetch the relationships for the given bank id.

If you need to undo a relationship, just delete the document!

Wow, just noticed it's seems to be using a relationship document per documentType. That's definitely a big no-no for the reasons you mentioned. I agree that immutable, single-relationship documents would be a better solution. A potential drawback that I see is that if each document has 10 relationships than we are talking at least 9 of 10 documents in our database being relationship documents. (Not sure if that would be a problem in itself, thou) — J. Araujo, Feb 28 '18 at 13:15

score 0 · Answer 2 · answered Mar 03 '18 at 15:44

0

By building a cache of relationships on every document create/update/delete as you currently implemented it, you are basically recreating an index manually in the database. This is the reason why I would lean towards calling it an antipattern.

One great way to improve your design is to store each relation as a separate document as Glynn suggested.

If your concern is consistency (which I think might be the case, judging by looking at the document types you mentioned), try to put all information about a transaction into a single document. You can define the relationships in a consistent place in your documents, so updating the views would not be necessary:

{
  "_id":"salesCharge:VDHV2M9I1",
  "relations": [
    { "type": "bank", "id": "bank:M6FXX6UA5" },
    { "type": "whatever", "id": "whatever:xy" }
  ]
}

Then you can keep your views consistent, and you can rely on CouchDB to keep the "relation cache" up to date.

answered Mar 03 '18 at 15:44

Bernhard Gschwantner

1,547
11
12

The reason we are not storing relationships in the documents themselves is because since they since they're bidirectional, if we used that approach we'd have to store information in both documents and keep them in sync. I'm not sure what "all the information about a transaction means" in your answer. – J. Araujo Mar 05 '18 at 14:24
You don't need to store the relations in both documents – you can store it in one and just return both directions in a view. e.g. in my example above `emit(_id, relation.id); emit(relation.id, _id)` would be enough (I simplified a bit). – Bernhard Gschwantner Mar 22 '18 at 18:14
"all the information about ay transaction": for example for a money transfer between two accounts store the amount and information about both recipient and sender account in a transaction document, just like book keeping. Calculate the balance by summarizing amounts per account, just like a bank would do that too. I'm just guessing what your use case is, and I'm greatly simplifying. Depending on your use case a different solution might be even better. – Bernhard Gschwantner Mar 22 '18 at 18:19

Is keeping a log of all document relationships an anti-pattern in CouchDB?

2 Answers2