0

In our system we need to know if a reference to a specific document ID exists anywhere in the database. (for example to allow/deny deletion)

This previously worked fine with a view we created but now we have been asked to do the same thing using Cloudant query selectors.

We came up with the solution below where we include all possible paths where a document reference would be found. The query is generated dynamically to include all possible paths for a "foreign key" reference, however this mean it would need to scale to potentially +100 unique paths (ie, the $or operator array might end up with +100 items)

I wonder if such a large query would even work and also the potential performance implications. Also if there's an alternative way we would like to know.

{
   "selector": {
      "$or": [
         {
            "content": {
               "$elemMatch": {
                  "accounts": {
                     "$elemMatch": {
                        "bank": "bank12345"
                     }
                  }
               }
            }
         },
         {
            "content": {
               "$elemMatch": {
                  "partners": {
                     "$elemMatch": {
                        "someEntity": "someReference12345"
                     }
                  }
               }
            }
         }
      ]
   },
   "fields": [
      "_id",
      "_rev"
   ]
}
J. Araujo
  • 89
  • 1
  • 8

2 Answers2

1

So, let me get this right -- you previously used a view to do a simple lookup, which worked great, was efficient and easy to verify and you've been asked to instead make a CQ selector with hundreds of clauses?

Why? That's just crazy. This is the job for a view.

Leaving that aside for a while, I assume you're using JSON indexes for this. Your performance will almost certainly be atrociously bad, btw. You need to create indexes upfront for all the bits you want to be able to query on, or you may end up with a full DB scan instead of an index lookup.

Without seeing your documents, it's hard to offer more than generic advice. It sounds like your documents are large, and/or contain arrays that you in the worst case scenario update over time. I can't think of any other plausible way why you'd consider using a selector with more than a handful of clauses. Could you split out the partners array into separate documents instead?

xpqz
  • 3,617
  • 10
  • 16
  • Nice points, we are taking those things into consideration. Regarding breaking down objects we prefer to keep them together because we believe it's more in line with noSQL best practices for document modeling. – J. Araujo Jan 15 '18 at 15:50
  • Regarding the view performance we haven't measured it but because a view by default doesn't pre-process its results, it's not clear to me that a view would *necessarily* have a better performance than a selector in this case, provided the selector has proper indexes to work on. Can you elaborate on why that would be? – J. Araujo Jan 15 '18 at 15:54
  • Btw, the view approach also had its drawbacks since we had to maintain a map inside of the document of which fields where foreign keys to avoid a deep search into all properties. But even with this map, we still had to run the view once for every single document in the database, so unless you pre-cache your results I suppose that is not very performant either. – J. Araujo Jan 15 '18 at 16:08
  • You should model your documents such that data that _change_ together are in the same document. Your model violates that principle. I'm speculating as to your doc structure, but imagine if someone's adding an account at the same time as someone's adding a partner to the document. You'll get a conflict. Also, every time you add a partner, say, you have to first fetch the _whole_ document, parse the JSON, add the new partner, reencode the JSON and upload the _whole_ document. Mutable models are not ideal for a database like Cloudant. – xpqz Jan 15 '18 at 16:23
  • If you are using Cloudant (and not CouchDB), your views are always pre-computed. The map adds zero time at query time. – xpqz Jan 15 '18 at 16:24
  • Maybe I should've mentioned that 'account' and 'partner' are not in the same document thou, so simultaneous updates would not yield a conflict, I suppose. They both can refer to an external "bank" document but are otherwise unrelated. We've split documents where we thought would make sense but not too much, otherwise we'd be mimicking a relational DB on couchdb, which would obviously be sub-optimal. – J. Araujo Jan 15 '18 at 17:03
  • Some general advice that touches on some of your points: https://medium.com/ibm-watson-data-lab/cloudant-best-and-worst-practices-7ee2040da1b – xpqz Jan 18 '18 at 12:48
0

As an update to this discussion I would like to report that we did test a Cloudant query of many conditions and it only took 11 ms to execute, so although that's a lot we still need to learn about the inner working of selectors, it does seem you can do such queries as long as all queried fields are indexes without a significant impact on performance. But since it was a simple test don't quote me on this one. ;-)

J. Araujo
  • 89
  • 1
  • 8