How can I efficiently check the existence of a large number of Couchbase docs by key without impacting performance?

I need to check the existence of 300,000 docs in a much larger Couchbase DB. I see the following methods provided by the Couchbase Java client, but I am not sure how they would perform for such a large number of keys:

  1. ReactiveBatchHelper’s exists
  2. AsyncCollection's exists: it's the best in performance, but is it enough?
  3. ReactiveCollection's exists

Use case: I want to return the links of our website inside a sitemap to Google for indexing. One website has many localised versions. Assume I have 100,000 websites for the Google spider to crawl and each website is available in 3 versions. Each translation of a website is stored in its own Couchbase doc, so I have around 300,000 docs to check for existence.

Anmol Garg
  • 1
    The best thing to do would be to benchmark and see whether this is a bottleneck for you. You haven't provided many details about the use case and setup that you have. Could you batch the exists() checks to avoid overwhelming the server/SDK with 300k checks? – paladin324 Nov 02 '22 at 21:09
  • 1
    Adding the use case in the post itself. – Anmol Garg Nov 03 '22 at 06:53
  • 2
    If you're happy with the _behavior_ of `ReactiveBatchHelper.exists()` (like, how it returns a flux of the keys of existing documents) it's hard to beat it in terms of performance. It's highly optimized for bulk existence checking. Just be aware it's not yet part of the SDK's "committed" public API (as of November 2022). – dnault Nov 03 '22 at 15:57
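
For illustration, the idea from the first comment (not firing all 300k checks at once) could be sketched with plain `CompletableFuture`s and a `Semaphore` that caps the number of in-flight lookups. This is only a sketch under assumptions, not the SDK's own bulk mechanism: `existsFn` is a hypothetical stand-in for a call such as `AsyncCollection.exists(key)` mapped to a boolean, and `findExisting`, `maxInFlight`, the limit of 64, and the key format in `main` are all invented for the example.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class ExistenceCheck {

    /**
     * Returns the subset of keys for which existsFn reports true,
     * keeping at most maxInFlight lookups outstanding at any time.
     * existsFn is a placeholder for the real asynchronous lookup.
     */
    static Set<String> findExisting(List<String> keys,
                                    Function<String, CompletableFuture<Boolean>> existsFn,
                                    int maxInFlight) {
        Set<String> existing = ConcurrentHashMap.newKeySet();
        Semaphore permits = new Semaphore(maxInFlight);

        for (String key : keys) {
            // Blocks the submitting thread once maxInFlight lookups are pending.
            permits.acquireUninterruptibly();

            CompletableFuture<Boolean> lookup;
            try {
                lookup = existsFn.apply(key);
            } catch (RuntimeException e) {
                // A synchronous failure must still release its permit below.
                lookup = CompletableFuture.failedFuture(e);
            }

            lookup.whenComplete((found, error) -> {
                // Assumption: a failed lookup is treated as "not found" here;
                // real code would probably retry or record the failure.
                if (error == null && Boolean.TRUE.equals(found)) {
                    existing.add(key);
                }
                permits.release();
            });
        }

        // Taking every permit back means all lookups have completed.
        permits.acquireUninterruptibly(maxInFlight);
        return existing;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the bucket; the key format is made up.
        Set<String> stored = Set.of("site-1::en", "site-1::fr");
        List<String> keys = List.of("site-1::en", "site-1::de", "site-1::fr");

        Set<String> found = findExisting(
                keys,
                key -> CompletableFuture.completedFuture(stored.contains(key)),
                64);

        System.out.println(found.size() + " of " + keys.size() + " keys exist");
    }
}
```

With the Java SDK, `existsFn` would presumably be something along the lines of `key -> collection.async().exists(key).thenApply(ExistsResult::exists)`. The right value for `maxInFlight` depends on the cluster and network, so it is something to benchmark rather than copy.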

0 Answers