Alright, here I am to answer my own question over a year later :). We did a lot of experimentation today when trying to migrate items out of a bucket containing roughly 2.6 million items into an SQL database. We wanted to make sure the row count matched between Couchbase and the new database before going live.
Unfortunately when we tried the normal select count(*) from <bucket>;
the document count we received was over what we expected by just 1, so we broke down the query and did a count
over all documents in the bucket while group
ing by an attribute, hoping to find what kind of document was missing in the target DB. The total for the counts for each group should have added up to the same total that we got from the count query. Unfortunately, they did not. The total added up to 1 fewer than we expected (so that's off by two from the original count query).
We found the category of document that was off by 1, expecting to have an extra doc in Couchbase that didn't make it to the target DB, but found instead that the totals indicated the reverse, that the target DB had one extra doc. This all seemed very fishy, so we did a query to pull all of the IDs in that group out into a single JSON file, and we counted them. Alas, the actual count of documents in that group matched up with the target DB, meaning that Couchbase's counting was incorrect in both cases.
I'm not sure what implementation details caused this to happen, but it seems like at least the over-counting might have been a caching issue. I was able to finally get a correct document count by using a query like this:
select count(*) from <bucket> where meta(<bucket>).id;
This query ran for much longer than the original count did, indicating that whatever cache is used for counts was being skipped, and it did come up with the correct number.
We were doing these tests on a relatively small number of documents, half a million or so. With the full volume of the bucket, counts had been off by as much as 15 in the past, apparently becoming less accurate as the document count increased.
We just did a re-sync of the full bucket. The bucket total as reported by the dashboard and by the original N1ql query are over the expected count by 7. We ran the modified query, waited for the result, and got the expected count.
In case you're wondering, we did turn off traffic to the bucket, so document counts were not likely to be fluctuating during this process, except when a document reached its expiry date in Couchbase, and was automatically deleted.