What is the behavior of local_seq under CouchDB 2.x?

Question

In CouchDB 1.x, documents had a "hidden" ._local_seq field that tracked the database's update sequence at the point when the document revision was written. Views could use it by including the {local_seq:true} option in the design document, and clients could fetch it with the ?local_seq=true query option on a document GET request.
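For reference, the two access paths described above can be sketched roughly as follows. This Python snippet only builds the design document and the request URL; it does not talk to a server. The host, the database name `mydb`, the design document name `by_seq`, and the view name `by_local_seq` are placeholders made up for illustration.

```python
import json
from urllib.parse import quote, urlencode

# Design document asking the view engine to expose doc._local_seq
# to the map function (the map function itself is JavaScript).
design_doc = {
    "_id": "_design/by_seq",            # hypothetical name
    "options": {"local_seq": True},
    "views": {
        "by_local_seq": {               # hypothetical view name
            "map": "function (doc) { emit(doc._local_seq, null); }"
        }
    },
}


def doc_url(base, db, doc_id):
    """URL for a document GET that asks for _local_seq in the response."""
    query = urlencode({"local_seq": "true"})
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}?{query}"


if __name__ == "__main__":
    print(json.dumps(design_doc, indent=2))
    print(doc_url("http://localhost:5984", "mydb", "some-doc"))
```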

This field is still available in CouchDB 2.x, but it is unclear how it behaves. Because of the clustering, the database update sequence is now "an opaque token" whereas the local_seq is still a plain integer that doesn't seem to always match up in practice.

Is there any relationship, particularly if I limit myself to a single-node cluster?

natevw · Answer 1 · 2019-01-23T22:17:03.033

Here's what I've figured out so far:

For starters, the purpose of local_seq is indeed much less clear under CouchDB 2.x. As already mentioned in the question, at the database/_changes level, the sequence numbers have been replaced by an "opaque token" that officially has no relationship whatsoever to the local_seq.

To make matters worse, it appears that the local_seq value stored with [or at least derivable from] each document is not local to the database but to the shard! (Each database is split into multiple parts internally; you can read more about Shards and Replicas in the docs for details.)

So whereas in CouchDB 1.x one could, for example, make a custom changes feed by emitting the local_seq as part of a map-reduce index's key — and it would match up with the ?since=N values of the database's _changes feed — in the cluster-focused CouchDB 2.x such a view would tend to emit multiple documents [up to one from each shard] that have the same ._local_seq field as each other!
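A toy model may make the collision easier to see. The Python sketch below is not how CouchDB actually assigns documents to shards (the byte-sum "hash" is a deliberately simple stand-in, and all names are made up); it only assumes what is described above, namely that each shard keeps its own integer update counter.

```python
from collections import defaultdict


def simulate(doc_ids, q):
    """Toy model: each of q shards keeps its own integer update counter."""
    counters = [0] * q
    local_seqs = {}
    for doc_id in doc_ids:
        # Stand-in for the real shard mapping, NOT CouchDB's algorithm.
        shard = sum(doc_id.encode()) % q
        counters[shard] += 1
        local_seqs[doc_id] = counters[shard]
    return local_seqs


def duplicates(local_seqs):
    """Group document ids by local_seq and keep only the shared values."""
    by_seq = defaultdict(list)
    for doc_id, seq in local_seqs.items():
        by_seq[seq].append(doc_id)
    return {seq: ids for seq, ids in by_seq.items() if len(ids) > 1}


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(8)]
    print(duplicates(simulate(ids, q=4)))  # several docs share a local_seq
    print(duplicates(simulate(ids, q=1)))  # one shard: no collisions
```

With more than one shard in play, a view keyed on the local sequence can no longer serve as a unique, database-wide ordering.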

And from the other angle, under CouchDB 2.x, even the "seq" associated with a particular document in the database-level _changes feed can change from request to request: the prefix, the big long mess after it, or both. I'm not sure if there's a way, or any useful advantage, that the view engine could provide a "non-local sequence" value to the map function like it does with the local sequence.

That said, I found a "coincidental" way as of CouchDB 2.3.0 to retain some of the usefulness:

By creating a database with PUT /newdb?n=1&q=1 — i.e. configuring 1 replica and only 1 shard — the local_seq values end up being unique within a database.
In those circumstances, the first part of each seq token in database-level _changes feed also seems to match up with the local_seq of each changed document. I.e. if you split the string token on '-' and convert the first part to a number, you seem to get the local_seq.
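The split-and-convert step could look like this in Python. It is only a sketch of the unofficial trick described above; the function name and the sample token are made up, and real tokens carry a much longer opaque suffix.

```python
def seq_prefix(seq_token):
    """Return the integer before the first '-' of a _changes seq token.

    Relies on the undocumented token layout discussed here; raises
    ValueError if the token does not start with an integer.
    """
    prefix, _, _ = str(seq_token).partition("-")
    return int(prefix)


if __name__ == "__main__":
    # Hypothetical, shortened token for illustration only.
    print(seq_prefix("42-g1AAAAexample"))
```

On a q=1 database the returned number appeared to equal the changed document's local_seq; on anything else it should not be expected to.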

I would rely on this with caution, as:

in the event you do need to scale up, and choose to do so by using the multi-node clustering features, any code that relies on the above will break
it is in no way officially sanctioned, and could theoretically break in as little as a point release. The CouchDB developers have been very clear that the _changes-level tokens are opaque and that you should treat them as such.

So caveat hackor and all that, but the "coincidences" above do match a description of how these tokens work at the moment:

The number on the front is the sum of the individual update sequences encoded in the second part and exists only to trick older versions of the couchdb replicator into making checkpoints.

✅ If there's only one update sequence, the sum should just be the original sequence.
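As a quick arithmetic illustration of that quoted description (a sketch only: the per-shard numbers are invented, and nothing here decodes a real token):

```python
def token_prefix(shard_seqs):
    """Leading number of a seq token, per the description quoted above:
    the sum of the per-shard update sequences."""
    return sum(shard_seqs)


if __name__ == "__main__":
    # One shard (q=1): the prefix is just that shard's sequence.
    print(token_prefix([17]))
    # Four shards (hypothetical values): the prefix matches none of them.
    print(token_prefix([5, 3, 6, 3]))
```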

For a given shard [the _changes feed] is totally ordered (a shard being identical to a pre 2.0 database with an integer sequence), couchdb doesn't shuffle that output […]

✅ With only one shard [n.b. shard, not simply one node/replica], you are pretty much left with the pre-2.0 database sort of behavior.
