CouchDB filtered replication - Can it use index while replicating?

Question

I have an offline application that syncs data between PouchDB and CouchDB. There are around 200k documents in a bucket. What I notice is that, while replicating, it runs through all the documents and syncs only those that match the criteria.

All the documents have an attribute called 'channel' (a String). As a user of the system, I will have access to a set of channels, e.g. ('c1', 'c2', ...). During filtered replication, I expect it to run through only those documents matching channel = 'c1', not all of them.

Please check this answer, which points to the use of a Mango selector to improve filtering performance: https://stackoverflow.com/questions/50994899/filtered-sync-between-couchdb-and-pouchdb/50995858#50995858 — Juanjo Rodriguez, Jan 28 '19 at 16:00
Thanks Juanjo, I really appreciate your time. I am aware of passing a selector to the changes feed. However, I wouldn't be able to pass request parameters with this approach. To elaborate on the problem: the client sends the channels it has access to, as stated above (c1, c2, ...), and it needs to fetch only the documents that belong to these channels. — Maqbool Ahmed, Jan 29 '19 at 04:45
I have an alternative solution: hit my own REST API to get the ids belonging to the given channels (since the API can make use of secondary indexes), then replicate with the resulting ids. The downsides are: 1. There will be 2 calls (API and replication). 2. I am not sure how efficient it will be for huge data, although once I have the ids I can replicate in batches. Please let me know what you think. @nlawson your inputs will be really valuable, thanks. — Maqbool Ahmed, Jan 29 '19 at 04:54
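The "fetch ids, then replicate in batches" idea from the comment above could be sketched roughly as follows. This is a minimal sketch, not a tested recipe: `batchIds`, `replicateByIds`, the `Replicator` interface, and the batch size of 1000 are hypothetical names and values chosen for illustration. The only PouchDB feature assumed is the documented `doc_ids` replication option.

```typescript
// Split a list of document ids into fixed-size batches.
function batchIds(ids: string[], batchSize: number): string[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: string[][] = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    batches.push(ids.slice(i, i + batchSize));
  }
  return batches;
}

// Minimal shape of the database handle used here (hypothetical interface;
// a PouchDB instance is assumed to satisfy it via `replicate.from`).
interface Replicator {
  replicate: {
    from(source: string, options: { doc_ids: string[] }): PromiseLike<unknown>;
  };
}

// Replicate only the given ids, one batch at a time.
// Returns the number of batches that were replicated.
async function replicateByIds(
  local: Replicator,
  remoteUrl: string,
  ids: string[],
  batchSize: number = 1000 // arbitrary default, tune for your data
): Promise<number> {
  let batchesDone = 0;
  for (const batch of batchIds(ids, batchSize)) {
    await local.replicate.from(remoteUrl, { doc_ids: batch });
    batchesDone += 1;
  }
  return batchesDone;
}
```

The ids themselves would come from the custom REST API mentioned in the comment, which is outside the scope of this sketch.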
Just found out even selectors cannot use the indexes. https://github.com/pouchdb/pouchdb/issues/7615 — Maqbool Ahmed, Jan 29 '19 at 05:34
The client can build the selector itself; you don't need access to request parameters, just include the values in the selector expression. The issue you referred to is not clear: it is missing the selector and the index definition. I don't see a real issue there. — Juanjo Rodriguez, Jan 29 '19 at 10:40
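To illustrate the suggestion above, a client could build the selector from its own channel list along these lines. This is a sketch under assumptions: `channelSelector` is a hypothetical helper name, and the `channel` field and the `c1`/`c2` values are taken from the question. `$in` is a standard Mango operator, and `selector` is a documented PouchDB replication option (not supported against CouchDB 1.x).

```typescript
// Shape of a Mango selector that matches documents in any of the given channels.
type ChannelSelector = { channel: { $in: string[] } };

// Build the selector on the client from the channels the user may access.
function channelSelector(channels: string[]): ChannelSelector {
  if (channels.length === 0) {
    throw new Error("at least one channel is required");
  }
  // Copy the array so later changes to the caller's list do not leak in.
  return { channel: { $in: [...channels] } };
}

// Usage with PouchDB (assumes a `local` PouchDB instance and a `remoteUrl` string):
//   local.replicate.from(remoteUrl, { selector: channelSelector(["c1", "c2"]) });
```

Because the channel values are embedded in the selector, no request parameters are needed on the server side.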
Ah! I see what you mean. I will try this approach and see if it is any better. The main concern is improving performance on bigger datasets. I will share the numbers. Thanks. — Maqbool Ahmed, Jan 31 '19 at 05:34
@Juanjo: I have tested replication with both a filter and a selector on 125k documents, of which 63k are synced to PouchDB. The time difference is about 20 seconds: the former takes 232 seconds, whereas the latter takes 211. So the pending issue of not using the index while replicating via selector is still present. Hopefully they will fix it soon. Thanks for the guidance. Cheers! — Maqbool Ahmed, Feb 03 '19 at 12:08
You can check whether your selector is using an index using this endpoint http://docs.couchdb.org/en/stable/api/database/find.html#db-explain — Juanjo Rodriguez, Feb 04 '19 at 08:52
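A request to the `_explain` endpoint mentioned above might be assembled as in the sketch below. The helper names (`buildExplainRequest`, `usesSecondaryIndex`), the base URL, and the database name are hypothetical. The response check assumes that CouchDB reports the fallback full scan as an index of type `"special"` (named `_all_docs`); verify this against your CouchDB version.

```typescript
// Plain description of an HTTP request, so it can be sent with any client.
interface ExplainRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build a POST /{db}/_explain request for a given Mango selector.
function buildExplainRequest(baseUrl: string, db: string, selector: object): ExplainRequest {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "POST",
    url: `${base}/${encodeURIComponent(db)}/_explain`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ selector }),
  };
}

// Only the part of the _explain response this sketch looks at.
interface ExplainResponse {
  index: { ddoc: string | null; name: string; type: string };
}

// Assumption: a "special" index type means the query falls back to _all_docs.
function usesSecondaryIndex(explain: ExplainResponse): boolean {
  return explain.index.type !== "special";
}
```

Note that this only reports how `_find` would run the selector; it says nothing about how a filtered `_changes` feed evaluates it.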
@Juanjo: Thanks, this is very useful. The explain API says it is using the proper index, but I don't see drastic performance improvements during replication. — Maqbool Ahmed, Feb 04 '19 at 09:21
I see the problem. The _find endpoint uses indexes, but filtered _changes does not. The real gain of using selectors with _changes is avoiding the external evaluation process (couchjs), which adds overhead to the filtering. — Juanjo Rodriguez, Feb 04 '19 at 18:22
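For reference, the selector-filtered changes feed discussed above is requested with `filter=_selector` and the selector in the POST body. The sketch below only assembles such a request; `buildSelectorChangesRequest`, the base URL, and the database name are hypothetical. As the comment notes, CouchDB still walks the whole changes feed and applies the selector to each document, so this avoids couchjs but does not use an index.

```typescript
// Plain description of the HTTP request for a selector-filtered changes feed.
interface ChangesRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build POST /{db}/_changes?filter=_selector for a list of channels.
function buildSelectorChangesRequest(
  baseUrl: string,
  db: string,
  channels: string[],
  since: string = "0" // start of the feed; pass a stored sequence to resume
): ChangesRequest {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    method: "POST",
    url: `${base}/${encodeURIComponent(db)}/_changes?filter=_selector&since=${encodeURIComponent(since)}`,
    headers: { "Content-Type": "application/json" },
    // The selector is evaluated per document inside CouchDB, without couchjs.
    body: JSON.stringify({ selector: { channel: { $in: channels } } }),
  };
}
```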
