I am thinking about using CouchDB 2 and PouchDB 7 in the next app I want to write. I will have a CouchDB instance as central storage, and web clients and mobile apps will each start a PouchDB that syncs with it. Basically this works like a charm.

But... How do I do filtered sync between CouchDB and PouchDB if the filter should be done based on document ownership?

I know about the database-per-user solutions. But my documents will have shared access: the creator of a document, plus the people he/she adds as readers or writers.

Any solutions in 2018 for this problem? Back in 2016 I was unable to solve this issue and dropped the app idea.

Jonathan Hall
  • "2018 edition" isn't very specific. Maybe you can mention the _exact_ version(s) of the relevant software you're using? – Jonathan Hall Jun 25 '18 at 17:00
  • @Flimzy PouchDB 7 and CouchDB 2. I am starting a new project, so I would go with the most recent versions. –  Jun 25 '18 at 19:05
  • I would expect you to use the most recent versions, but there will be multiple "most recent versions" throughout 2018. – Jonathan Hall Jun 26 '18 at 07:10

1 Answer

You should include in your documents the information required to restrict access to them: ownership and authorized users.

Based on this information, there are two options for defining filtered replication between CouchDB and PouchDB (check filtering options).

  1. JavaScript filter functions defined in CouchDB design documents. A filter function implements your filtering logic and can use, via its req parameter, both the URL parameters supplied with the request and the user authenticated in CouchDB.

    The main problem with this approach is that performance degrades as your database grows. The filter is applied to every doc in the database, even deleted ones, in order to produce a result. So I do not recommend this filtering mechanism if you foresee a significant number of docs in the database. Here you have a sample of this kind of problem.

    A slight improvement on this performance problem is to write your filtering logic in Erlang, which is a bit more complex than the JS option; during my tests I didn't manage to get a big gain from it.

  2. In CouchDB 2.x there is the option to perform filtered replication using selectors. Selectors can be indexed and are reported to be 10x faster than JS filters. Selectors are completely defined by the client and are not based on the authentication context in the database. This option scales much better than the previous one.
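
As a rough sketch of the two options above, assuming documents carry hypothetical `owner`, `readers` and `writers` fields, and a design document named `_design/access` (also an assumption), the same ownership rule could be written once as a filter function and once as a Mango selector. The `filter` and `selector` replication options are the ones listed in the PouchDB replication docs; the helper names are made up for illustration.

```javascript
// Assumed document shape (hypothetical field names):
// { _id: 'note-1', owner: 'alice', readers: ['bob'], writers: ['carol'] }

// Option 1: a filter function for a design document. CouchDB runs it for
// every change; req.userCtx.name is the authenticated user. Written in
// ES5 style, since CouchDB 2.x runs an old JavaScript engine.
const byUser = function (doc, req) {
  var user = req.userCtx && req.userCtx.name;
  if (!user) { return false; }
  var readers = doc.readers || [];
  var writers = doc.writers || [];
  return doc.owner === user ||
    readers.indexOf(user) !== -1 ||
    writers.indexOf(user) !== -1;
};

// Design documents store their functions as strings.
const designDoc = {
  _id: '_design/access',
  filters: { by_user: byUser.toString() }
};

// Option 2: the same rule as a Mango selector, built on the client.
function selectorFor(user) {
  return {
    $or: [
      { owner: user },
      { readers: { $elemMatch: { $eq: user } } },
      { writers: { $elemMatch: { $eq: user } } }
    ]
  };
}

// Start a pull replication with either option. `localDb` is a PouchDB
// instance and `remote` is a CouchDB URL or another PouchDB instance.
function pullWithFilter(localDb, remote) {
  return localDb.replicate.from(remote, { filter: 'access/by_user' });
}

function pullWithSelector(localDb, remote, user) {
  return localDb.replicate.from(remote, { selector: selectorFor(user) });
}
```

Note that a deletion tombstone normally carries none of these fields, so with either variant deletions would probably not reach the client unless the access fields are kept on the deleted revision.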

In any case, filtering allows you to do some database segmentation during the replication process but it is not a security mechanism for document-level read permissions.

Document write permissions can be achieved using validate document update functions.
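
A minimal sketch of such a validate document update function, assuming the same hypothetical `owner`, `readers` and `writers` fields on each document. The rules chosen here (only the owner may delete or change the access lists) are one possible policy, not something prescribed by CouchDB. The function would be stored as a string under `validate_doc_update` in a design document.

```javascript
// Signature as CouchDB calls it: (newDoc, oldDoc, userCtx, secObj).
// Throwing { forbidden: ... } or { unauthorized: ... } rejects the write.
const validateDocUpdate = function (newDoc, oldDoc, userCtx, secObj) {
  // Server admins may always write.
  if (userCtx.roles.indexOf('_admin') !== -1) { return; }
  if (!userCtx.name) {
    throw({ unauthorized: 'Please log in first.' });
  }
  if (!oldDoc) {
    // New document: the creator must record themselves as owner.
    if (newDoc.owner !== userCtx.name) {
      throw({ forbidden: 'owner must be the creating user.' });
    }
    return;
  }
  var isOwner = oldDoc.owner === userCtx.name;
  var isWriter = (oldDoc.writers || []).indexOf(userCtx.name) !== -1;
  if (!isOwner && !isWriter) {
    throw({ forbidden: 'You may not modify this document.' });
  }
  if (isOwner) { return; }
  // Writers may edit content, but not delete or change who has access.
  if (newDoc._deleted) {
    throw({ forbidden: 'Only the owner may delete this document.' });
  }
  var sameAccess = newDoc.owner === oldDoc.owner &&
    JSON.stringify(newDoc.readers) === JSON.stringify(oldDoc.readers) &&
    JSON.stringify(newDoc.writers) === JSON.stringify(oldDoc.writers);
  if (!sameAccess) {
    throw({ forbidden: 'Only the owner may change access lists.' });
  }
};

// Design documents store their functions as strings.
const securityDesignDoc = {
  _id: '_design/security',
  validate_doc_update: validateDocUpdate.toString()
};
```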


UPDATE: I revisited this answer to offer more precise information about database filtering mechanisms. I tested the performance of the different filtering approaches to confirm the statements above.

I loaded a database with 9000 docs and measured the time taken to filter the _changes feed using four techniques: JS filtering, Erlang filtering, Mango selector filtering and doc id filtering, with the following results:

  • JS filtering of 9000 docs - 4.3 secs
  • Erlang filtering of 9000 docs - 2.3 secs
  • Mango selector filtering of 9000 docs - 0.48 secs
  • Doc ids filtering of 9000 docs - 0.01 secs

The test confirms that JS filtering is the worst option, as it needs to evaluate the filter condition in an external process, which introduces additional overhead. Erlang and Mango expressions are evaluated inside the filtering process, which represents a real performance gain.

In order to verify the impact of the number of docs on filtering, I created a database with 20,000 docs and performed the same tests, with the following results:

  • JS filtering of 20,000 docs - 10 secs
  • Erlang filtering of 20,000 docs - 5.45 secs
  • Mango selector filtering of 20,000 docs - 1.07 secs
  • Doc ids filtering of 20,000 docs - 0.01 secs
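
Dividing each timing by the number of docs makes the two runs easier to compare. The short script below only reuses the figures reported above; it gives roughly 0.48 and 0.50 ms per doc for JS, 0.26 and 0.27 ms for Erlang, and about 0.05 ms for Mango in both runs.

```javascript
// Timings in seconds, copied from the two test runs above.
const timings = [
  { method: 'JS filter', secs9k: 4.3, secs20k: 10 },
  { method: 'Erlang filter', secs9k: 2.3, secs20k: 5.45 },
  { method: 'Mango selector', secs9k: 0.48, secs20k: 1.07 },
  { method: 'Doc ids', secs9k: 0.01, secs20k: 0.01 }
];

// Milliseconds spent per document in each run.
const perDocMs = timings.map(function (t) {
  return {
    method: t.method,
    at9k: (t.secs9k / 9000) * 1000,
    at20k: (t.secs20k / 20000) * 1000
  };
});

perDocMs.forEach(function (r) {
  console.log(r.method, r.at9k.toFixed(4), r.at20k.toFixed(4));
});
```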

For JS, Erlang and Mango filtering, the time grows linearly with the number of docs. No indexes are used for these filtering mechanisms. Doc id filtering is constant, as it is based on the _id index.
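
From PouchDB, doc id filtering can be requested through the `doc_ids` replication option. The helper below is only a sketch: `pullByIds` is a made-up name, and how the client learns which ids it is allowed to see is left open here.

```javascript
// Pull only the listed documents. `localDb` is a PouchDB instance,
// `remote` a CouchDB URL or another PouchDB instance, and `ids` an
// array of document ids the current user may read.
function pullByIds(localDb, remote, ids) {
  return localDb.replicate.from(remote, { doc_ids: ids });
}

// Usage, assuming the pouchdb package is installed and a CouchDB
// database is reachable at this (hypothetical) URL:
// const PouchDB = require('pouchdb');
// const local = new PouchDB('notes');
// pullByIds(local, 'http://localhost:5984/notes', ['note-1', 'note-7']);
```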

Juanjo Rodriguez
  • How do you achieve doc id filtering for filtered replication in pouchdb? Can you explain? – sureshvv Jan 08 '20 at 11:24
  • Which is the approach implemented here https://pouchdb.com/2015/04/05/filtered-replication.html ? Seems like a memory hog and my couchdb is dying. – sureshvv Jan 08 '20 at 11:33
  • How do you do Mango selector filtering? Can you provide a link or example please? – sureshvv Jan 08 '20 at 11:39
  • The post you refer to uses JS filtering. Replication options are documented here https://pouchdb.com/api.html#replication; use options.doc_ids (for doc id filtering) and options.selector (for Mango selector filtering) – Juanjo Rodriguez Jan 08 '20 at 19:36