CouchDB replication strategy with dynamic groups of users

Question

This is the situation:
We have a number of users who share some documents. Which documents each user can access may change throughout the day, and so can the documents themselves (edits and deletions). Users can also change some information on the documents.
For example:

| Users | Documents |
|-------|-----------|
| A     | X         |
| A     | Y         |
| A     | Z         |
| B     | X         |
| B     | Z         |
| C     | Y         |

Possible groups (users who share at least one document): A+C (sharing Y), A+B (sharing X and Z)
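To make the example concrete, here is a small sketch that derives the groups from the user/document pairs in the table. It assumes a group is the exact set of users attached to a given document; the helper name `sharingGroups` is hypothetical.

```javascript
// Hypothetical helper: derive sharing groups from [user, document] pairs.
// A group is the exact set of users attached to a document; each group maps
// to the documents it shares.
function sharingGroups(pairs) {
  const usersByDoc = new Map();
  for (const [user, docId] of pairs) {
    if (!usersByDoc.has(docId)) usersByDoc.set(docId, new Set());
    usersByDoc.get(docId).add(user);
  }

  const groups = {};
  for (const [docId, users] of usersByDoc) {
    if (users.size < 2) continue; // a document nobody else sees is not shared
    const key = Array.from(users).sort().join('+');
    (groups[key] = groups[key] || []).push(docId);
  }
  return groups;
}

const pairs = [
  ['A', 'X'], ['A', 'Y'], ['A', 'Z'],
  ['B', 'X'], ['B', 'Z'],
  ['C', 'Y']
];
console.log(sharingGroups(pairs));
```

For the table above this yields A+B sharing X and Z, and A+C sharing Y. If C were also given access to X, X would move into a new A+B+C group, which is exactly the "document shifts to a new group" problem described below.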

The CouchDB server holds a replica of a SQL Server database containing this data; an ETL process propagates changes to CouchDB. The CouchDB database is in turn replicated to each user's phone via PouchDB.

The goal:
To replicate changes and deletions to each phone according to which documents that user can currently access.

What we've tried:
1) We figured we'd structure our documents with a list of the users that can access them. Each document would have a "Users" array, and a filter in the design document would take care of replication to the clients. Unfortunately, document deletions and document changes that no longer pass the filter (e.g. a user is removed from the array) are not present in the filtered _changes feed, so they cannot be replicated to the clients.
2) A database per user. This is not possible, because users need to see each other's work on the documents (they share them).
3) A database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change and cease to exist: how do we reflect that client-side?
- a document can shift to a new group: it would have to be downloaded again from scratch, which greatly increases the download size
- the same document can be in more than one group! (see the example above)
- each client would have to know which groups she is in every time she logs in, and replicate multiple databases. Then, on the return trip, you'd have to know which databases the document was present in
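For reference, approach 1 boils down to a filter like the sketch below. It is not the actual design document: it assumes the array is called `Agents` (as in the code of the partial solution, while the prose says "Users") and that the client passes its id as the `agente` query parameter to the `app/by_user` filter. The three calls show both failure modes: once an agent is removed from the array the new revision no longer passes, and a plain DELETE leaves a tombstone with only `_id`, `_rev` and `_deleted`, which never passes.

```javascript
// Sketch of the "by_user" filter in a "_design/app" design document.
// Assumes every document lists its users in an "Agents" array and the client
// sends its own id as the "agente" query parameter.
const byUser = function (doc, req) {
  return Array.isArray(doc.Agents) && doc.Agents.indexOf(req.query.agente) !== -1;
};

// CouchDB stores filter bodies as strings inside the design document.
const designDoc = {
  _id: '_design/app',
  filters: { by_user: byUser.toString() }
};

const req = { query: { agente: 'A' } };
console.log(byUser({ _id: 'X', _rev: '1-a', Agents: ['A', 'B'] }, req)); // true: shared with A
console.log(byUser({ _id: 'X', _rev: '2-b', Agents: ['B'] }, req));      // false: A removed
console.log(byUser({ _id: 'X', _rev: '3-c', _deleted: true }, req));     // false: plain DELETE tombstone
```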

Is there a recipe for this situation? Am I missing an obvious solution?

EDIT

Partial solution for case 1:

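    // Live, bidirectional, filtered sync for one agent.
    localDB.sync(remoteDB, {
        live: true,
        retry: true,
        filter: 'app/by_user',
        query_params: { "agente": agent }
    })
    .on('paused', function(info){
        // Replication is idle: look for local docs this agent may no longer see.
        console.log("paused");
        localDB.allDocs().then(function(docs){
            console.log("allDocs");
            docs.rows.forEach(function(row){
                console.log(row);
                remoteDB.get(row.id).then(function(doc){
                    if(doc.Agents.indexOf(agent) < 0){
                        // 409 here: `doc` is the server copy, whose newer _rev
                        // (the change that removed this agent was filtered out)
                        // does not exist in the local database.
                        return localDB.remove(doc);
                    }
                }).catch(function(err){
                    // e.g. 404 when the doc was deleted on the server, or the 409 above
                    console.log(err);
                });
            });
        });
    })
    .on('change', function(result){
        console.log("change!");
        result.change.docs.forEach(function(change) {
            // Replicated docs mark deletions with `_deleted`, not `deleted`.
            if(!change._deleted){
                $rootScope.$apply(function(){
                    $rootScope.$broadcast('upsert', change);
                });
            }
        });
    });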

Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"

You might also want to try asking this question in the #couchdb IRC room. They have much more experience with elaborate permissions systems like this. — nlawson, Mar 07 '15 at 20:05
That's good advice. We're also meeting a consultant on offline-first apps next week. Hopefully he'll have some insights. — , Mar 07 '15 at 22:13

score 2 · Answer 1 · answered Mar 07 '15 at 20:02


Option (3) seems like the simplest solution to me, i.e. the "database per role" solution.

I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtering replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.

Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.
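A sketch of the two pieces this answer relies on, assuming one database per role. The role name `route-42` is hypothetical; the shapes follow CouchDB's `_security` object (stored with `PUT /{db}/_security`) and `_users` documents. Regular users cannot change their own `roles`, which is why the role-shuffling needs a server endpoint or an admin process.

```javascript
// _security object for one per-role database: only members of `role`
// (and server admins) may read from or write to it.
function securityFor(role) {
  return {
    admins: { names: [], roles: [] },
    members: { names: [], roles: [role] }
  };
}

// _users document giving a user the roles that unlock their databases.
// Must be written by a server admin: users cannot change their own roles.
function userDoc(name, password, roles) {
  return {
    _id: 'org.couchdb.user:' + name,
    name: name,
    type: 'user',
    roles: roles,
    password: password // CouchDB replaces this with a hash when saving
  };
}

console.log(JSON.stringify(securityFor('route-42')));
console.log(JSON.stringify(userDoc('alice', 'secret', ['route-42', 'route-7'])));
```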

Then on the client side, you can either replicate from multiple CouchDB databases into multiple PouchDBs (and then collate the results yourself), or into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.
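The initial step and the collation could look like the following sketch. It assumes a hypothetical naming convention in which role `r` owns the database `prefix + r`, and it takes the roles from the `userCtx` object that `GET /_session` returns. Since the same document can live in several role databases, rows are de-duplicated by id.

```javascript
// Map the session's roles to database names. CouchDB's internal roles start
// with "_" (e.g. "_admin") and are skipped.
function databasesFor(userCtx, prefix) {
  return userCtx.roles
    .filter(function (role) { return role.charAt(0) !== '_'; })
    .map(function (role) { return prefix + role; });
}

// Collate allDocs()-style results from several local databases, keeping the
// first row seen for each document id.
function collate(results) {
  const byId = new Map();
  results.forEach(function (result) {
    result.rows.forEach(function (row) {
      if (!byId.has(row.id)) byId.set(row.id, row);
    });
  });
  return Array.from(byId.values());
}

const userCtx = { name: 'alice', roles: ['route-42', 'route-7', '_admin'] };
console.log(databasesFor(userCtx, 'role_'));

const merged = collate([
  { rows: [{ id: 'X', value: { rev: '1-a' } }, { id: 'Z', value: { rev: '1-b' } }] },
  { rows: [{ id: 'X', value: { rev: '1-a' } }, { id: 'Y', value: { rev: '2-c' } }] }
]);
console.log(merged.map(function (row) { return row.id; }));
```

Keeping the first row is a simplification: if two databases hold diverging copies of the same document, you would need a rule for which one wins.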

Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
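When roles change (daily, in the asker's case), the client also has to work out which local databases to set up or throw away. A minimal sketch, assuming the database names come from the initial step mentioned above and that replication errors carry an HTTP-style `status` field; both helper names are hypothetical.

```javascript
// Compare the previous and current database lists.
function diffDatabases(previous, current) {
  return {
    toAdd: current.filter(function (name) { return previous.indexOf(name) === -1; }),
    toRemove: previous.filter(function (name) { return current.indexOf(name) === -1; })
  };
}

// True when a replication error means access was revoked (401/403).
function accessRevoked(err) {
  return Boolean(err) && (err.status === 401 || err.status === 403);
}

// Possible wiring with PouchDB (not executed here):
//   localDB.replicate.from(remoteDB, { live: true, retry: true })
//     .on('denied', function (err) { if (accessRevoked(err)) { /* drop localDB */ } })
//     .on('error', function (err) { if (accessRevoked(err)) { /* drop localDB */ } });

console.log(diffDatabases(['role_route-7', 'role_route-9'], ['role_route-7', 'role_route-42']));
console.log(accessRevoked({ status: 401, name: 'unauthorized' }));
```

A database in `toRemove` can be dropped locally as a whole with PouchDB's `destroy()`, which sidesteps per-document removals entirely; it does not, however, reduce the download cost of the databases in `toAdd`.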

nlawson

Your last paragraph is very interesting; I didn't know that, very useful indeed. Won't this behavior trigger with filtered replication as well (documents no longer in the filter showing up in the denied event)? The big issue, I think, is that "roles" (and so DBs) change their contents (their documents) at the beginning of each day. Basically DBs and roles would be new every day, and we cannot have users download hundreds of MBs each day. To put it in domain terms: we have salesmen with different routes, and some nodes are shared by more than one salesman. – Mar 07 '15 at 22:19
No, filtered replication would not tell you when a document is no longer in a filter. TBH it sounds like Couch/Pouch might be a poor fit for your domain. It seems like what you really need is just AJAX with some light caching, especially if the salesmen's routes are largely read-only. Just download their routes once a day, and blow away yesterday's download, right? – nlawson Mar 07 '15 at 23:22
I wish... salesmen have a data cap on their connection and Wi-Fi is sketchy. The client is very sensitive about download size. Regardless, the app needs to work offline. Maybe I'm just trying to put too much information in the same document. Maybe have two DBs (routes, nodes) and do a pseudo-join? I know it's a sin in the NoSQL world but hey... – Mar 09 '15 at 08:03
I've updated my question. TL;DR: Is there a way to tell Pouch "no longer consider this document as replicable and just remove it from my DB?" – Mar 09 '15 at 11:16
That would be the purge() function, which hasn't been implemented yet (see https://github.com/pouchdb/pouchdb/issues/802). In the meantime, you could replicate from one local DB to another local DB using filtered replication, and then destroy() the old database. Or just ignore the document. – nlawson Mar 09 '15 at 12:22
I've also updated my answer. The idea of double replication sounds overly complex for our use case. I'll wait for purge :) Thanks a lot, Nolan! – Mar 09 '15 at 12:42

score 0 · Answer 2 · answered Mar 09 '15 at 11:59

We arrived at the conclusion that:
1) our use case might not be what CouchDB is good for
2) we value our mental health. After almost a month of struggling with this problem, we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much, users can simply clear the data and start fresh

Solution:
1) Keep the architecture from point 1 (a users array plus filtered replication).
2) After each 'paused' event fires, compare local docs with remote docs; if the remote doc no longer passes the filter, remove it from the UI. Should there be a way to remove only the local document, we'd be very interested in upgrading to that logic.
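The comparison in step 2 can be done with one request instead of one `get()` per document, which matters on a capped data plan. A sketch, assuming the `Agents` array from the question: fetch the server copies with `remoteDB.allDocs({ keys: ids, include_docs: true })` and pass the rows to the hypothetical helper below. In that response a deleted document comes back with `doc: null`, and an unknown one with `error: 'not_found'`.

```javascript
// Return the ids of documents the agent should no longer see, given the
// rows of allDocs({ keys, include_docs: true }) run against the server.
function idsToHide(rows, agent) {
  return rows
    .filter(function (row) {
      if (row.error || !row.doc) return true; // missing or deleted on the server
      const agents = row.doc.Agents;
      return !Array.isArray(agents) || agents.indexOf(agent) === -1;
    })
    .map(function (row) { return row.key; });
}

const rows = [
  { id: 'X', key: 'X', value: { rev: '1-a' }, doc: { _id: 'X', Agents: ['A', 'B'] } },
  { id: 'Y', key: 'Y', value: { rev: '2-b' }, doc: { _id: 'Y', Agents: ['C'] } },
  { id: 'Z', key: 'Z', value: { rev: '3-c', deleted: true }, doc: null },
  { key: 'W', error: 'not_found' }
];
console.log(idsToHide(rows, 'A'));
```

The UI can then skip these ids while the local copies stay untouched, matching the decision to hide rather than remove them.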

score 0 · Answer 3 · answered Apr 02 '15 at 17:05

Option 1 still sounds like the simplest approach to me.

I don't know PouchDB very well, but in plain CouchDB, the problem with deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom DELETE function.

After all, a delete is just an update that sets the _deleted attribute to true.

So, instead of deleting documents directly with the normal CouchDB CRUD DELETE, you can create an update function like this:

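    function(doc, req) {
        // Update handler: delete `doc` but keep its "users" array on the
        // tombstone, so filtered replication still delivers the deletion.
        // Optional ACLs for deleting the doc go here, e.g. check that the
        // doc is owned by req.userCtx.name.

        // No document with this id exists.
        if (!doc) {
            return [null, "Document not found"];
        }

        // doc.users are the users already granted access to this doc.
        return [{
            "_id": doc._id,
            "_rev": doc._rev,
            "_deleted": true,
            "users": doc.users
        }, "Ok doc deleted"];
    }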
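To see why this helps, the sketch below runs the update function above on a document and checks the resulting tombstone against a user-based filter. It assumes the lowercase `users` array used in this answer; `byUsers` and its `user` query parameter are hypothetical, and the `[doc, response]` return value is the usual shape of a CouchDB update handler.

```javascript
// The update function from above (also handles a missing document).
const drop = function (doc, req) {
  if (!doc) {
    return [null, 'Document not found'];
  }
  return [{ _id: doc._id, _rev: doc._rev, _deleted: true, users: doc.users }, 'Ok doc deleted'];
};

// Hypothetical filter: pass documents whose "users" array contains the
// requesting user, sent as the "user" query parameter.
const byUsers = function (doc, req) {
  return Array.isArray(doc.users) && doc.users.indexOf(req.query.user) !== -1;
};

const stored = { _id: 'X', _rev: '4-d', users: ['A', 'B'], title: 'Route 42' };
const [tombstone, message] = drop(stored, {});
const plainTombstone = { _id: 'X', _rev: '5-e', _deleted: true };

console.log(message);                                            // Ok doc deleted
console.log(byUsers(tombstone, { query: { user: 'A' } }));      // true: extended delete
console.log(byUsers(plainTombstone, { query: { user: 'A' } })); // false: plain DELETE
```

Because the tombstone keeps `users`, the deletion reaches each user's filtered changes feed. This covers deletions only; a user being removed from the array is still filtered out.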

Furthermore, using rewrite rules, this update function can even be called when an HTTP DELETE request is submitted (not only on PUT or POST). This way your delete behaviour becomes totally transparent to the client, and you delete in a way that is more useful for your use case.

The Smileupps Chatty couchapp tutorial app uses this approach: extended deletes for different document types are performed within the user/drop.js, profile/drop.js and chat/drop.js files.

Thanks to @nlawson we implemented solution 1. However, the problem of "document changes that won't pass the filter (e.g. a user is removed from the array) are not present in the _changes feed" still stands. I've talked about it at length with my boss, and we decided that some entropy client-side is acceptable. We changed the type of documents (the level of the hierarchy that constitutes a document) to lessen the impact. This means that, in the worst case, "wrong" documents will be seen by the client for 30 days at most before getting deleted. — , Apr 05 '15 at 10:01
