
In the design stage for an app that collects large amounts of data...

Ideally, I want it to be an offline-first app and was looking at PouchDB/CouchDB. However, the data needs to be kept for years for legal reasons, and my concern is that this will consume too much local storage over time.

My thoughts were:

  1. Handle sync between PouchDB and CouchDB myself, allowing me to purge inactive documents from the local store without impacting CouchDB. This feels messy and probably a lot of work.
  2. Build a local store using Dexie.js and write the sync function entirely myself. This also looks like hard work, but possibly less, since I wouldn't be fighting an existing sync mechanism.
  3. Search harder :)

Conceptually, I guess I'm looking for a 'DB cache': holding active JSON document versions and removing documents that have not been touched for some period X. It might be that 'offline' mode is handled separately from the DB cache.
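To make option 2 concrete, a rough sketch of what I mean by a 'DB cache' using Dexie (the table and field names here are just placeholders, and the actual sync with the server is the hard part I'd still have to write):

```js
import Dexie from 'dexie';

// local 'DB cache': active JSON documents plus a last-touched timestamp
const db = new Dexie('doc_cache');
db.version(1).stores({
  docs: 'id, lastTouched' // primary key 'id', index on 'lastTouched' for eviction
});

// write a document and stamp it as recently touched
async function putDoc(id, body) {
  await db.docs.put({ id, body, lastTouched: Date.now() });
}

// read a document and refresh its timestamp so it stays cached
async function getDoc(id) {
  const doc = await db.docs.get(id);
  if (doc) await db.docs.update(id, { lastTouched: Date.now() });
  return doc ? doc.body : undefined;
}

// evict anything not touched for X days; the server keeps the full history
async function evictOlderThan(days) {
  const cutoff = Date.now() - days * 24 * 60 * 60 * 1000;
  return db.docs.where('lastTouched').below(cutoff).delete();
}
```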

baradhili
  • Instead of CouchDB you could use MongoDB or Elasticsearch if you are looking for scalability & performance on the server side. On the browser side you are usually limited to ~50MB of local storage for offline persistence, so depending on your total data size you might hit the limit. You could define a scrollable API where you can request more data from the server to go back in time, and store the data locally in browser localStorage. – Peter Thoeny Dec 25 '22 at 23:34
  • Offline operation is really important as they need to access and enter data even if there is no connection to the internet, so I can't use a server-side DB as the primary store. Your API concept is basically what I am thinking of. – baradhili Dec 26 '22 at 02:01
  • Found something interesting: https://pouchdb.com/api.html#filtered-replication provides the ability to filter what gets replicated to the client. This may still need some work to flush old replicated docs out on the client side, but it looks like I might be able to filter based on a "last modified" field or similar. – baradhili Dec 26 '22 at 05:43
  • Thanks for sharing your request. It's a highly relevant requirement for offline applications keeping large amounts of data. I'll prioritize it in Dexie Cloud. If you go the build-it-yourself route, you could possibly get some inspiration from the source of dexie-cloud-addon, specifically how changes are tracked in a dbcore middleware. – David Fahlander Dec 26 '22 at 08:40

1 Answer


Not sure yet if this is the correct answer..

  1. Set up a filter on CouchDB to screen out old documents (let's say we have a 'date_modified' field in the doc and we filter out any docs with date_modified older than one month).
  2. Have a local routine on the client that deletes documents from the local PouchDB that are older than one month (actually using the remove() method against the local PouchDB, not updating them with _deleted:true). From https://pouchdb.com/2015/04/05/filtered-replication.html it appears removed documents don't sync.
  3. Docs updated on the PouchDB will replicate normally.

There might be a race condition here for replication; we'll see.
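Roughly how I picture wiring this up, as a minimal sketch: it assumes every doc carries an ISO-string date_modified field, the database URL and design-doc/filter names are placeholders, and the push direction gets an equivalent filter so that remove() tombstones (which have lost their date_modified field) stay local.

```js
const PouchDB = require('pouchdb');

const localDB = new PouchDB('app_local');
const remoteDB = new PouchDB('https://couch.example.com/appdata'); // placeholder URL

const ONE_MONTH_MS = 30 * 24 * 60 * 60 * 1000;
const cutoff = () => new Date(Date.now() - ONE_MONTH_MS).toISOString();

// 1. One-off: install a filter on CouchDB that only passes recently modified docs
async function installServerFilter() {
  await remoteDB.put({
    _id: '_design/filters',
    filters: {
      recent: function (doc, req) {
        // only docs modified after the cutoff (ISO string) replicate to the client
        return doc.date_modified >= req.query.cutoff;
      }.toString()
    }
  });
}

// 2. Pull through the server-side filter; push with an equivalent local filter so
//    tombstones created by remove() (no date_modified field) never reach CouchDB.
//    Note: the cutoff passed in query_params is fixed when replication starts.
function startReplication() {
  localDB.replicate.from(remoteDB, {
    live: true,
    retry: true,
    filter: 'filters/recent',
    query_params: { cutoff: cutoff() }
  });
  localDB.replicate.to(remoteDB, {
    live: true,
    retry: true,
    filter: (doc) => typeof doc.date_modified === 'string'
  });
}

// 3. Local purge routine: remove() anything untouched for a month, local store only
async function purgeOldDocs() {
  const { rows } = await localDB.allDocs({ include_docs: true });
  for (const row of rows) {
    const doc = row.doc;
    if (!doc._id.startsWith('_design/') && doc.date_modified < cutoff()) {
      await localDB.remove(doc);
    }
  }
}
```

Because both directions are filtered on date_modified, a locally removed doc simply falls out of replication scope instead of propagating a deletion to CouchDB.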

baradhili
  • Purging documents locally can be quite a hassle. You could simply create a new local db, replicate some documents there and then delete the old db. You could also have something like a database per year on the server, only replicating the documents that you are still interested in while keeping the old databases just for the record. – LyteFM Dec 29 '22 at 17:22
  • This is wrong. Deleting documents on PouchDB does not really delete them but only sets _deleted=true. At the moment there is no purge-document feature for PouchDB. – pubkey Jan 18 '23 at 00:26
  • Correct: delete only marks it "deleted". remove(), however, does actually remove the record; PouchDB warns against doing this because in filtered replication the removal may not be replicated: https://pouchdb.com/api.html#filtered-replication – baradhili Jan 18 '23 at 07:26