A third-party company recently began providing data to us through CouchDB, and retrieving it fell to me. I had read somewhere that _changes could return changes since a given date, so I saved the date and time of my download in a file, planning to use it to fetch only the new data. I have since found out that this doesn't work unless the documents themselves include a date. Is there any way, perhaps using the _rev field or some other method, to obtain a last_seq that corresponds to that date or to the data I already have?

The only solution I have found is to download all the documents again using _changes and diff the old data against the new. That would give me the differences, and from then on I could use the last_seq from the latest download. However, the dataset is quite large: around 260 GB or more.

1 Answer

Go ahead with your _changes plan, but without include_docs, so you get only the document ids and their _revs. From your diff, pick out every id whose _rev has advanced beyond the one in the data you already downloaded. Then use POST /{db}/_all_docs with include_docs=true and pass in just the keys of the docs from your diff (i.e. the ones you know have changed).
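
A rough sketch of that approach in Python, using only the standard library. The function names, the known_revs mapping (doc id to the _rev you already have), the batch size, and the base URL are all hypothetical, and authentication is left out. It assumes the default _changes style, where each row lists only the winning revision, and it treats any differing _rev as a change rather than parsing revision numbers.

```python
import json
import urllib.request


def diff_revs(known_revs, changes_rows):
    """Return (changed_ids, deleted_ids) by comparing winning revs.

    known_revs: dict of doc id -> _rev from the data already downloaded.
    changes_rows: the "results" list from GET /{db}/_changes (no include_docs).
    """
    changed, deleted = [], []
    for row in changes_rows:
        doc_id = row["id"]
        rev = row["changes"][0]["rev"]  # winning revision with the default style
        if row.get("deleted"):
            # Only report deletions of docs we actually hold locally.
            if doc_id in known_revs:
                deleted.append(doc_id)
        elif known_revs.get(doc_id) != rev:
            # New doc, or its _rev differs from the one we downloaded.
            changed.append(doc_id)
    return changed, deleted


def fetch_docs(base_url, db, doc_ids, batch_size=1000):
    """Yield full documents for doc_ids via POST /{db}/_all_docs, in batches."""
    url = f"{base_url}/{db}/_all_docs?include_docs=true"
    for start in range(0, len(doc_ids), batch_size):
        body = json.dumps({"keys": doc_ids[start:start + batch_size]}).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            for row in json.load(resp)["rows"]:
                if row.get("doc"):  # missing or deleted keys carry no doc
                    yield row["doc"]
```

You would call diff_revs with the results of a full _changes request, pass the changed ids to fetch_docs, and save the last_seq from that same _changes response so that later runs can pass it as the since parameter instead of diffing again.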

smathy