There are 500,000 documents stored in a CouchDB database. A client app needs to retrieve all docs for processing into another system. Is there a recommended way to retrieve them all? I understand there is paging support using the "limit" and "skip" parameters. It looks like a call can be made to get the total doc count, and then a loop can call CouchDB while dynamically updating the "limit" and "skip" values. Is there an alternative way to retrieve all of them?
1 Answer
Aside from replication, I think not. Of course it really depends on specifics not given in the OP. 500k docs of 200 bytes each may not be a bandwidth issue, but 500k documents of 100 kB each might be a consideration.
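To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes "200b" means 200 bytes, takes 1 kB as 1000 bytes, and ignores the JSON envelope that _all_docs wraps around each row; the helper names are made up for illustration.

```javascript
// Rough payload estimate: document count times average document size.
const totalBytes = (docCount, avgDocBytes) => docCount * avgDocBytes;
const toMegabytes = (bytes) => bytes / 1e6;

// 500k docs of 200 bytes each (assumed reading of "200b")
console.log(toMegabytes(totalBytes(500000, 200)), "MB"); // 100 MB

// 500k docs of 100 kB each
console.log(toMegabytes(totalBytes(500000, 100000)), "MB"); // 50000 MB, i.e. 50 GB
```

The first case fits comfortably in a single response; the second is a strong argument for paging.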
There are a lot of ways to approach this, and since a lot of details are not given, all most can do is offer a generic approach, which I will do here.
The essence is to use /{db}/_all_docs with a combination of start_key, limit and skip.
The initial state should be
start_key = null
Because null sorts first in the CouchDB views collation.
limit = ?
Arbitrary, as it depends on average document size, bandwidth, processing power, etc.
skip = 0
One doesn't want to skip anything at the start.
The general solution is to adjust start_key and skip according to the last response, while limit stays fixed.
Do note that skip can be very inefficient. In this solution skip is either 0 or 1, which is quite OK.
Each successive state depends upon the prior response:
start_key = the last row's doc key
Can't know what the next key is, right?
skip = 1
So the response doesn't repeat the last doc of the previous response.
In other words, a subsequent request is saying "Give me the next set of docs starting one past the last document key received".
Here's a nano-based script that provides a skeleton upon which to throw meat. It is naïve: it puts credentials in the URL and, for clarity, has no error handling.
const nano = require("nano")("http://{uid}:{pwd}@{address}:{port}");
const db = nano.db.use("{your db name}");
const echo = (json) => console.log(JSON.stringify(json, undefined, 2));

// Replace the body with the real per-page processing.
const processRows = (rows) => {
  echo(rows);
};

(async () => {
  let start_key = null; // null sorts first in the collation
  let limit = 2; // whatever
  let skip = 0; // nothing to skip on the first request
  let response;
  let more = false;
  do {
    if (response) {
      // next query is based on the last query.
      start_key = response.rows.pop().key;
      skip = 1; // don't fetch the last doc again
    }
    response = await db.list({ start_key, limit, skip });
    processRows(response.rows);
    // A full page means there may be more rows to fetch.
    more = response.rows.length === limit;
  } while (more);
  console.info("Processing completed.");
})();
Final words: this will return design documents (ids starting with _design/) too, so you probably want to filter those out.
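One way that filtering might look, as a minimal sketch: it relies on design document ids starting with "_design/", and the helper names and sample rows are made up for illustration. Something like this could be called at the top of processRows.

```javascript
// Design document ids start with "_design/"; everything else is a regular doc.
const isDesignDoc = (row) => row.id.startsWith("_design/");
const withoutDesignDocs = (rows) => rows.filter((row) => !isDesignDoc(row));

// Hypothetical sample shaped like the rows of an _all_docs response.
const sampleRows = [
  { id: "_design/reports", key: "_design/reports", value: { rev: "1-abc" } },
  { id: "invoice-001", key: "invoice-001", value: { rev: "1-def" } },
];

console.log(withoutDesignDocs(sampleRows).map((row) => row.id));
// [ 'invoice-001' ]
```

Filter only for processing, and keep taking the next start_key from the unfiltered rows so the paging position is not thrown off when a page ends on a design document.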
Update
I neglected to add the actual answer: the default is to return all rows, as stated in the CouchDB documentation, section 1.5.4.4, "Using Limits and Skipping Rows", so it's up to the caller.
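For completeness, a minimal sketch of the single-request route: leave out limit and skip and every row comes back in one response. The include_docs parameter asks _all_docs to embed each document body. fetchAllRows and fakeDb are illustrative names; fakeDb only stands in for a nano database handle so the sketch runs without a server.

```javascript
// Fetch every row in one request by omitting limit and skip.
// `db` is assumed to be a nano database handle, as in the script above.
const fetchAllRows = async (db) => {
  const response = await db.list({ include_docs: true });
  return response.rows;
};

// Hypothetical stand-in for a nano handle, returning a canned _all_docs response.
const fakeDb = {
  list: async () => ({
    total_rows: 2,
    offset: 0,
    rows: [
      { id: "a", key: "a", value: { rev: "1-x" }, doc: { _id: "a", _rev: "1-x" } },
      { id: "b", key: "b", value: { rev: "1-y" }, doc: { _id: "b", _rev: "1-y" } },
    ],
  }),
};

fetchAllRows(fakeDb).then((rows) => console.log(`fetched ${rows.length} rows`));
```

With 500k documents that single response can get very large, which is where the paged approach earns its keep.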
