2

We have couchbase DB running in production with ~4,300,000 documents in a bucket

We are trying to retrieve records in paginated way. Currently we are using offset based pagination Eg. To retrieve first 20 records we are using below query, and we are incrmenting offset value to get the subsequent records.

SELECT   startdatetime,
         id,
         Meta().id as m_id
FROM     test_document USE index (test_doc_with_order using gsi)
WHERE    customerId='343434375317c0523ce020d6'
AND      accountId='12233322'
AND      documentType='TESTING'
AND      documentStatus IN ["ACTIVE", "NEW", "OLD"]
ORDER BY startDateTime,
         meta().id limit 20 offset 0

But this approach increased our query execution time linearly when we are traversing through the offset (means getting next set of records). So we decided to move this keyset based paginated query in reference with to blog

Here after fetching the first page we are using below query to get the next set of records

SELECT   startdatetime,
         id,
         Meta().id as m_id
FROM     test_document USE index (test_doc_with_order using gsi)
WHERE    customerId='343434375317c0523ce020d6'
AND      accountId='12233322'
AND      documentType='TESTING'
AND      documentStatus IN ["ACTIVE", "NEW", "OLD"]
AND      startDateTime >= 'last_document_startTime-2023-04-16T20:12:00Z'
AND      meta().id > "last_document_meta_id"
ORDER BY startDateTime,
         meta().id limit 20

getting the last document meta().id and startDateTime and passing to above query.

But we are finding some records are missing while fetching the next set.

Index we have in our couchbase DB

CREATE INDEX `test_doc_with_order` ON `test_document`(`documentType`,`customerId`,`accountId`,`testDocumentId`,`documentStatus`,`startDateTime`, `Meta().id`) WHERE (`documentType` = "TESTING")}

enter image description here

1 Answers1

3

Index uses on similar like b-tree i.e. With in the First key, second key, ... are sorted.

Keyset pagination only works when you have all predicates have single equality and unique key has range.

The following approach is complex. Verify the correctness of results and use with caution.

Use Inner query and only do first non-equality and use index order. This way you are starting scan close to where you left off vs from begin.

Outer query apply other predicates eliminate further already seen. LIMIT will stop once it reached vs continue produce all values.

CREATE INDEX ix100 ON default(customerId,accountId,startDateTime, meta().id)
           WHERE documentType = "TESTING" AND documentStatus IN ["ACTIVE", "NEW", "OLD"];

SELECT d. startDateTime, d.m_id
FROM ( SELECT   startDateTime, META().id AS m_id
       FROM     default USE INDEX (ix100)
       WHERE    customerId = '343434375317c0523ce020d6'
                AND accountId = '12233322'
                AND documentType = 'TESTING'
                AND documentStatus IN ["ACTIVE", "NEW", "OLD"]
                AND startDateTime >= $last_startdate
       ORDER BY startDateTime, META().id
      ) AS d
WHERE d.startDateTime != $last_startdate OR d.m_id > $last_docid
LIMIT 20;

Start with $last_startdate your start date $last_docid ""

Next loop change to last row values and repeat until no more results

vsr
  • 7,149
  • 1
  • 11
  • 10