0

I've build a PHP based webapp with pagination. I've made both a Couchbase and a Postgres version. I had to abandon N1QL because it had terrible performance (maybe I'll make another question for that). So I migrated the project from N1QL to views. I noticed that, while at low page number (e.g. 1, 10, 50 with 48 records per page) performance was better than postgres (0.07s vs 0.11s), but at a high page number (e.g. 4000 -> 1.5 seconds and 16000 -> 5 seconds) performance is very bad. I use skip + limit for pagination with native CB library.

Any ideas?

PHP:

public static function findByPage($recordsPerPage, $page) {
        $query = CouchbaseViewQuery::from("dev_".static::COLLECTION_NAME, "get_".static::COLLECTION_NAME."")->reduce(false)->skip($recordsPerPage*($page-1))->limit($recordsPerPage)->custom(array("full_set"=> "true"));
       $data = DB::getDB()->query($query, null, true);
       //   var_dump($data);
        $objects = array();
        foreach($data["rows"] as $row) {
            $objects[] = static::find($row["key"]);
        }
        return $objects;
    }

One of the views (they are pretty much all the same):

function (doc, meta) {
  if(doc.collection == "green_area") {
    emit(doc._id, null);
   }
}
Mattia_98
  • 133
  • 1
  • 7
  • 1
    Would be happy to look into your N1QL pagination if you post a separate question with your queries and indexes and EXPLAIN and Couchbase version. – geraldss Jul 26 '16 at 16:06
  • 1
    Thank you, I've made a separate question for that if you want to take a look. I'll either do N1QL again if it works, wait for Couchbase CE 4.5.0 and use views + full text search or use PostgreSQL. – Mattia_98 Jul 27 '16 at 10:10

1 Answers1

0

This is a known limitation with views. The issue is that there is no way to know how far through the view index record 4000 is. When you request records 4000-4004, the view-engine doesn't have to generate just 5 records, it has to generate 4000 that it immediately discards and then hands you the next 5. Due to the nature of views and having to scatter-gather from multiple nodes to produce a single result this can be extremely expensive as you have observed. For this reason it is discouraged to use the 'skip' option

Instead it is recommended that you use the 'range' option. The way this works is to initially specify the range as open (ie. such that it would include all records), an example of this would be from \u00 to \u0fff (The full range of unicode characters) and to return e.g. 10 records. You would then remember what the 10th record was and specify that as the start of your range for the next page). For instance if your 10th record was 'beer' then you would specify the range from 'beer' to \u0fff. Now this would include beer as the 1st result, there are two ways to resolve this. The first is to request 11 results and ignore the first. The second way to resolve this would be to specify the range as 'beer\u00' to \u0fff which starts at the first possible record after 'beer'.

This Couchbase blog post goes into more details: http://blog.couchbase.com/pagination-couchbase

It's worth noting that N1QL will generally have the same problem of not being able to guess where the nth record will be in the index and will not necessarily be the answer to your problem.

Chippiewill
  • 333
  • 1
  • 11