1

I'm working on a project that requires me to search a CouchDB database for an approximate match and then sort the results. I've decided to use the Mango query system because I don't know ahead of time what field and sort combo will be used. The approximate matching part of this hasn't been an issue, the index I have created performs very well. However, when I sort the results the entire thing slows down even though from the _explain I can see it is using my index.

I'm being explicit as possible in these queries because I find that it helps CouchDB to automatically find the right index.

This is an example of a plain query I make with no problems:

{
   "selector": {
      "$and": [
         {
            "arr_one.0": {
               "$gte": "findOne"
            }
         },
         {
            "arr_one.0": {
               "$lt": "findOne\ufff0"
            }
         },
         {
            "arr_one.1": {
               "$gte": "findTwo"
            }
         },
         {
            "arr_one.1": {
               "$lt": "findTwo\ufff0"
            }
         }
      ]
   },
   "fields": ["_id"],
   "limit": 25
}

where arr_one is the array I am looking through and I'm trying to find a string starting with findOne on the first element and a string starting with findTwo on the second element.

The relevant parts of my index look like this:

"fields": [
  "arr_one.0",
  "arr_one.0",
  "arr_one.1",
  "arr_one.1"
]

This query also works very fast and finds the same results as above, but does not sort the results correctly:

{
   "selector": {
      "$and": [
         {
            "arr_one.0": {
               "$gte": "findOne"
            }
         },
         {
            "arr_one.0": {
               "$lt": "findOne\ufff0￰"
            }
         },
         {
            "arr_one.1": {
               "$gte": "findOne"
            }
         },
         {
            "arr_one.1": {
               "$lt": "findOne\ufff0￰"
            }
         },
         {
            "sort": {
               "$gt": null
            }
         }
      ]
   },
   "sort": [
      {
         "arr_one.0": "asc"
      },
      {
         "arr_one.1": "asc"
      },
      {
         "sort": "asc"
      }
   ],
   "fields": ["_id"],
   "limit": 25
}

With the index:

"fields": [
  "arr_one.0",
  "arr_one.0",
  "arr_one.1",
  "arr_one.1"
  "sort"
]

Now, this is the problem query and index which works in searching and sorting but takes a long time to complete:

{
   "selector": {
      "$and": [
         {
            "sort": {
               "$gt": null
            }
         },
         {
            "arr_one.0": {
               "$gte": "findOne"
            }
         },
         {
            "arr_one.0": {
               "$lt": "findOne\ufff0￰"
            }
         },
         {
            "arr_one.1": {
               "$gte": "findOne"
            }
         },
         {
            "arr_one.1": {
               "$lt": "findOne\ufff0￰"
            }
         }
      ]
   },
   "sort": [
      {
         "sort": "asc"
      },
      {
         "arr_one.0": "asc"
      },
      {
         "arr_one.1": "asc"
      }
   ],
   "fields": ["_id"],
   "limit": 25
}

With the index:

"fields": [
  "sort,
  "arr_one.0",
  "arr_one.0",
  "arr_one.1",
  "arr_one.1"
]

Any help trying to figure out how to optimize this would be appreciated. I'm open to any suggestions.

EDIT:

I've simplified the problem and am still getting the same issue. Instead of using an array, I'm only trying it with a single value and a single range. Still getting the same slow query even with an index.

{
   "selector": {
      "$and": [
         {
            "sort": {
               "$gt": null
            }
         },
         {
            "val": {
               "$gte": "findOne"
            }
         },
         {
            "val": {
               "$lt": "findOne\ufff0"
            }
         }
      ]
   },
   "sort": [
      {
         "sort": "asc"
      },
      {
         "val": "asc"
      }
   ],
   "limit": 25
}

With index:

"fields": [
  "sort",
  "val",
  "val"
]
  • I think your index is actually using *array keys* or *compound keys*, therefore it might be helpful to know a few [points about *array keys*](https://blog.couchbase.com/understanding-grouplevel-view-queries-compound-keys/). Also take a look at [this question and answer](https://stackoverflow.com/a/49647616/3405291) about *array keys*. – Megidd Apr 18 '18 at 02:00
  • Thanks, I'm not using the same type of keys as they are in the article though. I'm using the dot syntax in order to only get 1 element of the key at a time so I don't have to deal with those issues. – Raymond Schade Apr 18 '18 at 15:51
  • Can you provide some sample documents, so that your problem can be reproduced? – Megidd Apr 21 '18 at 03:11
  • @user3405291 In order to for this problem to be visible there needs to be a large amount of documents in the database. So I wrote a script that adds 5200 documents. 5000 aren't matches and 200 are matches for "a", 150 are for "aa" and so on. [script](https://pastebin.com/RsqhSDTq) This just requires the CouchDB module for python. [library](https://github.com/djc/couchdb-python) – Raymond Schade Apr 21 '18 at 06:20
  • In that last comment I made a small error, I put str() around the numbers to sort even though they weren't padded. This shouldn't affect the result time, just the order will be in terms of strings and not normal integer ordering. – Raymond Schade Apr 21 '18 at 06:29
  • Thanks. I'm going to try to reproduce this problem. I'll start some time later in the day. I'll let you know if I ran into any issue. – Megidd Apr 21 '18 at 07:06
  • When running the Python script with `$ python2.7 script.py`, I get this error: `AttributeError: 'module' object has no attribute 'choices'`. Can you take a look at the script? I couldn't debug it so far. – Megidd Apr 22 '18 at 13:18
  • @user3405291 the script was targeted towards Python 3.6 – Raymond Schade Apr 22 '18 at 20:02
  • I'm not sure, looks like the `couchdb-python` library is only installed for Python 2.7. When I try to run the script with other versions of Python, I receive: `ImportError: No module named 'couchdb'`. However, I can check with `$ pip show couchdb` that it is installed in `Location: /home/me/.local/lib/python2.7/site-packages` – Megidd Apr 23 '18 at 04:29
  • @user3405291 It says on their website it's targeted towards Python 3.4 and later and Python 2.7 and later. You could modify the script to make it work on Python 2.7 fairly easily, I just don't know how you would generate random strings. – Raymond Schade Apr 23 '18 at 13:09
  • Thanks. I think I'm going to work a bit on the script to see if I can make it work. – Megidd Apr 23 '18 at 14:38

0 Answers0