1

I am trying to pull (and delete) all records from our database that don't have a URL with the word 'box' in it. This is the query I'm using:

{
   "selector": {
      "$not": {
         "url": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50
}

This query returns no records. But if I remove the $not, I get all records that do have the word 'box' in the url, but that's the opposite of what I want. Why do I get no results when adding the $not?

I have tried adding a simple base to the query like "_id":{"$gte":0} but that doesn't help.

xpqz
  • 3,617
  • 10
  • 16
David Powell
  • 537
  • 1
  • 4
  • 16

2 Answers2

1

from the Cloudant doc:

You can create more complex selector expressions by combining operators. However, for Cloudant NoSQL DB Query indexes of type json, you cannot use 'combination' or 'array logical' operators such as $regex as the basis of a query.

$not is a combination operator and therefore cannot be the basis of a query

i am able to get the following to work:

index

{
    "index": {
        "fields": ["url"]
    },
    "name" : "url-json-index",
    "type" : "json"
}

query

{
   "selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50,
   "use_index": "url-json-index"
}

if you are still seeing problems, can you provide the output from _/explain and the indexes you have in place.

vabarbosa
  • 706
  • 1
  • 4
  • 9
  • Thank you for your quick response. I still cannot get it to work, however. I already had a JSON index for the url field, along with a full-text index and a JSON index on another field. Here is the output of explain (in multiple comments due to comment size limitations): – David Powell Jan 27 '18 at 00:18
  • { "dbname": "eeed3510-e6c7-4d57-85e4-fa34a53ba0ab-bluemix/david_test_custom_answers", "index": { "ddoc": "_design/32372935e14bed00cc6db4fc9efca0f1537d34a8", "name": "32372935e14bed00cc6db4fc9efca0f1537d34a8", "type": "text", "def": { "default_analyzer": "keyword", "default_field": {}, "selector": {}, "fields": [], "index_array_lengths": true } }, ... – David Powell Jan 27 '18 at 00:19
  • ... "selector": { "url": { "$not": { "$regex": ".*box.*" } } }, "opts": { "use_index": [], "bookmark": [], "limit": 50, "skip": 0, "sort": {}, "fields": "all_fields", "r": [ 49 ], "conflicts": false, "stale": false, "update": true, "stable": false, "execution_stats": false }, "limit": 50, "skip": 0, "fields": "all_fields", "query": "((($fieldnames:url_3a*) OR ($fieldnames:url.*)) AND NOT (($fieldnames:url_3astring)))", "sort": "relevance" } – David Powell Jan 27 '18 at 00:19
  • I am sorry to say I am not proficient in this and don't know how to read the explain output. – David Powell Jan 27 '18 at 00:20
  • @DavidPowell, from the explain the query is using index `"name": "32372935e14bed00cc6db4fc9efca0f1537d34a8"` which is a `text` index (and probably not the index you are not wanting). you can try to have it use the json index or some other specific index by including `"use_index": ""` see my edited example above. you can learn more about explain plans and how indexes are chosen here: https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html#explain-plans – vabarbosa Jan 29 '18 at 14:38
  • Everything you say makes perfect sense, but I still can get no joy from this query. I created a json index called url-json-index, exactly like you did above. Then I copypasted your query into the dashboard query field and clicked "Run Query". The dashboard immediately returned an error banner that said "Error running query. Reason: (unknown_error) Unknown Error: mango_idx :: {no_usable_index,no_index_matching_name}". I viewed the index under "manage indexes" on the dashboard, and it showed it but without the name. Perhaps the dashboard has a problem with naming indexes? – David Powell Jan 30 '18 at 01:35
1

The "no results" issue is due to a bug in text indexes that has been recently fixed. However, neither $not nor $regex operators are able to take advantage of global indexes so will always result in a full database or index scan.

The way to optimise this query is to use a partial index. A partial index filters documents at indexing time rather than at query time, creating an index over a subset of the database. You then need to tell the _find endpoint to explicitly use the partial index. For example, create an index which only includes documents not matching your regex:

POST /<db>/_index
{
  "index": {
    "partial_filter_selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
    },
    "fields": ["type"]
  },
  "ddoc" : "url-not-box",
  "type" : "json"
}

then at query time:

{
   "selector": {
      "url": {
         "$not": {
            "$regex": ".*box.*"
         }
      }
   },
   "limit": 50,
   "use_index": "url-not-box"
}

You can see how many documents are scanned to fulfil the query in the Cloudant UI - the execution statistics are displayed in a popup underneath the query text area.

You may also find this This article about partial indexes helpful.

Will Holley
  • 1,745
  • 10
  • 11