
I'm looking for a way to do exact array matches in Elasticsearch. Let's say these are my documents:

{"id": 1, "categories" : ["c", "d"]}
{"id": 2, "categories" : ["b", "c", "d"]}
{"id": 3, "categories" : ["c", "d", "e"]}
{"id": 4, "categories" : ["d"]}
{"id": 5, "categories" : ["c", "d"]}

Is there a way to search for all documents that have exactly the categories "c" and "d" (documents 1 and 5), no more and no fewer?

As a bonus: searching for "one of these" categories should still be possible as well (for example, you could search for "c" and get 1, 2, 3 and 5).

Any clever way to tackle this problem?

Pascal

2 Answers


If you have a discrete, known set of categories, you could use a bool query:

"bool" : {
    "must" : {
        "terms" : {
            "categories" : ["c", "d"],
            "minimum_should_match" : 2
        }
    },
    "must_not" : {
        "terms" : {
            "categories" : ["a", "b", "e"],
            "minimum_should_match" : 1
        }
    }
}
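
As a rough sketch of how the excluded list could be derived rather than typed by hand, the Python below builds a bool query from the wanted categories and the full known set. Note that it is a variant of the query above, not the same query: it uses one `term` clause per required category inside `must`, so it does not depend on `minimum_should_match` inside `terms`. The function name `exact_categories_query` and the `all_categories` parameter are made up for illustration.

```python
def exact_categories_query(wanted, all_categories):
    """Build a bool query: every wanted category present, no other known one."""
    wanted_set = set(wanted)
    known_set = set(all_categories)
    unknown = wanted_set - known_set
    if unknown:
        raise ValueError(f"categories not in the known set: {sorted(unknown)}")

    # One term clause per required category; all of them must match.
    bool_query = {
        "must": [{"term": {"categories": c}} for c in sorted(wanted_set)]
    }

    # Every other known category is excluded.
    others = sorted(known_set - wanted_set)
    if others:
        bool_query["must_not"] = {"terms": {"categories": others}}

    return {"bool": bool_query}


query = exact_categories_query(["c", "d"], ["a", "b", "c", "d", "e"])
```

This only works while the full set of categories really is known; a document carrying a category missing from `all_categories` would still slip through.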

Otherwise, probably the easiest way to accomplish this is to store another field that serves as a key for the whole set of categories.

{"id": 1, "categories" : ["c", "d"], "categorieskey" : "cd"}

Something like that. Then you could easily query with a term query for precisely the results you want, like:

{ "term" : { "categorieskey" : "cd" } }

And you could still search non-exclusively, as:

{ "term" : { "categories" : "c" } }
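
For the key field to be usable, it has to be built the same way at index time and at query time. A minimal sketch in Python, assuming the key is the deduplicated, sorted categories joined with a separator. The `"|"` separator and the helper name `categories_key` are assumptions (the example above simply concatenates to "cd"), and the separator must not occur inside a category name.

```python
def categories_key(categories, sep="|"):
    """Order-independent key: ["d", "c"] and ["c", "d"] give the same string."""
    return sep.join(sorted(set(categories)))


# At index time: add the key next to the original array.
doc = {"id": 1, "categories": ["c", "d"]}
doc["categorieskey"] = categories_key(doc["categories"])

# At query time: exact match on the key, or a loose match on one category.
exact_query = {"term": {"categorieskey": categories_key(["d", "c"])}}
any_query = {"term": {"categories": "c"}}
```

A separator avoids ambiguity between, say, ["ab", "c"] and ["a", "bc"]. The key field would also need to be indexed without analysis so that the term query compares the whole string.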

Querying for two categories that must both be present is easy enough, but preventing any other categories from being present is harder. You could probably do it: write a query to find records with both, then apply a filter that eliminates any records with categories other than the ones specified. To my knowledge, it's not the sort of search Lucene is designed to handle.

Honestly I'm having a bit of trouble coming up with a good filter to use here. You might need a script filter, or you could filter the results after they have been retrieved.
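
Filtering after retrieval could look something like the Python sketch below. It assumes the hits have already been fetched and reduced to plain dicts shaped like the documents in the question; `keep_exact_matches` is a hypothetical helper, not part of any client library.

```python
def keep_exact_matches(docs, wanted):
    """Keep only documents whose categories are exactly the wanted set."""
    wanted_set = set(wanted)
    return [d for d in docs if set(d["categories"]) == wanted_set]


# The documents from the question, standing in for search hits.
docs = [
    {"id": 1, "categories": ["c", "d"]},
    {"id": 2, "categories": ["b", "c", "d"]},
    {"id": 3, "categories": ["c", "d", "e"]},
    {"id": 4, "categories": ["d"]},
    {"id": 5, "categories": ["c", "d"]},
]

exact = keep_exact_matches(docs, ["c", "d"])
print([d["id"] for d in exact])  # [1, 5]
```

The drawback is that paging and hit counts from the search no longer line up with what is left after filtering.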

femtoRgon
  • funny, that's exactly what i told him :) – phoet Oct 01 '12 at 18:57
  • This query won't run. `minimum_match` doesn't appear to be a valid parameter to a TermsFilter. – Conrad.Dean Aug 05 '14 at 14:34
  • @Conrad.Dean Who said anything about using a filter? – femtoRgon Aug 05 '14 at 15:52
  • @femtoRgon oh woops. i was way too zoomed in. the syntax is identical if that's wrapped with a `"filter":{...}` instead of just a `"query":{...}`. `minimum_should_match` is a valid parameter in a terms query http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-query.html – Conrad.Dean Aug 05 '14 at 19:13
  • Ah, you're right, probably should be `minimum_should_match`. Not sure whether that's a change in ElasticSearch, or a mistake, but certainly doesn't hurt to update. Thanks. – femtoRgon Aug 06 '14 at 05:21
  • `terms` does not support `minimum_should_match`. It needs to be inside the `bool` query. The above query will not do what the OP wants and should not be marked as the best answer. The correct answer is to use one `terms` per term (a single-value array), and use `minimum_should_match` with the number of terms. – Alexander Staubo Apr 01 '16 at 22:21
  • @AlexanderStaubo - Okay, that, I'm certain, is a [change in the API](https://www.elastic.co/guide/en/elasticsearch/reference/0.90/query-dsl-terms-query.html). This question was asked in 2012, which would have been around the time 0.90 was released. That change occurred in 2.0, 3 years *after* this question was asked and answered. – femtoRgon Apr 02 '16 at 16:15

I found a solution for our use case that appears to work. It relies on two filters and on knowing how many categories we want to match against. We use a terms filter to require the values and a script filter to check the size of the array. In this example, marketBasketList is similar to your categories entry.

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "siteId": 4
          }
        },
        {
          "match": {
            "marketBasketList": {
              "query": [
                10,
                11
              ],
              "operator": "and"
            }
          }
        }
      ]
    },
    "boost": 1,
    "filter": {
      "and": {
        "filters": [
          {
            "script": {
              "script": "doc['marketBasketList'].values.length == 2"
            }
          },
          {
            "terms": {
              "marketBasketList": [
                10,
                11
              ],
              "execution": "and"
            }
          }
        ]
      }
    }
  }
}
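
Because the number in the script has to stay in sync with the number of values, it may help to generate both filters from one list. The Python sketch below builds only the "filter" part of the request above, keeping the same older filter syntax; it has not been checked against any particular Elasticsearch version, and `exact_size_filters` is a hypothetical helper name.

```python
def exact_size_filters(field, values):
    """Build the script + terms filter pair from a single list of values."""
    unique_values = list(dict.fromkeys(values))  # drop duplicates, keep order
    return {
        "and": {
            "filters": [
                {
                    # The array must hold exactly as many values as requested.
                    "script": {
                        "script": f"doc['{field}'].values.length == {len(unique_values)}"
                    }
                },
                {
                    # Every requested value must be present.
                    "terms": {field: unique_values, "execution": "and"}
                },
            ]
        }
    }


filters = exact_size_filters("marketBasketList", [10, 11])
```
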
Lucas Holt