I'm currently trying to use a filter in an existing ElasticSearch instance via the library elasticutils. I'm getting nowhere, unfortunately. I'm not sure if the problem is because I did something basic wrong or if there's a problem in the library (could well be, AFAICT).

I've got an index with a specific mapping, containing a field (say "A") of type string (no explicit analyzer given). That field always contains a list of strings.

I'd like to filter my documents by containing a given string in that field A, so I tried:

import elasticutils as eu
es = eu.S().es(urls=[ URL ]).indexes(INDEX).doctypes(DOCTYPE)
f = eu.F(A="text")
result = es.filter(f)

But that returns an empty result set. I also tried it using f = eu.F(A__in="text") but that resulted in a large error message, the most intriguing part of it being [terms] filter does not support [A].

I'm wondering if I have to configure my index differently, maybe I have to create a facet to be able to use filter? But I didn't find any hint on this in the documentation I read.

My reason for wanting to use filters is that they can be combined freely using and, or, and not. I also found some specs describing that queries can be boolean as well, but they typically refer to must, should, and must_not, which I don't think are flexible enough for me. I also found some specs mentioning an operator flag for queries, which can be set to and or or. Any info on that is welcome.

So, my questions now are:

  • Is it a configuration problem? Do facets have something to do with this?
  • I'd like to test whether this is a library bug by skipping the lib, so how can I perform this filtering action using just, say, curl? Or any other library (maybe pyes)?
  • Is flexible combining (using and, or, not, and groupings of them) of several queries possible (i.e. without using filters at all)? How would I do that? (Preferably in elasticutils, but other library syntaxes, e.g. pyes, or plain curl commands are welcome as well.)
Alfe
  • Can I suggest that you take a look at the Sense plugin (https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo?hl=en) for Chrome? It's a great tool for working with your ES cluster rather than CURL. Also, I'd recommend starting with pyelasticsearch rather than something that seems to abstract too much away - at least to start with (http://pyelasticsearch.readthedocs.org/). :) – James Addison Jul 28 '13 at 04:55
  • i don't know this library very well - is there a way to see the JSON that it is using to query ES? – argentage Jul 29 '13 at 17:23

2 Answers

airza hit the nail on the head with his answer in terms of the filter you're looking for, in CURL format. I suspect the issues you're seeing are largely due to using an abstraction module like elasticutils - it would be good to get familiar with the underlying ES querying protocol first. It will make understanding elasticutils easier. As in my comment above, I recommend installing 'Sense', a plugin for Google Chrome that lets you easily query your ES cluster: https://chrome.google.com/webstore/detail/sense/doinijnbnggojdlcjifpdckfokbbfpbo?hl=en.

Elasticsearch query filters are extremely flexible - and 'nestable'. You can quite easily nest an or filter inside of a bool must filter. Example:

{
    "query": {
        "filtered": {
           "query": {
               "match_all": {}
           },
           "filter": {
               "bool": {
                   "must": [
                       {
                           "or": [
                                 {"exists": {"field": "sessions"}},
                                 {"range": {"id": {"gte": 56000}}}
                           ]
                       },
                       {
                           "term": {"age_min": "13"}
                       }
                   ],
                   "should": [
                      {
                          "term": {"area": "1"}
                      }
                   ]
               }
           }
        }
    }
}

In this example, results must match at least one of the two filters inside the or clause as well as the age_min term filter, and items matching the area term filter in the should clause will rank higher than non-matching items.
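
If you end up building such nested filters from code rather than by hand, something like the following may help. It is only a minimal sketch that assembles the same request body from plain Python dicts; the helper names (term, exists, range_gte, or_, bool_, filtered_search) are made up for illustration and are not part of elasticutils, pyes or pyelasticsearch.

```python
import json


def term(field, value):
    """Exact-match filter on a single field."""
    return {"term": {field: value}}


def exists(field):
    """Matches documents that have any value in the field."""
    return {"exists": {"field": field}}


def range_gte(field, lower):
    """Matches documents whose field value is >= lower."""
    return {"range": {field: {"gte": lower}}}


def or_(*filters):
    """Matches documents that satisfy at least one of the given filters."""
    return {"or": list(filters)}


def bool_(must=(), should=(), must_not=()):
    """Bool filter; clause lists that are empty are left out of the result."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    return {"bool": {name: items for name, items in clauses.items() if items}}


def filtered_search(filter_, query=None):
    """Wraps a filter in a filtered query; the query defaults to match_all."""
    return {"query": {"filtered": {"query": query or {"match_all": {}}, "filter": filter_}}}


# Same structure as the JSON example above, built from the helpers.
body = filtered_search(
    bool_(
        must=[or_(exists("sessions"), range_gte("id", 56000)), term("age_min", "13")],
        should=[term("area", "1")],
    )
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into Sense or passed to curl with -d. Because every helper just returns a dict, and/or/not groupings can be nested to any depth.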

James Addison
  • Thanks for the elaborate answer and the example combining filters and queries, that's quite helpful for me currently. I'm using direct curl now as well, though I'm not happy to lose all the nice abstractions the `elasticutils` library provides. Maybe I'll find out how to kick that lib so that it does what I want in the future but currently I stick to believing it's not up to the task. – Alfe Aug 05 '13 at 09:22
  • I moved away from abstractions (django-haystack, for example) in favour of something more direct like `pyelasticsearch` to achieve more flexibility. Abstractions are great, until they're not. :) – James Addison Aug 08 '13 at 17:04

The CURL request to solve this problem is pretty straightforward:

curl -XPOST 'URL/INDEX/_search' -d '{
  "filter": {
    "term": {
      "A": "val"
    }
  }
}'
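
If you would rather stay in Python while still bypassing elasticutils, the same request can be put together with the standard library alone. This is only a rough sketch: the function name build_term_filter_request is made up, and http://localhost:9200 and myindex are placeholder assumptions standing in for URL and INDEX.

```python
import json
import urllib.request


def build_term_filter_request(base_url, index, field, value):
    """Builds (but does not send) a POST to <base_url>/<index>/_search
    whose body is a single term filter, mirroring the curl call above."""
    body = {"filter": {"term": {field: value}}}
    return urllib.request.Request(
        url="{}/{}/_search".format(base_url.rstrip("/"), index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host and index name; substitute your own URL and INDEX.
request = build_term_filter_request("http://localhost:9200", "myindex", "A", "val")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending it requires a reachable cluster, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Printing the request before sending it makes it easy to compare the body with whatever the library generates.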

There's no particular relationship here to facets (which are a type of search query used to get the size of various subsets of another query), but if the field A is not indexed you won't be able to search for it and find anything. However, if this is the case, your ES query should simply return all records (since when you query a non-indexed field you are essentially giving ES no particular filter instructions).

The query spit out by my attempt to perform an equivalent ES search using this library was this:

{'filter': {'term': {'language': 'EN'}}}

As you can see, this is the same as the one you ran. What happened when you called result.all()?

argentage