
I ran into a weird issue with the Elasticsearch Python drivers and would like someone to explain it to me. The query below works directly from cURL but doesn't work with python-requests or elasticsearch-py. Strangely, it works when I switch to the pyelasticsearch library. The details are:

I have a type called MY_TYPE that has a nested object MY_NESTED_FIELD and a child document type MY_CHILD_TYPE. I'm trying to run a terms aggregation on the nested attributes, based on filters applied to the MY_TYPE and MY_CHILD_TYPE types. The query looks like this:

{
  "query": {
    "filtered": {
      "filter": {
        "has_child": {
          "query": {
            "range": {
              "CHILD_FIELD": {
                "gte": 0.5
              }
            }
          },
          "type": "MY_CHILD_TYPE"
        }
      }
    }
  },

  "aggs": {
    "aggregation_results": {
      "aggs": {
        "boards": {
          "terms": {
            "field": "MY_NESTED_FIELD.KEY",
            "size": 100
          },
          "aggs": {
            "MY_RANGES": {
              "range": {
                "ranges": [
                  {
                    "to": 0.5,
                    "from": 0
                  },
                  {
                    "to": 0.8,
                    "from": 0.5
                  }
                ],
                "field": "MY_NESTED_FIELD_PATH.VALUE"
              }
            }
          }
        }
      },
      "nested": {
        "path": "MY_NESTED_FIELD_PATH"
      }
    }
  }
}
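The Python snippets further down pass this query as a variable named `payload`. As a minimal sketch of how that variable might be built as a plain dict: the helper name `build_payload` and its parameters are hypothetical and not taken from the actual script, while the field names are copied from the query above as written.

```python
import json


def build_payload(child_type="MY_CHILD_TYPE", child_field="CHILD_FIELD",
                  min_child_value=0.5, nested_path="MY_NESTED_FIELD_PATH",
                  key_field="MY_NESTED_FIELD.KEY"):
    """Return the search body shown above as a plain Python dict.

    Hypothetical helper: all names and defaults mirror the placeholders
    used in the JSON query.
    """
    # Parent filter: keep only parents that have a matching child document.
    has_child = {
        "type": child_type,
        "query": {"range": {child_field: {"gte": min_child_value}}},
    }
    # Range buckets computed inside each terms bucket.
    ranges = {
        "range": {
            "field": nested_path + ".VALUE",
            "ranges": [{"from": 0, "to": 0.5}, {"from": 0.5, "to": 0.8}],
        }
    }
    return {
        "query": {"filtered": {"filter": {"has_child": has_child}}},
        "aggs": {
            "aggregation_results": {
                "nested": {"path": nested_path},
                "aggs": {
                    "boards": {
                        "terms": {"field": key_field, "size": 100},
                        "aggs": {"MY_RANGES": ranges},
                    }
                },
            }
        },
    }


payload = build_payload()
# Serializing the dict gives the JSON body that would be sent to _search.
print(json.dumps(payload, indent=2))
```

Building the body as a dict and serializing it with `json.dumps` also avoids hand-written JSON mistakes such as a missing comma.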

When I run this query against Elasticsearch directly (using cURL or the head plugin), it filters the parents and returns aggregations based on the correct results. However, when I run it from a Python script, it completes successfully but returns wrong data: the aggregations are computed over all documents, as if the filter had not been applied.

I have tried:

  • cURL: Works!
  • ElasticSearch's HEAD plugin: Works!
  • python-requests version 2.8.1: Did not work!
  • elasticsearch-py api versions 1.4.0 and 2.1.0: Did not work!
  • pyelasticsearch version 1.4: Works!

The code snippet for elasticsearch-py is:

from elasticsearch import Elasticsearch
es   = Elasticsearch('HOST:PORT')
data = es.search(index='INDEX_NAME', doc_type='MY_TYPE', body=payload, q='*:*', size=0)

When using python-requests, the code was:

import json
import requests
url = 'http://ES_HOST:ES_PORT/ES_INDEX/ES_TYPE/_search'
params = {'size':0, 'q':'*:*'}
data   = requests.post(url, params=params, data=json.dumps(payload)).json()

My Elasticsearch version is:

{
  "version": {
    "number": "1.4.4",
    "build_hash": "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp": "2015-02-19T13:05:36Z",
    "build_snapshot": false,
    "lucene_version": "4.10.3"
  }
}

So my questions are:

  1. Is this the best way to write this query?
  2. Is there an explanation for why elasticsearch-py is acting strangely?
  3. Is there a fix for this in elasticsearch-py?
mjalajel
  • would you please include a code snippet using "python-requests"? – Anas Al Hamdan Oct 22 '15 at 12:23
  • @AnasMe I added the sample code using python-requests – mjalajel Oct 22 '15 at 21:15
  • Why do you have `q=*:*` in your parameters? – fabianvf Nov 16 '15 at 17:22
  • @fabianvf Sometimes I give the client the ability to add additional filters with the Lucene syntax (in addition to the standard DSL syntax). I noticed that in pyelasticsearch I could use either 'q=QUERY' or the DSL syntax (but not both). Do you think it's related to my problem? – mjalajel Nov 17 '15 at 11:05
  • I would suggest trying without it. I'm not sure what the interaction between your filtered query and the query string match all query would be. It may make no difference. I would suggest trying to integrate it into your larger query using the DSL though, if it's not too difficult. – fabianvf Nov 23 '15 at 19:35
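The suggestion in the last comment, dropping the `q` parameter and expressing the query string inside the DSL body, could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `fold_query_string` is hypothetical, it relies on the `filtered` query accepting both a `query` and a `filter` clause (as it does in Elasticsearch 1.x), and whether it resolves the behaviour described in the question has not been verified.

```python
import copy


def fold_query_string(payload, lucene_query):
    """Return a copy of payload with a Lucene query string folded into
    the filtered query, so no separate `q` URL parameter is needed.

    Hypothetical helper, not part of any Elasticsearch client library.
    """
    merged = copy.deepcopy(payload)  # leave the caller's dict untouched
    filtered = merged.setdefault("query", {}).setdefault("filtered", {})
    # The query_string query parses the same Lucene syntax that `q` accepts.
    filtered["query"] = {"query_string": {"query": lucene_query}}
    return merged


# Reduced version of the question's body (aggregations omitted for brevity).
payload = {
    "query": {
        "filtered": {
            "filter": {
                "has_child": {
                    "type": "MY_CHILD_TYPE",
                    "query": {"range": {"CHILD_FIELD": {"gte": 0.5}}},
                }
            }
        }
    }
}

body = fold_query_string(payload, "*:*")
# The merged body would then be sent without `q`, for example:
#   es.search(index='INDEX_NAME', doc_type='MY_TYPE', body=body, size=0)
```

Keeping everything in one body means the `has_child` filter and the client-supplied query string are combined explicitly instead of depending on how the server reconciles `q` with a request body.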
