I'm trying to understand why a field name with a single underscore, e.g. mods_genre, behaves differently from one with two or more consecutive underscores, e.g. mods__genre, when using the Python elasticsearch-dsl client.

I'm using Elasticsearch version 5.5.1 and Python 3.5.

The following is the code I'm working with to select all documents where a field matches a value.

This example is searching an index, foo, with field names that have only single underscores, and returns results as expected (as I've confirmed this field is populated with this value):

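from elasticsearch_dsl import Search, Q

# query against index with single underscores in field name
# (es_handle is an already-created Elasticsearch client connection)
query = Search(using=es_handle, index='foo')
query = query.filter(Q('term', **{'%s.keyword' % 'mods_genre' : 'biography'}))
query_results = query.execute()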

In [16]: query_results.hits.total
Out[16]: 6

However, with very similar code, but querying an index that has field names with multiple underscores in a row, bar, I'm getting zero results:

# query against index with multiple underscores in field name
query = Search(using=es_handle, index='bar')
query = query.filter(Q('term', **{'%s.keyword' % 'mods__genre' : 'biography'}))
query_results = query.execute()

In [16]: query_results.hits.total
Out[16]: 0

Any insight into why this might be the case? I understand that field names beginning with an underscore are reserved, but I have not found any documentation indicating that underscores within a field name, specifically several in a row, would be problematic.

ghukill

1 Answer

This is simply because elasticsearch-dsl-py replaces double underscores (__) in field names with a dot (.). This can be seen on lines 222-223 in utils.py. So the second query is actually run against mods.genre.keyword, which is probably not what you expect.
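
To make the effect concrete, here is a minimal sketch in plain Python that imitates the substitution described above. It does not call elasticsearch-dsl at all; the helper name expand_double_underscores is hypothetical and only stands in for what the client does to keyword-argument names.

```python
def expand_double_underscores(field_name):
    """Imitate the client's handling of field names: every '__' becomes '.'.

    Hypothetical helper for illustration only; it is not part of
    elasticsearch-dsl.
    """
    return field_name.replace('__', '.')


# Field name from the 'foo' index: no double underscore, so it is unchanged.
single = expand_double_underscores('mods_genre.keyword')

# Field name from the 'bar' index: the double underscore is turned into a dot,
# so the term filter targets a different (probably nonexistent) field.
double = expand_double_underscores('mods__genre.keyword')

print(single)
print(double)
```

The first name passes through untouched, while the second ends up pointing at a genre sub-field of mods, which would explain the zero hits.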

More context can be found in issue #28; basically, they wanted to adopt a concept similar to what the Django ORM does.

Val