
How do you call shingles in Python DSL?

This is a simple example that searches for a phrase in the "name" field and another one in the "surname" field.

import json
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

def make_dsl_query(fields):
    """
    Construct a query
    """
    es_client = Elasticsearch()
    my_query = Search(using=es_client, index="my_index", doc_type="my_type")

    if fields['name'] and fields['surname']:
        my_query = my_query.query(Q('bool', should=
                   [Q("match", name=fields['name']),
                    Q("match", surname=fields['surname'])]))
    return my_query


if __name__ == '__main__':

    my_query = make_dsl_query(fields={"name": "Ivan The Terrible", "surname": "Conqueror of the World"})
    response = my_query.execute()

    # print response
    for hit in response:
        print(hit.meta.score, hit.name, hit.surname)

1) Is it possible to use shingles? And how? I've tried many things and can't find anything in the documentation on it.

This would work in a normal Elasticsearch query, but it apparently has to be written differently in the Python DSL:

my_query = my_query.query(Q('bool', should=
                   [Q("match", name.shingles=fields['name']),
                    Q("match", surname.shingles=fields['surname'])]))

2) How do I pass fuzziness parameters to my match? Can't seem to find anything on it either. Ideally I would be able to do something like this:

my_query = my_query.query(Q('bool', should=
                   [Q("match", name=fields['name'], fuzziness="AUTO", max_expansions=10),
                    Q("match", surname=fields['surname'])]))
Ivan Bilan

2 Answers


To use shingles, you need to define them in your mappings; it is too late to introduce them at query time. What you can do at query time is use a match_phrase query.

my_query = my_query.query(Q('bool', should=
               [Q("match", name.shingles=fields['name']),
                Q("match", surname.shingles=fields['surname'])]))

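Python keyword arguments cannot contain a dot, so the DSL uses a double underscore in its place. Your example should work if written as: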

 my_query = my_query.query(Q('bool', should=
               [Q("match", name__shingles=fields['name']),
                Q("match", surname__shingles=fields['surname'])]))

Assuming you have the shingles field defined on both name and surname fields.
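For illustration, an index body that defines such a shingles sub-field might look roughly like the sketch below. The analyzer and filter names (`my_shingle_analyzer`, `my_shingle_filter`) and the shingle sizes are made up for this example, and the typeless `mappings` layout assumes Elasticsearch 7 or later; older versions that still use doc_type nest `properties` under the type name.

```python
import json

# Hypothetical index body: the analyzer and filter names are invented for
# illustration; "shingle" is Elasticsearch's built-in token filter type.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "my_shingle_filter": {
                    "type": "shingle",
                    "min_shingle_size": 2,
                    "max_shingle_size": 3,
                }
            },
            "analyzer": {
                "my_shingle_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_shingle_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Each field gets a "shingles" multi-field, which is what
            # name__shingles / surname__shingles refer to at query time.
            field: {
                "type": "text",
                "fields": {
                    "shingles": {"type": "text", "analyzer": "my_shingle_analyzer"}
                },
            }
            for field in ("name", "surname")
        }
    },
}

print(json.dumps(index_body, indent=2))
```

A body like this would be supplied when the index is created, before any documents are indexed.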

Note that you can also use the | operator:

 my_query = Q("match", name__shingles=fields['name']) | Q("match", surname__shingles=fields['surname'])

instead of constructing the bool query yourself.

Hope this helps.

Honza Král
  • Thanks a lot. How do I go about allowing fuzziness, though? For now, I just ended up making a dictionary with the needed query for fuzzy search and transforming it into Q object with Q({...}). Is there a better way to pass parameters to the query? The source code indicates that the query doesn't allow any additional parameters. – Ivan Bilan Feb 18 '17 at 10:20
  • Of course, `Q("match", name__shingles={'query': fields['name'], 'fuzziness': 'AUTO'})` should work just fine; the kwargs are essentially just keys in the resulting JSON. – Honza Král Feb 18 '17 at 10:54

As of January 2023, elasticsearch-dsl does support fuzzy matches; it is just not very well documented.

For simple fuzzy matches:

Q('fuzzy', fieldName=matchString)

When you want to set a custom fuzziness:

Q({"fuzzy": {"yourFieldName": {"value": matchString, "fuzziness": fuzziness}}})

My understanding is that the fuzzy keyword is just a wrapper for a standard query; see https://github.com/elastic/elasticsearch-dsl-py/blob/master/elasticsearch_dsl/query.py#L362.

Source:

  1. https://github.com/elastic/elasticsearch-dsl-py/issues/1510 (solution courtesy of @leberknecht on github)