
I am using Django Haystack with Elasticsearch. I have a string field called 'code' whose values have this format:

76-010

I would like to be able to search for

76-

and get results such as

76-111

76-110

76-210

...

and so on.

But I don't want results like these:

11-760

11-076
...

I already have a custom Elasticsearch backend, but I am not sure how I should index the field to get the desired behavior:

from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend


class ConfigurableElasticBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        # see http://stackoverflow.com/questions/13636419/elasticsearch-edgengrams-and-numbers
        self.DEFAULT_SETTINGS['settings']['analysis']['analyzer']['edgengram_analyzer']['tokenizer'] = 'standard'
        self.DEFAULT_SETTINGS['settings']['analysis']['analyzer']['edgengram_analyzer']['filter'].append('lowercase')
        super(ConfigurableElasticBackend, self).__init__(connection_alias, **connection_options)
Jiyda Moussa
    Have you seen [this article](https://wellfire.co/blog/custom-haystack-elasticsearch-backend/)? What it shows is pretty much all you need. – Val Aug 27 '15 at 02:27

1 Answer


The idea is to use an edgeNGram tokenizer in order to index every prefix of your code field. For instance, 76-111 should be indexed as 7, 76, 76-, 76-1, 76-11 and 76-111. That way you will find 76-111 by searching for any of its prefixes, while 11-760 never produces the prefix 76- and is not matched.
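To make the prefix idea concrete, here is a small pure-Python sketch (no Elasticsearch involved) that roughly mimics what an edge n-gram analyzer with lowercasing stores, and shows why searching for 76- returns only codes that start with it. The function names are hypothetical, and the sketch assumes the query string is looked up as a single term rather than being split into n-grams itself.

```python
def edge_ngrams(value, min_gram=1, max_gram=25):
    """Return the lowercased leading n-grams (prefixes) of value,
    roughly what an edgeNGram tokenizer with these limits emits."""
    value = value.lower()
    upper = min(len(value), max_gram)
    return [value[:n] for n in range(min_gram, upper + 1)]


def build_index(codes):
    """Toy inverted index: map each prefix token to the codes producing it."""
    index = {}
    for code in codes:
        for token in edge_ngrams(code):
            index.setdefault(token, set()).add(code)
    return index


def search(index, query):
    """Look the whole query up as one term (it is NOT split into n-grams)."""
    return sorted(index.get(query.lower(), set()))


codes = ["76-010", "76-111", "76-110", "76-210", "11-760", "11-076"]
index = build_index(codes)
print(edge_ngrams("76-111"))  # ['7', '76', '76-', '76-1', '76-11', '76-111']
print(search(index, "76-"))   # ['76-010', '76-110', '76-111', '76-210']
```

Because 11-760 and 11-076 only ever produce prefixes starting with 1, no token equal to 76- points at them.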

Note that the article linked in the comment above provides a full-fledged solution to your problem. In your case, the index settings would look like this in your Django code. You can then follow that article to wrap things up, but this should get you started.

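import pyelasticsearch
from pyelasticsearch.exceptions import IndexAlreadyExistsError
from haystack.backends.elasticsearch_backend import ElasticsearchSearchBackend


class ConfigurableElasticBackend(ElasticsearchSearchBackend):

    # Index settings: "edgengram_analyzer" splits each value into its
    # leading n-grams (prefixes) of 1 to 25 characters and lowercases them.
    DEFAULT_SETTINGS = {
      "settings": {
        "analysis": {
          "analyzer": {
            "edgengram_analyzer": {
              "tokenizer": "edgengram_tokenizer",
              "filter": [ "lowercase" ]
            }
          },
          "tokenizer": {
            "edgengram_tokenizer": {
              "type": "edgeNGram",
              "min_gram": "1",
              "max_gram": "25"
            }
          }
        }
      },
      "mappings": {
        "your_type": {  # placeholder: use the document type you index into
          "properties": {
            "code": {
              "type": "string",
              "analyzer": "edgengram_analyzer"
            }
          }
        }
      }
    }

    def __init__(self, connection_alias, **connection_options):
        super(ConfigurableElasticBackend, self).__init__(connection_alias, **connection_options)

        # A separate pyelasticsearch client, used only to create the index,
        # so the client Haystack stored in self.conn is left untouched.
        conn = pyelasticsearch.ElasticSearch(connection_options['URL'], timeout=self.timeout)
        self.index_name = connection_options['INDEX_NAME']

        # Create the index with the above settings. On later start-ups the
        # index already exists, so that error is ignored.
        try:
            conn.create_index(self.index_name, self.DEFAULT_SETTINGS)
        except IndexAlreadyExistsError:
            pass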
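One thing worth checking before relying on these settings: the mapping sets only `analyzer`, so Elasticsearch applies the same edge n-gram analyzer to the query text as well. If the query 76- is split into 7, 76 and 76-, and those terms are OR-ed together (the default for a match query), any code starting with 7 can come back, just ranked lower. The pure-Python sketch below models both cases; the code 72-000 and the function names are hypothetical. A common remedy is to give the field a separate `search_analyzer` that does not produce n-grams (for example the keyword tokenizer plus the lowercase filter), though how that interacts with Haystack's own mapping may depend on your Haystack version.

```python
def edge_ngrams(value, min_gram=1, max_gram=25):
    """Lowercased leading n-grams (prefixes) of value."""
    value = value.lower()
    return [value[:n] for n in range(min_gram, min(len(value), max_gram) + 1)]


def matches(code, query, analyze_query):
    """Return True if code would be returned for query.

    analyze_query=True models a query that is itself split into edge
    n-grams, where sharing any single term counts as a hit (OR semantics).
    analyze_query=False models matching the query as one exact term.
    """
    code_terms = set(edge_ngrams(code))
    if analyze_query:
        query_terms = set(edge_ngrams(query))
    else:
        query_terms = {query.lower()}
    return bool(code_terms & query_terms)


codes = ["76-111", "76-210", "72-000", "11-760"]  # 72-000 is a made-up code
or_hits = [c for c in codes if matches(c, "76-", analyze_query=True)]
exact_hits = [c for c in codes if matches(c, "76-", analyze_query=False)]
print(or_hits)     # ['76-111', '76-210', '72-000']
print(exact_hits)  # ['76-111', '76-210']
```

In both cases 11-760 stays out, because none of its prefixes is 7, 76 or 76-; the difference is only whether other codes beginning with 7 leak in.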
Val