
I'm indexing some products with names like "99% chocolate". If I search for "chocolate", this item matches, but if I search for "99", it doesn't. I came across the question "Using django haystack autocomplete with elasticsearch to search for digits/numbers?", which describes the same issue, but nobody has answered it. Can someone please help?

Edit 2: Sorry, I neglected to include an important detail. The numeric search itself works; it is the autocomplete that fails to match numbers. Here are the relevant lines:

#the relevant line in my index
    name_auto = indexes.EdgeNgramField(model_attr='name')

#the relevant line in my view
prodSqs = SearchQuerySet().models(Product).autocomplete(name_auto=request.GET.get('q', ''))

Edit: The following are the results of running the text through the standard analyzer:

curl -XGET 'localhost:9200/haystack/_analyze?analyzer=standard&pretty' -d '99% chocolate'
{
  "tokens" : [ {
    "token" : "99",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "<NUM>",
    "position" : 1
  }, {
    "token" : "chocolate",
    "start_offset" : 4,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}
Riz
  • What analyzer are you using for the fields? You can see how elasticsearch is tokenizing everything with analyze. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html – Alain Collins Dec 04 '14 at 06:35
  • @AlainCollins Sorry, I've updated the question to reflect the fact that the normal search works fine. However, the autocomplete doesn't match on numbers. – Riz Dec 12 '14 at 18:32

2 Answers


I finally found the answer here: "ElasticSearch: EdgeNgrams and Numbers".

Add the following classes, then change the ENGINE under HAYSTACK_CONNECTIONS in your settings file to use CustomElasticsearchSearchEngine (defined below) instead of the default Haystack one:

from haystack.backends.elasticsearch_backend import (
    ElasticsearchSearchBackend,
    ElasticsearchSearchEngine,
)


class CustomElasticsearchBackend(ElasticsearchSearchBackend):
    """
    The default ElasticsearchSearchBackend settings don't tokenize strings of digits the same way as words, so they
    get lost: the lowercase tokenizer is the culprit. Switching to the standard tokenizer and handling
    case-insensitivity in the filter seems to do the job.
    """
    def __init__(self, connection_alias, **connection_options):
        # see https://stackoverflow.com/questions/13636419/elasticsearch-edgengrams-and-numbers
        analyzer = self.DEFAULT_SETTINGS['settings']['analysis']['analyzer']['edgengram_analyzer']
        analyzer['tokenizer'] = 'standard'
        # DEFAULT_SETTINGS is shared at class level, so don't append the filter twice.
        if 'lowercase' not in analyzer['filter']:
            analyzer['filter'].append('lowercase')
        super(CustomElasticsearchBackend, self).__init__(connection_alias, **connection_options)


class CustomElasticsearchSearchEngine(ElasticsearchSearchEngine):
    backend = CustomElasticsearchBackend
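
For reference, the settings change could look roughly like the sketch below. The dotted path `myapp.search_backends` is a hypothetical location for the two classes above, so substitute wherever you actually put them. The URL and index name are taken from the curl command in the question and may differ in your setup.

```python
# settings.py (sketch; adjust names to your project)
HAYSTACK_CONNECTIONS = {
    'default': {
        # Hypothetical module path: point this at the file that defines
        # CustomElasticsearchSearchEngine in your own project.
        'ENGINE': 'myapp.search_backends.CustomElasticsearchSearchEngine',
        # Assumed from the curl example in the question (localhost:9200/haystack).
        'URL': 'http://localhost:9200/',
        'INDEX_NAME': 'haystack',
    },
}
```

Because this changes the analyzer used at index time, the existing index most likely has to be rebuilt (for example with `python manage.py rebuild_index`) before numeric autocomplete terms show up.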
Riz

Running your string "99% chocolate" through the standard analyzer gives the right result (99 is a term on its own), so if you're not using it currently, you should switch to it.

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=standard&pretty' -d '99% chocolate'
{
  "tokens" : [ {
    "token" : "99",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "<NUM>",
    "position" : 1
  }, {
    "token" : "chocolate",
    "start_offset" : 4,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}
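
The output above covers the standard analyzer only. The autocomplete field goes through a different analyzer, which may explain why plain search finds "99" while autocomplete does not. The pure-Python sketch below is a rough approximation, not Elasticsearch itself: the regexes stand in for the `lowercase` and `standard` tokenizers, the function names are made up for illustration, and the edge n-gram sizes (2 to 15) are assumed to match Haystack's defaults.

```python
import re


def lowercase_tokenize(text):
    """Approximate the `lowercase` tokenizer: runs of letters only, lowercased."""
    return re.findall(r"[^\W\d_]+", text.lower())


def standard_tokenize(text):
    """Rough stand-in for the `standard` tokenizer: runs of letters or digits."""
    return re.findall(r"[^\W_]+", text)


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Front-anchored n-grams of a token (gram sizes are an assumption)."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_terms(text, tokenize):
    """Tokenize, lowercase each token, then expand it into edge n-grams."""
    terms = []
    for token in tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms


old_terms = index_terms("99% chocolate", lowercase_tokenize)
new_terms = index_terms("99% chocolate", standard_tokenize)

print("99" in old_terms)  # digits are dropped before n-grams are built
print("99" in new_terms)  # digits survive tokenization and get indexed
```

In this approximation the letters-only tokenizer never emits "99", so no n-gram for it can exist in the index, whereas the standard-style tokenizer keeps it.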
Olly Cruickshank
  • sorry, i've updated the question to reflect the fact that the normal search works fine. However, the autocomplete doesn't match on numbers. – Riz Dec 12 '14 at 18:33