
My Elasticsearch documents have a field Name with entries like:

Samsung Galaxy S3
Samsung Galaxy Ace Duos 3
Samsung Galaxy Duos 3
Samsung Galaxy S2
Samsung Galaxy S (I9000)

When I query this field with the following query (note the space between "s" and "3"):

{
  "query": {
    "match": {
      "Name": {
        "query": "galaxy s 3",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}

It returns "Samsung Galaxy Duos 3" as a relevant result, and not "Samsung Galaxy S3".

The pattern I see for such cases is to ignore the space between any number and any single alphabetic character before running the query. For example, "I-phone 5 s" should then also return "I-phone 5s".
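A minimal sketch of that normalisation, assuming it is done in a shell before the string is sent to Elasticsearch. join_letter_number is a hypothetical helper, not an Elasticsearch feature, and the POSIX character classes only handle ASCII letters reliably. For it to help, the same joining would also have to be applied to any indexed names that contain such a space, so that both sides end up with a single token such as s3.

```shell
# Hypothetical helper (not an Elasticsearch feature): remove the space between
# a standalone single letter and a number, in either order. ASCII/POSIX sed -E.
join_letter_number() {
  printf '%s\n' "$1" | sed -E \
    -e 's/(^|[^[:alnum:]])([[:alpha:]]) ([[:digit:]]+)/\1\2\3/g' \
    -e 's/([[:digit:]]) ([[:alpha:]])($|[^[:alnum:]])/\1\2\3/g'
}

join_letter_number 'galaxy s 3'              # galaxy s3
join_letter_number 'I-phone 5 s'             # I-phone 5s
join_letter_number 'Samsung Galaxy Duos 3'   # unchanged: "Duos" is not a single letter
```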

Is there a nice way to accomplish this?

Saeed Zhiany
Souri

1 Answer


You need to change your analyser so that it also splits the string wherever text changes to a number, and vice versa. A regular expression in a pattern analyser handles this (the pattern below is based on the camelcase analyser):

curl -XPUT 'localhost:9200/myindex/' -d '
     {
         "settings":{
             "analysis": {
                 "analyzer": {
                     "mynewanalyser":{
                         "type": "pattern",
                         "pattern":"([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"
                     }
                 }
             }
         }
     }'
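The analyser only takes effect once the Name field is mapped to use it. A sketch of that step, assuming Elasticsearch 1.x (the same era as the commands in this answer, hence "type": "string" and a mapping type) and a hypothetical type name product. The analyser of an existing field can't be changed in place, so this would go on the freshly created myindex before the documents are (re)indexed.

```shell
# Assumes Elasticsearch 1.x and a hypothetical mapping type "product".
# Apply to the new index before indexing documents: an existing field's
# analyser cannot be changed in place.
curl -XPUT 'localhost:9200/myindex/_mapping/product' -d '
{
  "product": {
    "properties": {
      "Name": { "type": "string", "analyzer": "mynewanalyser" }
    }
  }
}'
```

With that mapping, the match query from the question is analysed with the same analyser (when no search analyser is set, the field's analyser is used for queries too), so "galaxy s 3" becomes the tokens galaxy, s and 3.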

Testing the new analyser with your string:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=mynewanalyser&pretty' -d 'Samsung Galaxy S3'
{
  "tokens" : [ {
    "token" : "samsung",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "galaxy",
    "start_offset" : 8,
    "end_offset" : 14,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "s",
    "start_offset" : 15,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "3",
    "start_offset" : 16,
    "end_offset" : 17,
    "type" : "word",
    "position" : 4
  } ]
}
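To check offline that "galaxy s 3" and "Samsung Galaxy S3" now share the s and 3 tokens, here is a rough approximation of mynewanalyser using tr and sed. approx_tokens is a hypothetical helper, and it is ASCII-only: the real analyser uses a Java regex with Unicode \p{L}, so non-ASCII input may be tokenised differently.

```shell
# Hypothetical, ASCII-only approximation of "mynewanalyser":
#   1. lowercase (the pattern analyser lowercases by default)
#   2. insert a break at every letter->digit and digit->letter boundary
#   3. treat each run of non-alphanumeric characters as one token break
#   4. drop empty tokens and print the tokens space-separated
approx_tokens() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E -e 's/([[:alpha:]])([[:digit:]])/\1 \2/g' \
             -e 's/([[:digit:]])([[:alpha:]])/\1 \2/g' \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d' \
    | paste -sd ' ' -
}

approx_tokens 'Samsung Galaxy S3'   # samsung galaxy s 3  (same tokens as _analyze above)
approx_tokens 'galaxy s 3'          # galaxy s 3
approx_tokens 'I-phone 5s'          # i phone 5 s
```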
Olly Cruickshank

I'm sorry I cannot upvote your answer because Stack Overflow doesn't allow me to. I believe this is a positive step towards what I'm trying to achieve. Is there a way to tell Elasticsearch to boost the score when a number and the text beside it match exactly (regardless of whether there is a space between them)? Thank you so much for the help! – Souri Jan 27 '15 at 08:33