
I have an Elasticsearch index with the configuration below:

{
  "my_ind": {
    "settings": {
      "index": {
        "mapping": {
          "total_fields": {
            "limit": "10000000"
          }
        },
        "number_of_shards": "3",
        "provided_name": "my_ind",
        "creation_date": "1539773409246",
        "analysis": {
          "analyzer": {
            "default": {
              "filter": [
                "lowercase"
              ],
              "type": "custom",
              "tokenizer": "whitespace"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "3wC7i-E_Q9mSDjnTN2gxrg",
        "version": {
          "created": "5061299"
        }
      }
    }
  }
}

I want to find the following content with a plain search:

DL-1234170386456

This content is stored in the following field:

DNumber

This field has the following mapping:

{
  "DNumber": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}
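
As an aside, given this mapping, the keyword sub-field supports exact matching. A term query like the one below (field name taken from the mapping above) should match the exact value, assuming it is no longer than the 256 characters allowed by ignore_above:

```json
{
  "query": {
    "term": {
      "DNumber.keyword": "DL-1234170386456"
    }
  }
}
```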

I am implementing this in Java. I came across Elasticsearch analyzers and tokenizers, so I used the "whitespace" tokenizer.

I am searching with the query below:

{
  "query": {
    "multi_match": {
      "query": "DL-1234170386456",
      "fields": [
        "_all"
      ],
      "type": "best_fields",
      "operator": "OR",
      "analyzer": "default",
      "slop": 0,
      "prefix_length": 0,
      "max_expansions": 50,
      "lenient": false,
      "zero_terms_query": "NONE",
      "boost": 1
    }
  }
}
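
One way to see what is happening is the _analyze API, which returns the tokens produced for a given field or analyzer (index name my_ind taken from the settings above):

```json
GET my_ind/_analyze
{
  "field": "DNumber",
  "text": "DL-1234170386456"
}
```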

What am I doing wrong?

ketan

1 Answer


After a lot of research and trial & error, I found the answer!

Some basic but important points:

  • We need to specify analyzers and tokenizers when creating/indexing the index/data.
  • The specified string, "DL-1234170386456", contains a special character ("-"), and Elasticsearch uses the standard analyzer by default.
  • The standard analyzer uses the standard tokenizer, which is based on the Unicode Text Segmentation algorithm.

Actual Problem:

Elasticsearch splits the string "DL-1234170386456" into two separate tokens, "DL" and "1234170386456".
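
The difference between the two tokenizers can be illustrated with a quick Java sketch. This is a simplified approximation for demonstration only: the real standard tokenizer follows the Unicode Text Segmentation algorithm (UAX #29), not a plain regex split.

```java
import java.util.Arrays;

public class TokenizerDemo {
    public static void main(String[] args) {
        String text = "DL-1234170386456";

        // Rough approximation of the standard tokenizer:
        // split on runs of non-alphanumeric characters.
        String[] standardTokens = text.split("[^A-Za-z0-9]+");

        // The whitespace tokenizer splits only on whitespace,
        // so the hyphenated value stays in one piece.
        String[] whitespaceTokens = text.split("\\s+");

        System.out.println(Arrays.toString(standardTokens));   // [DL, 1234170386456]
        System.out.println(Arrays.toString(whitespaceTokens)); // [DL-1234170386456]
    }
}
```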

Solution:

  • We need to specify the whitespace analyzer, which uses the whitespace tokenizer.
  • It splits the text only when whitespace is encountered, so the string "DL-1234170386456" is kept intact by Elasticsearch and we are able to find it.
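
Once the index has been recreated with the whitespace analyzer (as in the settings at the top of the question) and the data reindexed, a plain match query on the field should find the document:

```json
{
  "query": {
    "match": {
      "DNumber": "DL-1234170386456"
    }
  }
}
```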