I have a requirement to query documents by phone number. Users can enter characters such as parentheses and dashes in the search query string, and these should be ignored. So I created a custom analyzer with a char_filter of type pattern_replace that uses a regex to remove everything but digits. However, Elasticsearch does not seem to be filtering out the non-digits. Here is a sample of what I am trying to do:

1) Index creation

put my_test_index 
{
     "settings" : {
         "index": {
            "analysis": {
               "char_filter": {
                  "non_digit": {
                     "pattern": "\\D",
                     "type": "pattern_replace",
                     "replacement": ""
                  }
               },
               "analyzer": {
                  "no_digits_analyzer": {
                     "type": "custom",
                     "char_filter": [
                        "non_digit"
                     ],
                     "tokenizer": "keyword"
                  }
               }
            }
         }
     },
     "mappings" : {
         "doc_with_phone_prop" : {
            "properties": {
               "phone": {
                  "type": "text",
                  "analyzer": "no_digits_analyzer",
                  "search_analyzer": "no_digits_analyzer"
               }
            }
         }
     }
}
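
As a sanity check on the regex itself, the following Python sketch approximates what this analyzer is expected to produce. It does not call Elasticsearch. The helper name `analyze_phone` is hypothetical, and `re.ASCII` is an assumption made so that `\D` behaves like Java's default `[^0-9]`.

```python
import re

# Mirrors the "non_digit" char filter: pattern "\\D", replacement "".
# re.ASCII is an assumption so that \D means [^0-9], as in Java's default mode.
NON_DIGIT = re.compile(r"\D", re.ASCII)


def analyze_phone(text):
    """Approximate no_digits_analyzer (hypothetical helper, not an ES API).

    Step 1: the char filter strips every non-digit character.
    Step 2: the keyword tokenizer emits the remaining string as one token.
    Edge cases such as empty input are not modelled here.
    """
    stripped = NON_DIGIT.sub("", text)
    return [stripped]


indexed = analyze_phone("3035555555")
searched = analyze_phone("(303) 555-5555")
print(indexed, searched, indexed == searched)
```

If the analyzer is applied to both the indexed value and the search text, the two sides should reduce to the same single term.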

2) Inserting one doc

put my_test_index/doc_with_phone_prop/1
{
    "phone": "3035555555"
}

3) Querying without any parentheses or dashes in the phone

post my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "query_string": {
                    "query": "3035555555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}

This returns one document correctly:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "my_test_index",
            "_type": "doc_with_phone_prop",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
               "phone": "3035555555"
            }
         }
      ]
   }
}

4) Querying with parentheses returns nothing, even though I assumed that my no_digits_analyzer would remove everything but digits from the search terms.

post my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "query_string": {
                    "query": "\\(303\\)555-5555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}

What am I doing wrong here?

I am using Elasticsearch 5.3.

Thanks.

milagvoniduak

1 Answer

I just needed to read some more documentation. Apparently I was querying the index the wrong way: query_string parses its input as query syntax, in which characters such as parentheses and dashes are reserved, and it does not escape them for you. I needed to use multi_match with the query parameter instead.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html

The query below worked, and the char filter is applied:

post my_test_index/doc_with_phone_prop/_search
{
    "query": {
        "bool": {
            "must": [
            {
                "multi_match": {
                    "query": "(303) 555- 5555",
                    "fields": ["phone"]
                }
            }]
        }
    }
}
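
Because multi_match does not interpret query syntax, an application can pass the user's input through without escaping it. Here is a minimal Python sketch of building that request body. `build_phone_query` is a hypothetical helper, and it only constructs the JSON; sending it to the cluster is left out.

```python
import json


def build_phone_query(user_input):
    # Hypothetical helper: wraps raw user input in the multi_match query
    # shown above. No escaping of parentheses or dashes is attempted,
    # on the assumption that the field's analyzer strips them.
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": user_input,
                            "fields": ["phone"],
                        }
                    }
                ]
            }
        }
    }


body = build_phone_query("(303) 555- 5555")
print(json.dumps(body, indent=4))
```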
milagvoniduak