I need the ability to query an ElasticSearch index to see if there are any documents that already have a specific value for the field shown below:

"name" : {
      "type" : "text",
      "fields" : {
        "raw" : {
          "type" : "keyword"
        }
      }
 }

I was initially going to do this with a normalizer, but I'm hoping to avoid making changes to the index itself. I then found the match_phrase query, which does almost exactly what I need. The problem is that it also returns partial matches, as long as they start the same way. For example, if I search for the value "this is a test", it returns documents with the following values:

  • this is a test 1
  • this is a test but i'm almost done now
  • this is a test again

In my situation I can do another check in code once the data is returned to confirm that it is in fact a case-insensitive exact match, but I'm relatively new to ElasticSearch and I'm wondering whether there is any way to structure my original match_phrase query so that it does not return the examples above.

Abe Miessler

1 Answer

For anyone who is interested, I found a couple of ways to do this. The first is to run a match_phrase query and add a script filter that checks the length of the stored value (9 is the length of "Test Name"):

GET definitions/_search
{
  "query": {
    "bool":{
      "must":{
        "match_phrase":{
          "name":{
             "query":"Test Name"
          }
        }
      },
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['name.raw'].value.length() == 9",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}

Then I figured that if I could check the length in a script, I could just do the case-insensitive comparison there directly:

GET definitions/_search
{
  "query": {
    "bool": { 
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['name.raw'].value.toLowerCase() == 'test name'",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}

So those are the options. In my case I was concerned about performance, since a script filter has to be evaluated per document, so we bit the bullet and created a normalizer that allows case-insensitive comparisons, and these queries were never used. I figured I should post them here anyway, since I wasn't able to find these answers anywhere else.
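For reference, the normalizer approach mentioned above might look roughly like the sketch below. This is not the exact configuration we ended up with: the index name definitions_v2 and the normalizer name lowercase_normalizer are made up for illustration, and the typeless mapping syntax assumes Elasticsearch 7 or later. Existing documents would need to be reindexed into the new index before the query returns anything.

```
PUT definitions_v2
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "raw": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}

GET definitions_v2/_search
{
  "query": {
    "term": {
      "name.raw": "Test Name"
    }
  }
}
```

Because the normalizer is applied both when the keyword is indexed and when the term query is run, "Test Name", "test name" and "TEST NAME" should all match the same documents, while longer values such as "Test Name 1" should not, and no script is needed.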

Abe Miessler