
Let's say I have 3 documents, each containing only one field (but let's imagine that there are more fields, and we need to search through all of them).

  1. Field value is "first second"
  2. Field value is "second first"
  3. Field value is "first second third"

Here is a script that can be used to create these 3 documents:

# drop the index completely, use with care!
curl -iX DELETE 'http://localhost:9200/test'

curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/one' -d '{"name":"first second"}'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/two' -d '{"name":"second first"}'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/three' -d '{"name":"first second third"}'

I need to find the only document (document 1) that has exactly "first second" text in one of its fields.

Here is what I tried.

A. Plain search:

curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
  "query": {
    "query_string": {
      "query": "first second"
    }
  }
}'

returns all 3 documents

B. Quoting

curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
  "query": {
    "query_string": {
      "query": "\"first second\""
    }
  }
}'

gives 2 documents: 1 and 3, because both contain 'first second'.
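To make the two results above concrete, here is a toy Python model of what I understand is going on. It is not Elasticsearch: `standard_tokens` is a hypothetical stand-in for the standard analyzer (lowercase, split on whitespace), `or_search` mimics `query_string` with its default OR operator, and `phrase_search` mimics the quoted phrase query.

```python
# Toy model only: a rough approximation of the analysis, not Elasticsearch itself.
docs = {
    "one": "first second",
    "two": "second first",
    "three": "first second third",
}


def standard_tokens(text):
    """Hypothetical stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def or_search(query):
    """Query A: a document matches if it shares at least one term with the query."""
    wanted = set(standard_tokens(query))
    return [doc_id for doc_id, value in docs.items()
            if wanted & set(standard_tokens(value))]


def phrase_search(query):
    """Query B: a document matches if the query terms appear adjacent and in order."""
    wanted = standard_tokens(query)
    n = len(wanted)
    hits = []
    for doc_id, value in docs.items():
        tokens = standard_tokens(value)
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits


print(or_search("first second"))      # ['one', 'two', 'three']
print(phrase_search("first second"))  # ['one', 'three']
```

Neither model can tell document 1 from document 3, because a phrase match only checks that the terms occur somewhere in the field, not that they are the whole field.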

This answer (https://stackoverflow.com/a/28024714/7637120) suggests using the 'keyword' analyzer to analyze the fields when indexing, but I would like to avoid any customizations to the mapping.

Is it possible to avoid them and still only find document 1?

hem
Roman Puchkovskiy

2 Answers

1

Yes, you can do that by mapping the `name` field as type `keyword`. A `keyword` field is not analyzed, so a query matches only when the whole value is equal.

To demonstrate it, I did the following:

1) created a mapping with type `keyword` for the `name` field
2) indexed the three documents
3) searched with a `match` query

mappings

PUT so_test16
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}

Indexing the documents

POST /so_test16/_doc
{
    "id": 1,
    "name": "first second"
}
POST /so_test16/_doc
{
    "id": 2,
    "name": "second first"
}

POST /so_test16/_doc
{
    "id": 3,
    "name": "first second third"
}

The query

GET /so_test16/_search
{
  "query": {
    "match": {"name": "first second"}
  }
}

and the result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "so_test16",
        "_type" : "_doc",
        "_id" : "m1KXx2sB4TH56W1hdTF9",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 1,
          "name" : "first second"
        }
      }
    ]
  }
}

Adding a second solution, for the case where `name` is a `text` field rather than a `keyword` field. The only catch is that `fielddata: true` also has to be set on the `name` field.

Mappings

PUT so_test18
{
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

and the search query

GET /so_test18/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_phrase": {"name": "first second"}}
      ],
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc['name'].values.length == 2"
          }
        }
      }
    }
  }
}

and the response

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.3971361,
    "hits" : [
      {
        "_index" : "so_test18",
        "_type" : "_doc",
        "_id" : "o1JryGsB4TH56W1hhzGT",
        "_score" : 0.3971361,
        "_source" : {
          "id" : 1,
          "name" : "first second"
        }
      }
    ]
  }
}
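For readers who want to see the logic of the second solution in isolation, here is a toy Python sketch of it. It is only a model, under the assumption that `doc['name'].values` on a `text` field with fielddata exposes the distinct terms of the field (as far as I understand, duplicates are collapsed). The helper names and the extra document "four" are hypothetical and are not part of the index above.

```python
# Toy model only: phrase match plus a "number of distinct terms" filter.
def standard_tokens(text):
    """Hypothetical stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def contains_phrase(tokens, wanted):
    """True if the terms in `wanted` occur adjacent and in order inside `tokens`."""
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


def phrase_with_term_count(docs, phrase):
    """Mimics match_phrase in `must` plus the script filter on the distinct-term count."""
    wanted = standard_tokens(phrase)
    hits = []
    for doc_id, value in docs.items():
        tokens = standard_tokens(value)
        # The script above hard-codes 2; here the count is derived from the phrase.
        if contains_phrase(tokens, wanted) and len(set(tokens)) == len(set(wanted)):
            hits.append(doc_id)
    return hits


docs = {
    "one": "first second",
    "two": "second first",
    "three": "first second third",
    "four": "first first second",  # hypothetical extra document, not in the question
}

print(phrase_with_term_count(docs, "first second"))  # ['one', 'four']
```

If the assumption about distinct terms holds, a value with a repeated word (like the hypothetical document "four") would still pass the filter, so this approach looks closer to "same set of terms, containing the phrase" than to a strict exact match.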
JBone
  • Thank you @JBone. But what if I do not control the mappings, and I don't know the exact fields that will exist in the documents? Is there a way to search using 'exact match' in such a case? – Roman Puchkovskiy Jul 06 '19 at 17:38
  • It seems it will be difficult without knowing the `mapping`, I guess, but please try the second solution in my answer if you want. The catch is that I needed to add `fielddata=true` to the `mappings` for the `name` field for the script to work. Who knows, your `mappings` might already have `fielddata=true`, so you can try running the second solution. Also, show us the `mappings` via `your_url/index/_mappings` – JBone Jul 06 '19 at 17:58
  • Actually, I'm interested in finding a way to do such a search without any changes to the mappings. I do not have any known mappings yet, it's like an R&D task, so there is nothing to show yet, alas. – Roman Puchkovskiy Jul 08 '19 at 18:39
  • Great. In the new version of Elasticsearch (7.2) they added `match_bool_prefix`, which might fix your problem (by setting the analyzer at query level rather than mapping level); we can try this out. I have version 6.7 on my machine, so I need to get the new Elasticsearch. But do go over this link to understand what I am talking about: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-bool-prefix-query.html – JBone Jul 08 '19 at 19:20
  • Thank you, I will take a look – Roman Puchkovskiy Jul 09 '19 at 09:41
1

In Elasticsearch 7.1.0, it seems that you can use the keyword analyzer at query time, without creating a special mapping. At least I didn't create one, and the following query does what I need:

curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
  "query": {
    "query_string": {
      "query": "first second",
      "analyzer": "keyword"
    }
  }
}'
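A possible explanation of why this works, offered as an assumption rather than something verified in this thread: with the default dynamic mapping, a string field is indexed both as analyzed `text` (`name`) and as a `name.keyword` sub-field holding the whole value, and `query_string` without a `default_field` searches all of them. The toy Python model below illustrates that idea; `index_fields` and `keyword_query_string` are hypothetical names, and none of this is Elasticsearch code.

```python
# Toy model only: each value is indexed twice, as analyzed terms and as one whole keyword.
def index_fields(value):
    """Assumed default dynamic mapping: `name` (text) plus `name.keyword` (keyword)."""
    return {
        "name": set(value.lower().split()),  # analyzed terms
        "name.keyword": {value},             # the whole value as a single term
    }


def keyword_query_string(docs, query):
    """The keyword analyzer keeps the query as ONE term, looked up in every field."""
    term = query
    return [doc_id for doc_id, value in docs.items()
            if any(term in terms for terms in index_fields(value).values())]


docs = {
    "one": "first second",
    "two": "second first",
    "three": "first second third",
}

print(keyword_query_string(docs, "first second"))  # ['one']
print(keyword_query_string(docs, "first"))         # ['one', 'two', 'three']
```

The second call hints at a caveat under the same assumption: a single-word query is still a valid term of the analyzed `text` field, so it would match every document containing that word rather than only documents whose whole value equals it.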
Roman Puchkovskiy