
I have added a document like this to my index:

POST /analyzer3/books
{
  "title": "The other day I went with my mom to the pool and had a lot of fun"
}

Then I run queries like this:

GET /analyzer3/_analyze
{
  "analyzer": "english",
  "text": "\"The * day I went with my * to the\""
}

And it successfully returns the previously added document.

My idea is to use quotes so that the query becomes exact, but also wildcards that can stand in for any word. Google has exactly this functionality: you can search for a query such as "I'm * the university" and it returns pages containing text like "I'm studying in the university right now".
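For reference, here is a small Python sketch of the matching rule described above, independent of Elasticsearch. It assumes whitespace-separated words, case-insensitive comparison, and that each * stands for one or more whole words (as in the "studying in" example); `pattern_to_regex` and `matches_pattern` are hypothetical helper names. The whitespace assumption is exactly what fails for Japanese and Chinese, which are written without spaces between words.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    # Drop the surrounding quotes and split the pattern into words.
    words = pattern.strip().strip('"').split()
    # Each "*" becomes "one or more whole words"; other words must match literally.
    parts = [r"\S+(?: \S+)*" if w == "*" else re.escape(w) for w in words]
    # Anchor on word boundaries: start/end of text or a single space.
    return re.compile(r"(?:^| )" + " ".join(parts) + r"(?= |$)", re.IGNORECASE)


def matches_pattern(pattern: str, text: str) -> bool:
    # Collapse runs of whitespace so the regex can rely on single spaces.
    normalized = " ".join(text.split())
    return pattern_to_regex(pattern).search(normalized) is not None
```

With this rule, "The * day I went with my * to the" matches the indexed title, while "The day I went with my mom to the pool" does not, because each * must consume at least one word.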

However, I want to know whether there's another way to do this.

My main concern is that this doesn't seem to work with other languages such as Japanese and Chinese. I've tried many analyzers and tokenizers, to no avail.

Any answer is appreciated.

Chris Vilches

2 Answers


Exact matches on tokenized (text) fields are not that straightforward. If you have such requirements, you'd better store the field as keyword.

Additionally, the keyword data type supports the wildcard query, which can help with your wildcard searches.

So just create a sub-field of type keyword and run the wildcard query on it.

Your search query will look something like this (note the trailing * so that the rest of the title after "to the" is matched as well):

GET /_search
{
    "query": {
        "wildcard": {
            "title.keyword": "The * day I went with my * to the*"
        }
    }
}

The query above assumes that the title field has a sub-field named keyword, of type keyword.
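As a sketch of the two pieces involved, here are the request bodies built as Python dicts. This assumes Elasticsearch 7 or later (typeless mappings; 6.x needs a mapping type name), and the helper names are hypothetical. Dynamic mapping of a string already creates a `title.keyword` sub-field, but with `ignore_above: 256`, so longer titles are not searchable through it unless you map the sub-field yourself.

```python
def keyword_subfield_mapping(field: str) -> dict:
    # Index-creation body: `field` stays analyzed text, `field.keyword` keeps the exact string.
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
            }
        }
    }


def wildcard_query(field: str, pattern: str, anywhere: bool = False) -> dict:
    # Hypothetical helper: turn a quoted user pattern into a wildcard query body.
    value = pattern.strip().strip('"')
    # A wildcard query on a keyword field must match the WHOLE stored value,
    # so allow any text after the pattern.
    if not value.endswith("*"):
        value += "*"
    # Optionally allow any text before it too (leading wildcards are expensive).
    if anywhere and not value.startswith("*"):
        value = "*" + value
    return {"query": {"wildcard": {f"{field}.keyword": value}}}
```

Keep in mind that keyword values are not analyzed, so the match is case-sensitive, and that * matches any run of characters, including spaces, so it can stand for zero or more words rather than exactly one. Patterns starting with * make the slowness mentioned in the comment on this answer even worse.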

More on the wildcard query can be found here.

If you still want to do exact searches on the text data type, read this.

Aayush Anand
Nishant
  • This is missing a `*` at the end to work. Also it is slow and should probably only be used with care. – xeraa Dec 26 '18 at 01:23

Elasticsearch doesn't have Google-like search out of the box, but you can build something similar.

Let's assume that when someone puts a search text in quotes, what they want is a match_phrase query: remove the \" characters and search for the remaining string as a phrase.

PUT test/_doc/1
{
  "title": "The other day I went with my mom to the pool and had a lot of fun"
}

GET test/_search
{
  "query": {
    "match_phrase": {
      "title": "The other day I went with my mom to the pool and had a lot of fun"
    }
  }
}
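A minimal sketch of that rule in Python, building the request body as a dict. The fallback to a plain `match` query for unquoted input is an assumption, and `build_query` is a hypothetical name:

```python
def build_query(field: str, user_query: str) -> dict:
    text = user_query.strip()
    # Quoted input: strip the quotes and search the rest as a phrase.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return {"query": {"match_phrase": {field: text[1:-1]}}}
    # Assumption: unquoted input falls back to an ordinary full-text match.
    return {"query": {"match": {field: text}}}
```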

For the * it gets a little more interesting. You could split the query at each * into several phrase searches and combine them in a bool query. Example:

GET test/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "title": "The"
          }
        },
        {
          "match_phrase": {
            "title": "day I went with my"
          }
        },
        {
          "match_phrase": {
            "title": "to the"
          }
        }
      ]
    }
  }
}
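The splitting step can be automated. A sketch in Python, assuming the pattern uses standalone * characters; `wildcard_phrase_query` is a hypothetical helper:

```python
def wildcard_phrase_query(field: str, pattern: str) -> dict:
    text = pattern.strip().strip('"')
    # Split at every * and drop empty pieces (e.g. a leading or trailing *).
    fragments = [frag.strip() for frag in text.split("*")]
    clauses = [{"match_phrase": {field: frag}} for frag in fragments if frag]
    # Every fragment must occur somewhere in the field.
    return {"query": {"bool": {"must": clauses}}}
```

Each clause in `must` is matched independently, so this does not enforce the order of the fragments or the distance between them: "to the" may occur anywhere in the title.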

Or you could use slop in the phrase search. All the terms in your search query have to be present (unless they are removed by the tokenizer or as stop words), but the matched text can contain additional words between them. Here each * should be replaced by one other word, so we need a slop of 2 in total. If you want to allow more than one word in place of each *, pick a higher slop:

GET test/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "The * day I went with my * to the",
        "slop": 2
      }
    }
  }
}
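Computing the slop from the pattern can be sketched like this in Python. The `max_words_per_star` parameter is a hypothetical knob for how many words each * may replace, and the * tokens are removed explicitly instead of relying on the analyzer to drop them:

```python
def sloppy_phrase_query(field: str, pattern: str, max_words_per_star: int = 1) -> dict:
    words = pattern.strip().strip('"').split()
    stars = words.count("*")
    # Search for the literal words only; the gaps are covered by the slop.
    phrase = " ".join(w for w in words if w != "*")
    return {
        "query": {
            "match_phrase": {
                field: {"query": phrase, "slop": stars * max_words_per_star}
            }
        }
    }
```

Note that the slop is a budget for the whole phrase, not per *: with a slop of 2, one * may absorb two words while the other absorbs none. Slop also tolerates terms appearing out of order (a transposition costs 2), so this is looser than Google's operator.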

Another alternative might be shingles, but this is a more advanced concept and I would start off with the basics for now.

xeraa