elasticsearch: phrase search for two adjacent words in any order (analyzed)

Question

I need to run a phrase search for two adjacent words in any order, with both words passing through analysis.

For example, in Sphinx extended query syntax the query string can be written as WordToBeAnalyzed1 NEAR/1 WordToBeAnalyzed2. Both words are then analyzed, and the search engine finds either "Word1 Word2" or "Word2 Word1", where both words can appear in any form (e.g. "fox jumps", "jumping fox", "foxes jumped", and so on).

After reading the ES docs, I could not find a way to express the same search in the ES query DSL.

With match_phrase and slop, I can query the phrase "WordToBeAnalyzed1 WordToBeAnalyzed2" with a "slop": 2 param so that the same words also match in reverse order. But this also matches undesirable variants such as "Word1 SlopWord1 Word2" and "Word1 SlopWord1 SlopWord2 Word2".
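
As a rough sketch, the match_phrase attempt described above could be built like this. The field name `subject` and the helper name `match_phrase_query` are assumptions made for illustration; only `match_phrase` and `slop` come from the question itself.

```python
import json


def match_phrase_query(field, phrase, slop=0):
    """Build a match_phrase request body.

    The phrase is sent as raw text, so Elasticsearch analyzes it at
    search time with the analyzer configured for the field.
    """
    return {"query": {"match_phrase": {field: {"query": phrase, "slop": slop}}}}


# "subject" is a hypothetical field name.
body = match_phrase_query("subject", "WordToBeAnalyzed1 WordToBeAnalyzed2", slop=2)
print(json.dumps(body, indent=2))
```

A slop of 2 is what allows the two words to swap places, and the same budget also admits up to two intervening words, which is exactly the unwanted behaviour.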

I also tried the span_near query with the in_order param, but

span queries are term-level queries, so they have no analysis phase

I would be glad if anyone can point me to a way to solve this problem.

I've never been able to figure this one out the 'right' way. We've used a couple of workarounds in the past to mimic this. One was to sort the tokens of the phrase in another field (like a pseudo analyzer) and apply the same sort at search time. Another was to store the tokens as an array and do a terms query. – coffeeaddict, Sep 04 '14 at 21:47
@coffeeaddict Thanks, but it looks like either I did not understand your workarounds properly, or we are trying to solve different problems. Sorting the tokens of an indexed string, e.g. "word2 word4 word1 word3", produces terms indexed in this order: "word1 word2 word3 word4". Suppose one needs to query the phrase `"word4 word1"` in any word order. The proposed analyzer changes the query to `"word1 word4"`, but in the indexed text "word1" and "word4" are now separated by 2 words, so the query will fail. How can you take word order into account with the `term` query? – Sergii Golubev, Sep 05 '14 at 08:24
Yes, you are right. I should've asked you to clarify your requirements first :( I had thought that the type of phrase match you were trying to do involved an equal number of tokens in the indexed phrase and the search phrase (just not in the same order). You may still be able to do a terms query if you are doing the phrase match in only one direction. If your search terms are shorter than the indexed terms, you can query the terms with minimum_should_match set to the number of tokens in the search term. This model will fail if the search term is longer than the indexed term. – coffeeaddict, Sep 05 '14 at 17:07
@coffeeaddict I did not understand your proposal. Are you suggesting using a term query together with the phrase match? Can you provide a short illustration of the query you have in mind? Thanks! – Sergii Golubev, Sep 08 '14 at 09:55

score 2 · Answer 1 · answered Sep 05 '14 at 01:40

What about running the query text through an explicit request to the _analyze API first, then passing the resulting terms to a span_near query?
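
A minimal sketch of that two-step approach, assuming the `_analyze` response has already been fetched. The field name `subject`, both helper names, and the hand-written sample response are hypothetical. Real responses carry more keys per token (offsets, type), and the actual tokens depend on the analyzer configured for the field.

```python
import json


def terms_from_analyze(response):
    """Return the token strings of an _analyze response, ordered by position."""
    tokens = sorted(response["tokens"], key=lambda t: t["position"])
    return [t["token"] for t in tokens]


def span_near_any_order(field, terms):
    """Build a span_near body matching the terms side by side, in any order."""
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": 0,          # no intervening positions allowed
                "in_order": False,  # accept either word order
            }
        }
    }


# Hand-written stand-in for an _analyze response (illustrative only).
sample_response = {
    "tokens": [
        {"token": "jump", "position": 0},
        {"token": "fox", "position": 1},
    ]
}

terms = terms_from_analyze(sample_response)
body = span_near_any_order("subject", terms)
print(json.dumps(body, indent=2))
```

The sketch assumes one token per input word. An analyzer that drops stopwords or emits several tokens at the same position (synonyms, for instance) would need extra handling before building the clauses.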

BenG

Thanks for your workaround. If there is no way to do this in a single query request, I will probably call the analyzer explicitly. – Sergii Golubev Sep 05 '14 at 07:59
My application has a fixed set of queries generated from data taken from a relational DB. So I can analyze all the strings only once and run `span_near` queries with the analyzed terms stored in the DB. – Sergii Golubev Sep 05 '14 at 08:07
Another workaround: when queries are not known beforehand, it might be better for performance to query both variants, `"WordToBeAnalyzed1 WordToBeAnalyzed2"` and `"WordToBeAnalyzed2 WordToBeAnalyzed1"`, in a single query (tests are needed to prove this). But that does not apply to my case (see previous comment). – Sergii Golubev Sep 05 '14 at 08:31
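
The "query both variants" idea from the last comment might look roughly like the sketch below: two `match_phrase` clauses under a `bool` query, one per word order, each with zero slop. The field name `subject` and the helper name `either_order_phrase_query` are assumptions for illustration, and the performance claim is left untested here as well.

```python
import json


def either_order_phrase_query(field, word1, word2):
    """Build a bool query that matches the two words adjacent in either order.

    Both phrases are raw text, so Elasticsearch analyzes them at search time.
    """
    phrases = [f"{word1} {word2}", f"{word2} {word1}"]
    return {
        "query": {
            "bool": {
                "should": [
                    {"match_phrase": {field: {"query": phrase, "slop": 0}}}
                    for phrase in phrases
                ],
                "minimum_should_match": 1,  # at least one order must match
            }
        }
    }


# "subject" is a hypothetical field name.
body = either_order_phrase_query("subject", "WordToBeAnalyzed1", "WordToBeAnalyzed2")
print(json.dumps(body, indent=2))
```

This only covers two words cleanly. With more words the number of orderings grows factorially, which is where an unordered `span_near` becomes the more practical choice.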

score 1 · Answer 2 · answered Sep 21 '18 at 10:10

This will work.

{
    "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "*hello* *there*",
                        "fields": [
                            "subject"
                        ],
                        "default_operator": "and"
                    }
                }
            ]
        }
    }
}

Chittiraju Yenni

Thank you for your contribution to StackOverflow. Your answer could be good. Nevertheless, some explanations would be nice, see [here](https://stackoverflow.com/help/how-to-answer). – colidyre Sep 21 '18 at 10:26
