I am trying to write an Elasticsearch query that searches a text field ("body") and returns items that match at least one of two multi-word phrases I provide (e.g. "stack overflow" OR "the stackoverflow"). I would also like the query to return only results that occur after a given timestamp, with the results ordered by time.

My current solution is below. I believe the MUST is working correctly (gte a timestamp), but the BOOL + SHOULD with two match_phrases is not correct. I am getting the following error:

Unexpected character ('{' (code 123)): was expecting double-quote to start field name

Which I think is because I have two match_phrases in there?

This is the ES mapping, and the details of the ES API I am using are here.

{"query":
  {"bool":
    {"should":
      [{"match_phrase":
         {"body":"a+phrase"}
       },
       {"match_phrase":
         {"body":"another+phrase"}
       }
      ]
    },
  {"bool":
    {"must":
      [{"range":
        {"created_at:
          {"gte":"thispage"}
        }
       }
      ]}
     }
    },"size":10000,
      "sort":"created_at"
}
BHudson
  • The syntax is not correct: the bool needs to be inside the should. Can you try bool must : [ {range...}, bool:should:[...]], since you want all documents to be gte created_at? – Gabriel Jan 09 '20 at 00:38
  • Thanks for your comment! Is this what you mean? This q is not working: https://gab.pushshift.io/search/?source_content_type=application/json&source={"query":{"bool":{"must":[{"range":{"created_at:{"gte":"1534004694"}}},{"bool":{"should":[{"match_phrase":{"body":"a+phrase"}},{"match_phrase":{"body":"another+phrase"}}]}}]}},"size":10,"sort":"created_at"} – BHudson Jan 09 '20 at 03:40

1 Answer

I think you were just missing a single closing " after created_at, so the field name was never terminated.

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "created_at": {
                            "gte": "1534004694"
                        }
                    }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "match_phrase": {
                                    "body": "a+phrase"
                                }
                            },
                            {
                                "match_phrase": {
                                    "body": "another+phrase"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 10,
    "sort": "created_at"
}
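
One way to avoid this kind of quoting slip is to build the query as a native data structure and let a serializer produce the JSON. The following is a minimal Python sketch of that idea, not the API's own client: `build_query` is a hypothetical helper, and the `source` and `source_content_type` parameter names are simply copied from the URL in the comment above.

```python
import json
from urllib.parse import urlencode


def build_query(phrases, since, field="body", size=10):
    """Hypothetical helper: created_at >= since AND at least one phrase matches."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"range": {"created_at": {"gte": since}}},
                    # Nested bool: with only should clauses, one of them must match.
                    {"bool": {"should": [
                        {"match_phrase": {field: phrase}} for phrase in phrases
                    ]}},
                ]
            }
        },
        "size": size,
        "sort": "created_at",
    }


query = build_query(["a phrase", "another phrase"], "1534004694")

# json.dumps emits every quote and brace, so the body is always well-formed JSON.
body = json.dumps(query)

# urlencode escapes the JSON for use in a query string (spaces become "+").
params = urlencode({"source_content_type": "application/json", "source": body})
print(params)
```

Because the serializer and the URL encoder handle the escaping, the phrases can be written with literal spaces rather than hand-encoded `+` signs.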

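Also, a bool object can have both must and should as properties, so this is also worth trying. Keep in mind that when must is present, the should clauses become optional unless minimum_should_match is set, so set it to 1 if at least one phrase has to match.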

{
    "query": {
        "bool": {
            "must": {
                "range": {
                    "created_at": {
                        "gte": "1534004694"
                    }
                }
            },
            "should": [
                {
                    "match_phrase": {
                        "body": "a+phrase"
                    }
                },
                {
                    "match_phrase": {
                        "body": "another+phrase"
                    }
                }
            ],
            "minimum_should_match": 1
        }
    },
    "size": 10,
    "sort": "created_at"
}

On a side note, Postman or any JSON formatter/validator would really help in determining where the error is.
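
If no formatter is at hand, the standard library of most languages can do the same check. Below is a small Python sketch using `json.loads`; `find_json_error` is a hypothetical helper name, and the `broken` string is a shortened version of the query from the comment with the quote after created_at still missing.

```python
import json

# Shortened query from the comment: the closing quote after created_at is missing.
broken = '{"query":{"bool":{"must":[{"range":{"created_at:{"gte":"1534004694"}}}]}}}'


def find_json_error(text, context=20):
    """Return None for valid JSON, else (message, position, nearby text)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(err.pos - context, 0)
        return err.msg, err.pos, text[start:err.pos + context]
    return None


# Reports where the parser gave up, along with the surrounding characters.
print(find_json_error(broken))

# Restoring the missing quote makes the document parse.
fixed = broken.replace('"created_at:', '"created_at":')
print(find_json_error(fixed))
```

The reported position is where the parser first noticed a problem, which is usually just after the real mistake, so it helps to look a few characters to the left of it.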

Jedidja
  • Thanks for the answer after all this time! This is helpful both in this context and more generally. I am not working on this anymore so can’t test but am accepting since it’s a general purpose solution to fixing these sort of issues. Thanks again. – BHudson Jan 18 '23 at 19:02
  • NP @BHudson .. never know when something will help someone out in the end :) – Jedidja Jan 19 '23 at 20:07