I have a fairly complex query that runs through the search API (the Elasticsearch Python client's search API) over a very large number of phrases, combined with some other constraints:
    query = {
        "query": {
            "bool": {
                "should": [
                    {
                        "bool": {
                            "must": [
                                {"terms": {"page_id": chunked_pages[entity]}},
                                {
                                    "bool": {
                                        "should": [
                                            {"match_phrase": {"content": {"query": name, "slop": 6}}}
                                            for name in chunked_names[entity]
                                        ]
                                    }
                                },
                            ]
                        }
                    }
                    for entity in chunked_names.keys()
                ],
                "minimum_should_match": 1,
            }
        },
        "highlight": {
            "fields": {
                "content": {}
            },
            "pre_tags": ["<em>"],
            "post_tags": ["</em>"],
        },
        "from": from_param,
        "size": results_per_request
    }
    response = es.search(index=index_name, body=query)
For each retrieved document I would like to know which phrase was found in it (there are thousands of potential phrases). I tried using highlighting, but the output suggests that the highlighter mixes terms across the bool clauses: the returned document is correct, but the highlighted terms are unrelated to the clause that matched (they ignore the page_id constraints).
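One possible direction, sketched here under assumptions and not verified against this index, is Elasticsearch's named queries: each clause gets a `_name`, and every hit then lists the names that matched under `matched_queries`. Since a named phrase clause might be reported for a document regardless of the `page_id` clause it sits next to, the sketch also names each per-entity `bool` and keeps a phrase only when its entity clause is reported too. The helper names `build_named_query` and `phrases_for_hit` and the `||` separator are hypothetical, and the separator is assumed not to occur in entity names. Whether thousands of named clauses perform acceptably is an open question.

```python
def build_named_query(chunked_pages, chunked_names, sep="||"):
    """Build the same bool query, but tag every entity clause and phrase clause with _name."""
    entity_clauses = []
    for entity, names in chunked_names.items():
        phrase_clauses = [
            {"match_phrase": {"content": {
                "query": name,
                "slop": 6,
                # Hypothetical naming scheme: phrase||<entity>||<phrase text>
                "_name": f"phrase{sep}{entity}{sep}{name}",
            }}}
            for name in names
        ]
        entity_clauses.append({
            "bool": {
                "must": [
                    {"terms": {"page_id": chunked_pages[entity]}},
                    {"bool": {"should": phrase_clauses}},
                ],
                # Named so the client can tell which entity clause really matched.
                "_name": f"entity{sep}{entity}",
            }
        })
    return {"query": {"bool": {"should": entity_clauses, "minimum_should_match": 1}}}


def phrases_for_hit(hit, sep="||"):
    """Map entity -> matched phrases, keeping only entities whose whole clause matched."""
    matched = hit.get("matched_queries", [])
    entities = {m.split(sep, 1)[1] for m in matched if m.startswith("entity" + sep)}
    result = {}
    for m in matched:
        if not m.startswith("phrase" + sep):
            continue
        _, entity, name = m.split(sep, 2)
        if entity in entities:
            result.setdefault(entity, []).append(name)
    return result
```

The result of `build_named_query(...)` would be passed to `es.search` in place of the original query (with `from` and `size` added as before), and `phrases_for_hit` applied to each element of `response["hits"]["hits"]`.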
Any idea how to deal with it?