My query_string query gives me a TooManyClauses exception. In my case, however, I don't think the exception is thrown for the usual reason (a query that expands to more than 1024 boolean clauses). Instead, it seems to be related to highlighting, because when I remove the highlight section, the query works. This is my original query:

{
    "query" : {
        "query_string" : {
            "query" : "aluminium potassium +DOS_UUID:*",
            "default_field" : "fileTextContent.fileTextContentAnalyzed"
        }
    },
    "fields" : [ "attachmentType", "DOS_UUID", "ATT_UUID", "DOCUMENT_REFERENCE", "filename", "isCSR", "mime" ],
    "highlight" : {
        "fields" : {
            "fileTextContent.fileTextContentAnalyzed" : { }
        }
    }
}

and it gives me the TooManyClauses error:

{
   "error": "SearchPhaseExecutionException[Failed to execute phase [query_fetch], all shards failed; shardFailures {[02Z45jhrTCu7bSYy-XSW_g][markosindex][0]: FetchPhaseExecutionException[[markosindex][0]: query[filtered(fileTextContent.fileTextContentAnalyzed:aluminium fileTextContent.fileTextContentAnalyzed:potassium +DOS_UUID:*)->cache(_type:markostype)],from[0],size[10]: Fetch Failed [Failed to highlight field [fileTextContent.fileTextContentAnalyzed]]]; nested: TooManyClauses[maxClauseCount is set to 1024]; }]",
   "status": 500
}

This is the query without the highlight, which works:

{
    "query" : {
        "query_string" : {
            "query" : "aluminium potassium +DOS_UUID:*",
                    "default_field" : "fileTextContent.fileTextContentAnalyzed"
        }
    },
    "fields" : [ "attachmentType", "DOS_UUID", "ATT_UUID", "DOCUMENT_REFERENCE", "filename", "isCSR", "mime" ]
}

UPDATE 1:

This is the stack trace from the Elasticsearch log file:

[2014-10-10 16:03:18,236][DEBUG][action.search.type       ] [Doop] [markosindex][0], node[02Z45jhrTCu7bSYy-XSW_g], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@14d7ab1e]
org.elasticsearch.search.fetch.FetchPhaseExecutionException: [markosindex][0]: query[filtered(fileTextContent.fileTextContentAnalyzed:aluminium fileTextContent.fileTextContentAnalyzed:potassium +DOS_UUID:*)->cache(_type:markostype)],from[0],size[10]: Fetch Failed [Failed to highlight field [fileTextContent.fileTextContentAnalyzed]]
    at org.elasticsearch.search.highlight.PlainHighlighter.highlight(PlainHighlighter.java:121)
    at org.elasticsearch.search.highlight.HighlightPhase.hitExecute(HighlightPhase.java:126)
    at org.elasticsearch.search.fetch.FetchPhase.execute(FetchPhase.java:211)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:340)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:308)
    at org.elasticsearch.search.action.SearchServiceTransportAction$11.call(SearchServiceTransportAction.java:305)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 1024
    at org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:72)
    at org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:149)
    at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:79)
    at org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:105)
    at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:217)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:99)
    at org.elasticsearch.search.highlight.CustomQueryScorer$CustomWeightedSpanTermExtractor.extractUnknownQuery(CustomQueryScorer.java:89)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:224)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:474)
    at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:217)
    at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:186)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:197)
    at org.elasticsearch.search.highlight.PlainHighlighter.highlight(PlainHighlighter.java:113)
    ... 9 more
[2014-10-10 16:03:18,237][DEBUG][action.search.type       ] [Doop] All shards failed for phase: [query_fetch]

Note: I am using Elasticsearch 1.2.1.

UPDATE 2:

This is my mapping:

{
   "markosindex": {
      "mappings": {
         "markostype": {
            "_id": {
               "path": "DOCUMENT_REFERENCE"
            },
            "properties": {
               "ATT_UUID": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "DOCUMENT_REFERENCE": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "DOS_UUID": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "attachmentType": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "fileTextContent": {
                  "type": "string",
                  "index": "no",
                  "fields": {
                     "fileTextContentAnalyzed": {
                        "type": "string"
                     }
                  }
               },
               "filename": {
                  "type": "string",
                  "index": "not_analyzed"
               },
               "isCSR": {
                  "type": "boolean"
               },
               "mime": {
                  "type": "string",
                  "index": "not_analyzed"
               }
            }
         }
      }
   }
}

Any idea? Thanks!

  • Can you post the complete stack trace from the log file? – Andrei Stefan Oct 10 '14 at 13:01
  • This is due to the wildcard query. But try setting the rewrite parameter for your query_string to "top_terms_20". Not sure if this would work, but worth giving it a shot. See http://www.elasticsearch.org/guide/en/elasticsearch/reference, & http://lucidworks.com/blog/bringing-the-highlighter-back-to-wildcard-queries-in-solr-14/ – keety Oct 10 '14 at 14:54
  • top_terms_N works (with N<=17), but I don't understand why. I have 82 documents, each with a distinct DOS_UUID. My understanding is that DOS_UUID:* is rewritten to 82 boolean clauses, and that I _would_ have a problem when there were >1024 distinct DOS_UUIDs. – Markos Fragkakis Oct 10 '14 at 15:28
  • Strange. What is the typical value of DOS_UUID, i.e. is it alphanumeric, and how is it analyzed? 82 distinct DOS_UUIDs need not result in 82 distinct terms after analysis at index time. If the DOS_UUID field is not analyzed, then I understand it should be 82 clauses. – keety Oct 10 '14 at 16:45
  • I updated the question with the mapping. – Markos Fragkakis Oct 13 '14 at 07:46
  • @MarkosFragkakis the mapping looks fine, and I'm unable to figure out why this would cause the rewrite to create more than 1024 clauses; you are probably better off posting it as an issue on Elasticsearch: https://github.com/elasticsearch/elasticsearch. However, unrelated: since you are highlighting only on fileTextContent.fileTextContentAnalyzed, you should probably use the highlight_query feature of highlighting and get rid of the wildcard query term in there. – keety Oct 14 '14 at 00:29
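
For reference, below is a sketch of the original query with keety's rewrite suggestion applied; it is untested here and assumes nothing beyond the index and fields already shown. The rewrite option of query_string controls how multi-term queries such as DOS_UUID:* are expanded; top_terms_20 rewrites the wildcard into at most the 20 top-scoring matching terms instead of one clause per matching term, which keeps the query the highlighter sees small:

{
    "query" : {
        "query_string" : {
            "query" : "aluminium potassium +DOS_UUID:*",
            "default_field" : "fileTextContent.fileTextContentAnalyzed",
            "rewrite" : "top_terms_20"
        }
    },
    "fields" : [ "attachmentType", "DOS_UUID", "ATT_UUID", "DOCUMENT_REFERENCE", "filename", "isCSR", "mime" ],
    "highlight" : {
        "fields" : {
            "fileTextContent.fileTextContentAnalyzed" : { }
        }
    }
}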

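And a sketch of keety's later highlight_query suggestion, which gives the highlighter its own query containing only the terms to be highlighted, so it never has to expand the +DOS_UUID:* wildcard. The inner "aluminium potassium" query is an assumption about what should be highlighted, not something stated in the question:

{
    "query" : {
        "query_string" : {
            "query" : "aluminium potassium +DOS_UUID:*",
            "default_field" : "fileTextContent.fileTextContentAnalyzed"
        }
    },
    "fields" : [ "attachmentType", "DOS_UUID", "ATT_UUID", "DOCUMENT_REFERENCE", "filename", "isCSR", "mime" ],
    "highlight" : {
        "fields" : {
            "fileTextContent.fileTextContentAnalyzed" : {
                "highlight_query" : {
                    "query_string" : {
                        "query" : "aluminium potassium",
                        "default_field" : "fileTextContent.fileTextContentAnalyzed"
                    }
                }
            }
        }
    }
}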