
In Elasticsearch, what is the maximum number of values a match can be performed on? I read somewhere that it is 1024 but is also configurable. Is that true? And how does it affect performance?

curl -XPOST 'localhost:9200/my_index/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "not": {
          "ids": {
            "type": "my_type",
            "values": ["1", "2", "3"]
          }
        }
      }
    }
  }
}'

How many values can I specify in this array? What is the limit? If it is configurable, what is the performance impact of increasing the limit?

joel.wilson
Phoenix

4 Answers


I don't think there is any limit set explicitly by Elasticsearch or Lucene. The limit you might hit, though, is the one imposed by the JDK.

To back up the statement above, I looked at the relevant source, which comes from the JDK's java.util.ArrayList:

/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;   

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    ...
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    ...
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

And that number (Integer.MAX_VALUE - 8) is 2147483639. So, this would be the theoretical max size of that array.

I tested an array of 150000 elements locally against my ES instance. And here come the performance implications: of course, performance degrades as the array grows. In my simple test with 150k ids, I got an 800 ms execution time. But it all depends on CPU, memory, load, data size, data mapping, etc. The best approach would be for you to actually test this.
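To reproduce a test like this, you can generate the request body programmatically. A minimal sketch in Python (the index name, type, and 150k count are illustrative; the query uses the ES 1.x `filtered`/`not`/`ids` syntax from the question):

```python
import json

# Build an ES 1.x-style search body with a large "ids" filter,
# mirroring the 150k-id test described above.
ids = [str(i) for i in range(150000)]
body = {
    "query": {
        "filtered": {
            "filter": {
                "not": {
                    "ids": {"type": "my_type", "values": ids}
                }
            }
        }
    }
}
payload = json.dumps(body)
# POST `payload` to localhost:9200/my_index/_search and check the "took" time.
print(len(body["query"]["filtered"]["filter"]["not"]["ids"]["values"]))  # → 150000
```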

UPDATED Dec. 2016: this answer applies to the Elasticsearch versions in existence at the end of 2014, i.e. the 1.x branch. The latest available at that time was 1.4.x.

Andrei Stefan
  • This is incorrect; there is a limit of 1024 by default. – Cheruvian Dec 25 '16 at 19:35
  • @Cheruvian can you explain where you get this conclusion from? Bear in mind that in 2014, when this question was answered, the Elasticsearch versions in existence were in the 1.4.x branch. – Andrei Stefan Dec 27 '16 at 09:06
  • As of version 2.x I get this exception when I try to request with a list of terms > 1024: nested: NotSerializableExceptionWrapper[too_many_clauses: maxClauseCount is set to 1024]; }{[NODEID][INDEX][1]: RemoteTransportException[[NODE_NAME][IP_ADDRESS][indices:data/read/search[phase/query]]]; nested: SearchParseException[failed to parse search source – Cheruvian Dec 27 '16 at 19:25
  • @Cheruvian `ids` has nothing to do with `maxClauseCount` (which relates to boolean clauses in a `bool` query/filter). `ids` is not rewritten as a bunch of `bool` clauses. Most likely, your problem comes from other parts of your query that are not related to `ids`. Your downvote and comment don't apply to this post. – Andrei Stefan Dec 29 '16 at 11:12
  • Can I get around the 1024-value limit by writing more than one terms query? Inside a should query I'll put 10 terms queries of 1024 values each. Will this work, or does the limitation apply to the whole query? – Lior Y Dec 30 '16 at 10:03
  • Fair, not sure why I thought this was about a terms query. Have you actually tried using an ids query filter on > 1024 ids? I have a feeling it will have the same limitation. @liory I confirmed you CAN chunk it up into multiple terms filters and join them with a bool. – Cheruvian Jan 03 '17 at 19:00
  • @Cheruvian yes, tested this with 150000 ids. I mentioned this in my answer ;-) – Andrei Stefan Jan 04 '17 at 04:18
  • @Cheruvian I also tested it in chunks; it worked for something like 1,200,000 terms (each should with 1,000 terms). Beyond that, I got an all_shards_failed exception. – Lior Y Jan 04 '17 at 08:44

Yes, the limit is configurable. By default, the maximum number of boolean clauses a query can contain is 1024. You can change it in the elasticsearch.yml file:

indices.query.bool.max_clause_count: 10000

Note: increasing the limit can lead to higher memory and CPU usage.
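As the comments on the accepted answer suggest, another way to stay under the clause limit without raising it is to split a large term list into several `terms` clauses joined under a bool `should`. A rough sketch in Python (the field name and chunk size are illustrative):

```python
import json

def chunked_terms_query(field, values, chunk_size=1024):
    """Build a query whose `terms` clauses each stay within chunk_size values,
    joined with a bool `should` so a match on any chunk counts as a hit."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    return {
        "query": {
            "bool": {
                "should": [{"terms": {field: chunk}} for chunk in chunks],
                "minimum_should_match": 1
            }
        }
    }

query = chunked_terms_query("id", [str(i) for i in range(5000)])
print(len(query["query"]["bool"]["should"]))  # → 5 clauses of at most 1024 terms
```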

Refer to these links for more info:

https://groups.google.com/forum/#!topic/elasticsearch/LqywKHKWbeI

https://github.com/elasticsearch/elasticsearch/issues/482

http://elasticsearch-users.115913.n3.nabble.com/index-query-bool-max-clause-count-Setting-and-TermsQueryParser-td3050751.html

http://elasticsearch-users.115913.n3.nabble.com/Query-string-length-limit-td4054066.html

hoijui
BlackPOP

An index-level limitation on the number of terms in a terms query will be introduced in ES 7.0.

The setting is index.max_terms_count with a default value of 65536.
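Since it is an index-level setting, it can be changed per index through the settings API, assuming the setting is dynamically updatable. A minimal sketch of the request body in Python (the index name and the new value are illustrative):

```python
import json

# Body for raising index.max_terms_count on one index; send it with
# PUT /my_index/_settings. The value 100000 is just an example.
settings_body = json.dumps({"index": {"max_terms_count": 100000}})
print(settings_body)  # → {"index": {"max_terms_count": 100000}}
```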

Amir Hadadi

From the docs for version 6.4:

Executing a Terms Query request with a lot of terms can be quite slow, as each additional term demands extra processing and memory. To safeguard against this, the maximum number of terms that can be used in a Terms Query both directly or through lookup has been limited to 65536. This default maximum can be changed for a particular index with the index setting index.max_terms_count.

Carasel