
I have uploaded around 1 TB of data into an Elasticsearch cluster. For searching, I tried the following approaches:

  1. "from+size" that has default value of index.max_result_window as 10000, but I wanted to search from 100000, hence I set index.max_result_window to 100000. Then searched from 100000 and size=10, but it causes heap size full.

  2. Scroll API - keeping the older segments alive uses more file handles, so it again consumes the memory configured on the nodes.

  3. search_after - I tried sorting documents by _uid (a sketch of the request I used is shown after the error below), but it gives me the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[fielddata] Data too large, data for [_uid] would be [13960098635/13gb], which is larger than the limit of [12027297792/11.2gb]",
        "bytes_wanted": 13960098635,
        "bytes_limit": 12027297792
      }
    ]
  }
}
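
A minimal sketch of the kind of request I mean (my_index, my_type and the search_after value are placeholders for the real index name, type and the last _uid returned by the previous page):

GET /my_index/_search
{
  "size": 10,
  "query": { "match_all": {} },
  "sort": [
    { "_uid": "asc" }
  ],
  "search_after": ["my_type#last_seen_id"]
}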

What can be done to resolve this error, and what is the most efficient way to paginate through a large chunk of data?


1 Answer


You're hitting a circuit breaker because of the fielddata size: it is larger than the part of the heap allotted to fielddata.

See the Elasticsearch documentation over here: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker

Depending on your search requirements, you could increase the heap size, or you could raise the circuit breaker limit so it doesn't trip in your scenario. Probably the best way around this, though, is to limit the fielddata cache size.
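
If you do want to raise the breaker instead, the fielddata breaker limit is a dynamic cluster setting, so something along these lines should work (the 70% value is only an illustration, not a recommendation):

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "70%"
  }
}

Note that this only moves the point at which the breaker trips; the fielddata still has to fit on the heap.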

You can place an upper limit (relative or absolute) on the fielddata cache by adding this setting to the config/elasticsearch.yml file:

indices.fielddata.cache.size: 20%

For details, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#fielddata-size
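
To see how much fielddata each node is currently holding (and to confirm the cap is being respected), the cat fielddata API is a quick check:

GET /_cat/fielddata?v

It lists the heap memory used by fielddata per node and per field.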

And this existing answer: "FIELDDATA Data is too large"
