
In our project we are still using Apache Ignite 2.8.1. We are facing OOMs on server nodes when multiple clients simultaneously fetch the content of a specific cache. So far, we assumed the reason is that the data is stored only off-heap, so each client request copies the data onto the heap (-> Heap >= number_of_clients * size_of_cache). We expected to mitigate this by setting onHeapEnabled = 'True' for the given cache because, as we understand it, only one copy of the data should then exist on the heap and it should no longer explode.

  1. Are our assumptions in general correct?
  2. Aren't the server nodes using some kind of byte stream internally when sending the data to clients? If so, it would be even more surprising that the heap still explodes with on-heap enabled.

We are aware that scaling the server nodes/providing more heap would be a solution here but we would be interested in finding a resource-saving one.
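
The estimate from the question (Heap >= number_of_clients * size_of_cache) can be made concrete with a small calculation. This is only a sketch of the asker's assumption, not a statement about how Ignite actually allocates memory; the class name, the method name, and the figures of 8 clients and 512 MiB are hypothetical.

```java
public class HeapEstimateSketch {

    /**
     * Worst-case heap demand under the question's assumption that every
     * concurrent client request materializes a full copy of the cache on heap.
     */
    static long worstCaseHeapBytes(int concurrentClients, long cacheSizeBytes) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact((long) concurrentClients, cacheSizeBytes);
    }

    public static void main(String[] args) {
        long cacheSizeBytes = 512L * 1024 * 1024; // hypothetical cache size: 512 MiB
        int clients = 8;                          // hypothetical number of concurrent clients

        long needed = worstCaseHeapBytes(clients, cacheSizeBytes);
        System.out.println("Worst-case heap demand: " + needed / (1024 * 1024) + " MiB");
    }
}
```

With these hypothetical numbers the estimate is 4096 MiB, which shows how quickly the assumed per-request copies would outgrow a typical server heap.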

trincot
User0000
    Create and check a memory dump, see for example https://plumbr.io/blog/memory-leaks/solving-outofmemoryerror-dump-is-not-a-waste. It could be that your guess of what the heap contains is wrong. A heap dump can help. When you are sure about the cause it is easier to find a solution. – ewramner Jan 19 '22 at 11:42
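
A heap dump as suggested in the comment can be captured in several ways. On HotSpot JVMs the usual route is to start the server node with the standard flags -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath=<dir>, so a dump is written at the moment the OOM occurs. As a minimal sketch, assuming a HotSpot-based JVM and code that runs inside the server node's process, a dump can also be triggered programmatically; the class name and file name below are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {

    /**
     * Writes a heap dump of the current JVM to the given file.
     * The target must not exist yet and should end in ".hprof".
     */
    static Path dumpHeap(Path target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // liveObjectsOnly = true dumps only objects that are still reachable.
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heap-dumps");
        Path dump = dumpHeap(dir.resolve("server-node.hprof"), true);
        System.out.println("Heap dump written to " + dump + " (" + Files.size(dump) + " bytes)");
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer to see which objects actually dominate the heap before deciding on a fix.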

1 Answer


The most likely cause of the OOM is Ignite's internal per-client metrics and metadata. They can exhaust the heap when multiple clients frequently fetch data from caches (especially non-trivially sized data, since the metrics internally hold references to the data) and there are connectivity problems with these clients, either because the clients are slow (due to JVM pauses, for example) or because the server configuration or thread pools are not sufficient to handle them.

Therefore, the onHeapEnabled = 'True' option is not going to address the OOM; if anything, it will make it worse.

Instead, I would suggest that you enable a near cache for the specific cache you mention and configure settings such as nearStartSize and nearEvictionPolicy on the client nodes. That should solve your issue.

Note that near caches are fully transactional and are updated or invalidated automatically whenever the data changes on the server nodes, as mentioned in the docs.
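
A near cache on a client node might be set up roughly as follows. This is a sketch under assumptions rather than a tested configuration: it needs ignite-core on the classpath and a reachable cluster, the cache name "myCache", the Long/String key and value types, and the sizes are hypothetical, and it uses the factory-based eviction setter with an LRU policy as one possible choice.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.eviction.lru.LruEvictionPolicyFactory;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.NearCacheConfiguration;

public class NearCacheClientSketch {

    public static void main(String[] args) {
        // Start this JVM as a client node. Discovery settings are omitted here
        // and have to match your cluster.
        IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);

        try (Ignite ignite = Ignition.start(cfg)) {
            NearCacheConfiguration<Long, String> nearCfg = new NearCacheConfiguration<>();

            // Initial capacity of the local near cache (hypothetical value).
            nearCfg.setNearStartSize(1_000);

            // Keep at most 10,000 entries locally and evict the least recently
            // used ones beyond that (hypothetical limit).
            nearCfg.setNearEvictionPolicyFactory(
                new LruEvictionPolicyFactory<Long, String>(10_000));

            // Attach a near cache to a cache that already exists on the servers.
            IgniteCache<Long, String> cache =
                ignite.getOrCreateNearCache("myCache", nearCfg);

            // Repeated reads of the same key can now be served from the
            // client's local near cache.
            String value = cache.get(42L);
            System.out.println("Value for key 42: " + value);
        }
    }
}
```

The eviction limit is what keeps the client-side copy bounded, so it should be sized against the client's own heap rather than the full size of the server-side cache.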

Thanks

lmk