
We are running an elasticsearch-1.5.1 cluster with 6 nodes. In recent days I have been hitting java.lang.OutOfMemoryError: PermGen space in the cluster. The affected node goes down, and I have to restart that node to bring it back up.

We tried to reproduce the issue by putting heavy load on the cluster, but unfortunately were not able to. Yet somehow we keep getting the same issue again and again in production.

Here is some of the yml file configuration:

index.recovery.initial_shards: 1
index.query.bool.max_clause_count: 8192
index.mapping.attachment.indexed_chars: 500000
index.merge.scheduler.max_thread_count: 1
cluster.routing.allocation.node_concurrent_recoveries: 15
indices.recovery.max_bytes_per_sec: 50mb
indices.recovery.concurrent_streams: 5

Memory configuration

ES_HEAP_SIZE=10g
ES_JAVA_OPTS="-server -Des.max-open-files=true"
MAX_OPEN_FILES=65535
MAX_MAP_COUNT=262144
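
Note that ES_HEAP_SIZE only sets -Xms/-Xmx; on a Java 7 HotSpot JVM, PermGen is a separate fixed-size region that is not affected by the heap settings, so a 10g heap still leaves PermGen at its small default unless -XX:MaxPermSize is set. A quick way to check what limit a running node actually has (a sketch; the PID is a placeholder):

    # what the running Elasticsearch JVM is using as its PermGen cap
    jinfo -flag MaxPermSize <es_pid>

    # the JVM's built-in default, for comparison
    java -XX:+PrintFlagsFinal -version | grep -i PermSize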

Update with merge policy configuration

I suspect merge.policy.max_merged_segment is related to this issue. We have 22 indices in the cluster; the merge.policy.max_merged_segment values for the indices are given below (a way to verify them per index is sketched after the list):

  • 7 indices have 20gb
  • 3 indices have 10gb
  • 12 indices have 5gb
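
If it helps, the per-index values can be pulled straight from the cluster rather than from the yml files. A sketch, assuming one of the nodes answers on localhost:9200 (placeholder host); the grep only matches indices where the setting was set explicitly:

    curl -s 'localhost:9200/_all/_settings?pretty' | grep max_merged_segment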

Update with process information

esuser xxxxx 1 28 Oct03 ? 1-02:20:40 /usr/java/default/bin/java -Xms10g -Xmx10g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -server -Des.max-open-files=true -Delasticsearch -Des.pidfile=/var/es/elasticsearch.pid -Des.path.home=/usr/es/elasticsearch -cp :/usr/es/elasticsearch/lib/elasticsearch-1.5.1.jar:/usr/es/elasticsearch/lib/:/usr/es/elasticsearch/lib/sigar/ -Des.default.path.home=/usr/es/elasticsearch -Des.default.path.logs=/es/es_logs -Des.default.path.data=/es/es_data -Des.default.path.work=/es/es_work -Des.default.path.conf=/etc/elasticsearch org.elasticsearch.bootstrap.Elasticsearch


Below is the stack trace I get from the Elasticsearch cluster during a search; I get the same issue at index time as well. From my observation, certain search/index operations increase PermGen usage, and when subsequent operations try to use PermGen space the error occurs.

[2015-10-03 06:45:05,262][WARN ][transport.netty          ] [es_f2_01] Message not fully read (response) for [19353573] handler org.elasticsearch.search.action.SearchServiceTransportAction$6@21a25e37, error [true], resetting
[2015-10-03 06:45:05,262][DEBUG][action.search.type       ] [es_f2_01] [product_index][4], node[GoUqK7csTpezN5_xoNWbeg], [R], s[INITIALIZING]: Failed to execute [org.elasticsearch.action.search.SearchRequest@5c2fe4c4] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: Failed to deserialize exception response from stream
Caused by: org.elasticsearch.transport.TransportSerializationException: Failed to deserialize exception response from stream
    at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:176)
    at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:128)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: PermGen space
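
To confirm on a live node that it really is PermGen, and not the main heap, that is filling up, the standard JDK tools can be watched while indexing and searching. A rough sketch, assuming a Java 7 HotSpot JVM; the PID and interval are placeholders:

    # P column = PermGen utilisation (%), sampled every 5 seconds
    jstat -gcutil <es_pid> 5000

    # per-classloader statistics; many loaders holding the same classes hints at a leak
    # (note: this can take a while and briefly stalls the target JVM)
    jmap -permstat <es_pid>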

Can anyone help me solve this issue? Thanks.

Arun Prakash
  • This is unusual for Elasticsearch. What ES version is this? Do you have the same settings in production and testing environments? What are the memory settings for ES? Run `ps -ef | grep elasticsearch` and provide the result. – Andrei Stefan Oct 07 '15 at 05:38
  • @AndreiStefan Added the process grep detail. I have already shared the ES version and the memory settings of ES in the question. And I don't have exactly the same configuration in the testing env as in production. – Arun Prakash Oct 07 '15 at 06:39
  • Is that the complete stack trace of the exception? Is there a root exception, as well? – Andrei Stefan Oct 07 '15 at 06:51
  • Yes, this is the full stack trace I have in the log... there is no root cause listed... the actual exception is PermGen. – Arun Prakash Oct 07 '15 at 11:35
  • This shouldn't happen, even with those settings. You need to find the differences between prod and test and try to have the same things in both environments. – Andrei Stefan Oct 07 '15 at 11:37
  • It's quite hard to have the same configuration locally, but let me try to simulate it. – Arun Prakash Oct 07 '15 at 11:39
  • Not only config. Think about the queries you run, how you access the cluster (client access), etc. – Andrei Stefan Oct 07 '15 at 11:40

1 Answer


The best solution is to use a "Java 8" JVM.

While you could increase the PermGen space your Java 7 JVM allocates (by setting -XX:MaxPermSize=... if you are using an Oracle JVM), if you just upgrade the JVM to version 8, then you don't even need to tune the PermGen size.
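
If upgrading is not possible right away, a stopgap on Java 7 might look like the line below, added wherever ES_JAVA_OPTS is exported (shown in the question's memory configuration). This is only a sketch: 512m is an arbitrary example value, not a recommendation (size it from what jstat reports), and -XX:+CMSClassUnloadingEnabled is what lets the CMS collector already in use actually unload classes from PermGen:

    # example only; 512m is a placeholder value, not a tuned recommendation
    ES_JAVA_OPTS="-server -Des.max-open-files=true -XX:MaxPermSize=512m -XX:+CMSClassUnloadingEnabled"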

This is because Java 8 removes PermGen entirely: class metadata moves to Metaspace, which is allocated from native memory and grows as needed by default, so there is no fixed PermGen limit left to run out of.
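
For completeness, under Java 8 the class-metadata area can still be capped if you ever want a hard limit, roughly like this (an illustrative value only, not a recommendation):

    # optional on Java 8; by default Metaspace grows with available native memory
    ES_JAVA_OPTS="-server -Des.max-open-files=true -XX:MaxMetaspaceSize=512m"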

Edwin Buck
  • Thanks for the info. Upgrading to Java 8 in production will fix the PermGen issue, but I can't upgrade right now, and I can set the PermGen size explicitly. I also want to know the root cause and provide a solution for that. – Arun Prakash Oct 07 '15 at 06:38
  • The root cause is that you are using the HotSpot JVM which optimizes heavily used code by compiling it into machine code for faster operation. This is desired behavior, not a bug. The problem is that after some period of time, it might find so much code to optimize that it runs out of space to store the optimized machine code. In older JVM releases, this code can only be stored in a pre-allocated area of memory within the JVM. In newer JVM releases, it is stored on the Heap. – Edwin Buck Oct 07 '15 at 15:39
  • Thanks Edwin... I agree with you, it's purely a JVM memory utilization issue. The thing is, PermGen mostly holds loaded class metadata; I suspect some leak in my config/app/settings/process that keeps loading classes into memory and fails to unload them. I am still trying to find that leak. – Arun Prakash Oct 08 '15 at 11:32
  • I'm facing a similar issue to the one mentioned in the question. One suggested fix is to add -XX:+CMSClassUnloadingEnabled along with -XX:+UseConcMarkSweepGC, but the concern raised is that CMSClassUnloadingEnabled reduces performance. Is there any way to add the CMSClassUnloadingEnabled entry without bringing performance down? – DecKno Dec 28 '15 at 06:32
  • @ArivazhaganJeganathan It is 2015, and in a few days 2016. You should not be trying to fix PermGen space issues by tweaking parameters; you should fix it by upgrading to Java 1.8. The "feature freeze" for Java 1.9 is done, and soon you will be two releases behind. Upgrade. Yes, it can be painful, but committing to feeling the pain early means you have a lot more time to make the best decisions. Committing to feeling that pain late means you will discover all the stuff you're going to discover anyway, but you won't have time to do anything about it. – Edwin Buck Dec 28 '15 at 20:34
  • @EdwinBuck, thank you for your valuable input. I will try to push for the upgrade to fix this. – DecKno Dec 29 '15 at 04:53
  • @EdwinBuck, we have found that some threads stay live even after the server stops. [link](http://stackoverflow.com/questions/34647901/elasticsearch-unclosed-client-live-threads-after-tomcat-shutdown-memory-usage). Suggestions, please? – DecKno Jan 07 '16 at 06:47