
I recently upgraded ES from version 1.1.0 to 1.6. All the configs remained the same, but almost 5% of the calls started throwing EsRejectedExecutionException, which never happened when I was running ES 1.1.0.

The full stack trace is below:

```
org.elasticsearch.action.search.ReduceSearchPhaseException: Failed to execute phase [merge], [reduce]
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction$1.onFailure(TransportSearchQueryAndFetchAction.java:93)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.onRejection(AbstractRunnable.java:65)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:85)
at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.moveToSecondPhase(TransportSearchQueryAndFetchAction.java:78)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.innerMoveToSecondPhase(TransportSearchTypeAction.java:403)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:202)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onResult(TransportSearchTypeAction.java:178)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$1.onResult(TransportSearchTypeAction.java:175)
at org.elasticsearch.search.action.SearchServiceTransportAction$12.handleResponse(SearchServiceTransportAction.java:346)
at org.elasticsearch.search.action.SearchServiceTransportAction$12.handleResponse(SearchServiceTransportAction.java:337)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleResponse(MessageChannelHandler.java:163)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:132)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 500) on org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction$1@4cd69494
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:79)
```

All configurations remained the same during the upgrade, so I am wondering what might be causing the issue. The search thread pool queue seems to be filling up, so I am testing with a larger thread pool size and queue_size. But I am not able to understand what changed from 1.1 to 1.6 that is causing the issue.
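
For reference, below is a sketch of the kind of override I am testing in elasticsearch.yml. The setting names are the real ES 1.x thread pool settings, but the values are purely illustrative, not recommendations:

```yaml
# elasticsearch.yml -- ES 1.x thread pool overrides (values are illustrative only).
# The search pool is a "fixed" pool: a fixed number of threads plus a bounded
# queue; once the queue is full, new search tasks are rejected with
# EsRejectedExecutionException (the "queue capacity 500" in the stack trace
# is this queue's capacity).
threadpool.search.type: fixed
threadpool.search.size: 48          # number of search threads
threadpool.search.queue_size: 1000  # queued requests allowed before rejection
```

In 1.x these thread pool settings can also be changed at runtime via the cluster update settings API, so different values can be tried without a full restart.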

I read the release notes and didn't find anything related to this. I also searched for changes to the default values of any config parameters, but didn't find anything that seems related.

I found this issue, which is very similar to mine, but it was resolved in version 1.1.2, so it shouldn't be causing the error.

It would be a great help if someone could give me pointers on how to go about finding the cause.

  • How many nodes do you have and how many CPUs per node? – Val May 04 '16 at 04:09
  • @Val I have 2 clusters, 3 nodes each. There are 32 CPUs per node. With the same configuration I never received the exception on 1.1, but I am seeing the errors on 1.6 – Atri May 04 '16 at 05:21
  • I read http://stackoverflow.com/questions/27793530/esrejectedexecutionexception-in-elasticsearch-for-parallel-search, which will alleviate the issue but not solve it. I think it's something to do with the 2-cluster configuration. – Atri May 04 '16 at 05:23
  • For 1.6 the [search thread pool size](https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-threadpool.html#modules-threadpool) is smaller than for [1.3 (the documentation for 1.1 is not available anymore on the website)](https://www.elastic.co/guide/en/elasticsearch/reference/1.3/modules-threadpool.html#modules-threadpool). _3x # of available processors_ for 1.3 and _int((# of available_processors * 3) / 2) + 1_ for 1.6. And this might explain it (the numbers are worked out below the comments). – Andrei Stefan May 04 '16 at 05:26
  • @AndreiStefan I have 32 CPUs and I have set the search thread pool size to 20. That shouldn't be an issue, as the same config worked perfectly fine with 1.1 – Atri May 04 '16 at 05:30
  • Why did you change the defaults? – Andrei Stefan May 04 '16 at 05:32
  • It wasn't managed by me previously, so I don't know why the defaults were changed initially. – Atri May 04 '16 at 05:39
  • I suggest leaving out the custom settings for thread pools, as the defaults are usually pretty good. They should very rarely be changed. – Andrei Stefan May 04 '16 at 05:55
  • Thanks for your input @AndreiStefan. I will try it. – Atri May 04 '16 at 06:02
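
To make the default-size comparison in the comments concrete (assuming Elasticsearch sees all 32 processors): the older formula gives a default search pool of 3 × 32 = 96 threads, while the 1.6 formula gives int((32 × 3) / 2) + 1 = 49 threads, roughly half. Since the search pool here is explicitly set to 20 in both versions, the configured pool size itself did not change across the upgrade.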

1 Answer


I upgraded to 1.7.5 and everything worked perfectly fine. I guess there was some issue in ES 1.6.0 related to the way my mappings and queries are designed.
