
I am running a Spark program on AWS EMR. I get the error shown at the end of this question when Spark executes a Spark SQL query in which I do a self join. The self join produces a cross-join-like result, which is huge. Is this a memory issue? If so, is upgrading my cluster the only way to run it? I would appreciate quick answers, as I am short on time. Thanks in advance. Error message:

java.io.IOException: Failed to send RPC 7911398405091204769 to /10.31.240.189:34032: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
17/02/16 04:23:34 ERROR YarnScheduler: Lost executor 26 on ip-10-31-240-189.ec2.internal: Slave lost
17/02/16 04:23:34 WARN TaskSetManager: Lost task 72.0 in stage 12.0 (TID 1282, ip-10-31-240-189.ec2.internal): ExecutorLostFailure (executor 26 exited caused by one of the running tasks) Reason: Slave lost 
Avinash A
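For context, here is a minimal sketch of the kind of self join described in the question. The table name (events), the columns (user_id, item), and the S3 paths are hypothetical stand-ins, not the asker's actual schema; the point is that a self join on a key with heavy groups materializes every pair of rows within each group, so the output and the shuffle can grow roughly quadratically.

import org.apache.spark.sql.SparkSession

object SelfJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("self-join-sketch")
      .getOrCreate()

    // Hypothetical input: one row per (user_id, item) event.
    spark.read.parquet("s3://my-bucket/events").createOrReplaceTempView("events")

    // Self join on user_id: every pair of rows sharing a user_id is produced,
    // so a single heavy key can overwhelm the executors that shuffle it.
    val pairs = spark.sql(
      """SELECT a.user_id, a.item AS item_a, b.item AS item_b
        |FROM events a
        |JOIN events b
        |  ON a.user_id = b.user_id AND a.item < b.item""".stripMargin)

    pairs.write.parquet("s3://my-bucket/item-pairs")
    spark.stop()
  }
}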
  • Do you see any lost nodes in YARN UI? – franklinsijo Feb 16 '17 at 05:23
  • Looking at the log info, the executors seem to have been lost just before I get the above-mentioned error. Here is the log: 17/02/16 05:23:23 ERROR TransportResponseHandler: Still have 22 requests outstanding when connection from /10.97.174.131:54174 is closed 17/02/16 05:23:23 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 6 at RPC address 10.153.246.85:39946, but got no response. Marking as slave lost. – Avinash A Feb 16 '17 at 05:56
  • If you can get the info on which slave is lost, you can check its NodeManager logs to find out why it died. It could be due to out-of-memory errors. Lost-node info will be in the YARN UI. – franklinsijo Feb 16 '17 at 06:19
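The "Slave lost" / lost-executor pattern in the log above is often (though not always) YARN killing a container that exceeded its memory allotment while shuffling an exploded join, which is what the NodeManager logs mentioned in the comments would confirm. Before resizing the cluster, one low-effort experiment is to give executors more headroom and spread the shuffle over more partitions. A minimal sketch for Spark 2.x on YARN; the values are illustrative, not tuned for this cluster, and these properties more commonly go in spark-defaults.conf or spark-submit --conf (setting them on the builder only takes effect if it happens before the first SparkContext is created):

import org.apache.spark.sql.SparkSession

// Sketch only: illustrative values, as they might appear in spark-shell
// or inside the same object as the join above.
val spark = SparkSession.builder()
  .appName("self-join-sketch")
  // Bigger executor heap, plus extra off-heap overhead so YARN does not
  // kill the container (spark.yarn.executor.memoryOverhead is in MB).
  .config("spark.executor.memory", "8g")
  .config("spark.yarn.executor.memoryOverhead", "2048")
  // More shuffle partitions make each join task smaller.
  .config("spark.sql.shuffle.partitions", "800")
  .getOrCreate()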

0 Answers