I can run PySpark in YARN client mode on one laptop, and I am trying to set it up on another laptop. This time, however, I can't get it running.

When I try to start PySpark in YARN client mode, it gives me the following error. I am using dynamic resource allocation and have set SPARK_EXECUTOR_MEMORY to less than the YARN container memory. I am using Hadoop 2.6.4, Spark 1.6.1, and Ubuntu 15.10.
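
For reference, here is a rough sketch of the memory check implied above. It assumes the Spark 1.6 default for `spark.yarn.executor.memoryOverhead` (10% of executor memory, with a 384 MB floor) and compares the resulting request against `yarn.scheduler.maximum-allocation-mb`. The function names and the sample numbers are hypothetical, not my actual settings, and the check ignores any rounding the YARN scheduler may apply.

```python
def executor_request_mb(executor_memory_mb, overhead_mb=None):
    """Estimate the memory Spark asks YARN for per executor, in MB."""
    if overhead_mb is None:
        # Assumed Spark 1.6 default for spark.yarn.executor.memoryOverhead:
        # 10% of the executor memory, but never less than 384 MB.
        overhead_mb = max(384, int(executor_memory_mb * 0.10))
    return executor_memory_mb + overhead_mb


def fits_in_container(executor_memory_mb, max_allocation_mb, overhead_mb=None):
    """True if heap plus overhead fits under yarn.scheduler.maximum-allocation-mb."""
    return executor_request_mb(executor_memory_mb, overhead_mb) <= max_allocation_mb


if __name__ == "__main__":
    # Hypothetical values: 1 GB executors against two container limits.
    for limit in (1024, 2048):
        print(limit, fits_in_container(1024, limit))
```

The point of the sketch is that an executor memory setting merely below the container limit may still not fit once the overhead is added.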

Is it possible that the error is due to network issues?

16/06/12 01:49:34 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0)
In [1]: 16/06/12 01:49:34 INFO cluster.YarnClientSchedulerBackend: Disabling executor 1.
16/06/12 01:49:34 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
16/06/12 01:49:34 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, 192.168.2.16, 37900)
16/06/12 01:49:34 ERROR client.TransportClient: Failed to send RPC 9123554941984942265 to 192.168.2.16/192.168.2.16:47630: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
16/06/12 01:49:34 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
16/06/12 01:49:34 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 1 at RPC address 192.168.2.16:47640, but got no response. Marking as slave     lost.
java.io.IOException: Failed to send RPC 9123554941984942265 to 192.168.2.16/192.168.2.16:47630: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
    at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
    at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
    at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
16/06/12 01:49:34 ERROR cluster.YarnScheduler: Lost executor 1 on 192.168.2.16: Slave lost
16/06/12 01:49:34 INFO cluster.YarnClientSchedulerBackend: Disabling executor 2.
16/06/12 01:49:34 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 1)
16/06/12 01:49:34 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
16/06/12 01:49:34 ERROR client.TransportClient: Failed to send RPC 8690255566269835148 to 192.168.2.16/192.168.2.16:47630: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
16/06/12 01:49:34 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, 192.168.2.16, 41124)
16/06/12 01:49:34 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
16/06/12 01:49:34 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 2 at RPC address 192.168.2.16:47644, but got no response. Marking as slave     lost.
java.io.IOException: Failed to send RPC 8690255566269835148 to 192.168.2.16/192.168.2.16:47630: java.nio.channels.ClosedChannelException
    at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
    at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
    at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
    at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
    at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
    at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
Michael
  • Isn't it this kind of issue (http://stackoverflow.com/questions/29781489/apache-spark-network-errors-between-executors)? – Ram Ghadiyaram Jun 11 '16 at 18:32
  • @RamPrasadG Thank you for the link. I can run Spark now, and the weird thing is that I didn't do anything. I restarted Ubuntu a couple of times and it didn't work. I gave up, went to sleep, woke up, powered on my laptop, tried again, and it worked this time! – Michael Jun 12 '16 at 04:38
  • My last change was switching from JDK 8 to JDK 7 for Hadoop. Maybe Spark doesn't support JDK 8. – Michael Jun 12 '16 at 04:43
  • No, Spark supports JDK 8. I think it's a shuffle service issue (netty or nio). Many threads discuss the shuffle service; when people switch to the other one, it works for them. – Ram Ghadiyaram Jun 12 '16 at 05:22
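
Since the question uses dynamic allocation and the last comment points at the shuffle service, here is a minimal sketch of the launch settings involved. It assumes the standard Spark 1.6 property names `spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled`; the helper `pyspark_args` and the `1g` executor memory are hypothetical, and nothing here is confirmed as the cause of the error above.

```python
def pyspark_args(executor_memory="1g"):
    """Build a pyspark command line for yarn-client mode with dynamic allocation."""
    conf = {
        # Dynamic allocation on YARN relies on the external shuffle service.
        "spark.dynamicAllocation.enabled": "true",
        "spark.shuffle.service.enabled": "true",
        # Hypothetical value; pick one that fits the YARN container limit.
        "spark.executor.memory": executor_memory,
    }
    args = ["pyspark", "--master", "yarn-client"]
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


if __name__ == "__main__":
    print(" ".join(pyspark_args()))
```

The NodeManagers also need the `spark_shuffle` auxiliary service configured in `yarn-site.xml` for these settings to take effect; that part is not shown here.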
