
We have had quite a few issues with the Spark Thrift Server.

From the log we can see: Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149

Please advise why this happens and what the solution is.

Failed to send RPC 9053901149358924945 to /DATA NODE MACHINE:50149: java.nio.channels.ClosedChannelException
more spark-hive-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-master03.sys67.com.out


Spark Command: /usr/jdk64/jdk1.8.0_112/bin/java -Dhdp.version=2.6.0.3-8 -cp /usr/hdp/current/spark2-thriftserver/conf/:/usr/hdp/current/spark2-thriftserver/jars/*:/usr/hdp/current/hadoop-client/conf/ -Xmx10000m org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=15g --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift JDBC/ODBC Server --executor-cores 7 spark-internal
========================================
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
18/02/07 17:55:21 ERROR TransportClient: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(2,0,Map()) to AM was unsuccessful
java.io.IOException: Failed to send RPC 9053901149358924945 to /12.87.2.64:50149: java.nio.channels.ClosedChannelException
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:249)
        at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:233)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:514)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:488)
        at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
        at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:438)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.ClosedChannelException
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
18/02/07 17:55:21 ERROR SparkContext: Error initializing SparkContext.

We also tried to pick up some useful points from this link - https://thebipalace.com/2017/08/23/spark-error-failed-to-send-rpc-to-datanode/

but this is a new Ambari cluster and we don't think that article fits this particular issue (no Spark jobs are running on our Ambari cluster right now).

enodmilvado

2 Answers


It could be due to insufficient disk space. In my case, I was running a Spark job on AWS EMR with 1 r4.2xlarge (master) and 2 r4.8xlarge (core) nodes. Spark tuning and increasing the number of worker nodes solved my problem. The most common cause is memory pressure, due to bad configuration (i.e. wrong-sized executors), long-running tasks, and tasks that result in Cartesian operations. You can speed up jobs with appropriate caching and by allowing for data skew. For the best performance, monitor and review long-running and resource-consuming Spark job executions. Hope it helps.
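
As a rough sketch of what "right-sized executors" can mean in practice (the numbers below are illustrative assumptions, not values taken from this cluster), the executor count, cores, memory and overhead can be set explicitly rather than left at defaults:

spark-submit \
  --master yarn \
  --conf spark.executor.instances=10 \
  --conf spark.executor.cores=5 \
  --conf spark.executor.memory=16g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.driver.memory=8g \
  my-job.jar

(my-job.jar is just a placeholder; on Spark 2.3+ the overhead key is spark.executor.memoryOverhead.) Keeping executor memory plus overhead below what a single node can offer helps avoid YARN killing containers, which is one way a "Failed to send RPC ... ClosedChannelException" error can show up.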

Reference => EMR Spark - TransportClient: Failed to send RPC

Robin

In my case, I reduced the memory for the driver and executors from 8 GB to 4 GB:

spark.driver.memory=4G
spark.executor.memory=4G
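
For the Thrift Server in the question, which is started with --properties-file /usr/hdp/current/spark2-thriftserver/conf/spark-thrift-sparkconf.conf (see the Spark Command line above), the equivalent sketch would be to put the same keys in that properties file, with values sized for your nodes:

spark.driver.memory=4g
spark.executor.memory=4g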

Check your nodes' configuration; you should not request more memory than is available.
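
As a sanity check on a YARN cluster like the one in the question (these are standard YARN/Spark property names; the numbers are illustrative assumptions), the executor request has to fit inside what a NodeManager can hand out:

spark.executor.memory + spark.yarn.executor.memoryOverhead  <=  yarn.scheduler.maximum-allocation-mb  <=  yarn.nodemanager.resource.memory-mb

For example, if yarn.nodemanager.resource.memory-mb is 16384 (16 GB), then spark.executor.memory=15g plus the default overhead (max(384 MB, 10% of executor memory), here about 1.5 GB) does not fit; the container may be killed or never granted, which can surface on the driver as a lost connection like the ClosedChannelException above.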

Dmytro Maslenko