
I am using Spark 2.4.1 with Java 8 to copy data into Cassandra 3.0.

My spark-submit script is:

$SPARK_HOME/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name MyDriver  \
    --jars "/local/jars/*.jar" \
    --files hdfs://files/application-cloud-dev.properties,hdfs://files/column_family_condition.yml \
    --class com.sp.MyDriver \
    --executor-cores 3 \
    --executor-memory 9g \
    --num-executors 5 \
    --driver-cores 2 \
    --driver-memory 4g \
    --driver-java-options -Dconfig.file=./application-cloud-dev.properties \
    --conf spark.executor.extraJavaOptions=-Dconfig.file=./application-cloud-dev.properties \
    --conf spark.driver.extraClassPath=. \
    --driver-class-path . \
     ca-datamigration-0.0.1.jar application-cloud-dev.properties
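
For context, the properties file and the YAML are shipped with --files, so YARN localizes them into each container's working directory; that is why -Dconfig.file points at ./application-cloud-dev.properties. A minimal sketch of how the driver can read it, assuming Typesafe Config (which honors the -Dconfig.file system property) and a hypothetical cassandra.host key:

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    public class ConfigLoadSketch {
        public static void main(String[] args) {
            // ConfigFactory.load() honors -Dconfig.file, so this reads the
            // application-cloud-dev.properties file localized via --files.
            Config config = ConfigFactory.load();
            System.out.println(config.getString("cassandra.host")); // hypothetical key
        }
    }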

Though the job completes successfully, my log file is filled with the WARN messages below.

WARN  org.apache.spark.storage.BlockManagerMasterEndpoint - No more replicas available for rdd_558_5026 !
2019-09-20 00:02:37,882 [dispatcher-event-loop-1] WARN    org.apache.spark.storage.BlockManagerMasterEndpoint - No more replicas available for rdd_558_5367 !
2019-09-20 00:02:37,882 [dispatcher-event-loop-1] WARN  org.apache.spark.storage.BlockManagerMasterEndpoint - No more replicas available for rdd_571_1745 !
 org.apache.spark.network.server.TransportChannelHandler - Exception in connection from /10.24.96.88:58602
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:288)
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1106)
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:343)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:123)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    at java.lang.Thread.run(Thread.java:745)
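
From what I understand, this WARN means the only cached replica of an RDD block was lost, usually because the executor holding it died (for example, killed by YARN). As a sketch of a workaround I am considering, a replicated storage level keeps two copies of each cached block (illustrative only, not my actual job code):

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class ReplicatedPersistSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("replicated-persist-sketch");
            JavaSparkContext sc = new JavaSparkContext(conf);

            JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3));
            // The default storage levels keep a single replica; losing the
            // executor that holds it is what triggers
            // "No more replicas available for rdd_x_y".
            rdd.persist(StorageLevel.MEMORY_AND_DISK_2()); // keep 2 replicas
            System.out.println(rdd.count());

            sc.stop();
        }
    }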

What is wrong here? How should I fix it?

  • Having the same behavior loading Solr with Spark. Any news on your side? – parisni Nov 15 '19 at 12:23
  • I get the exact same trace, and the job also finishes. I apply both repartition and coalesce before loading to Solr. spark-solr uses an RDD under the hood. – parisni Nov 23 '19 at 09:55
  • Try the solution in this link: https://stackoverflow.com/questions/39347392/how-to-fix-connection-reset-by-peer-message-from-apache-spark – Luciana Oliveira Oct 18 '22 at 18:40
  • Try this other link: https://stackoverflow.com/questions/56786396/warn-blockmanagermasterendpoint-no-more-replicas-available-for-rdd – Luciana Oliveira May 15 '23 at 12:39

0 Answers