I was running Spark on Hadoop with 2 workers and 2 datanodes. The first machine hosts the Spark master, namenode, worker-1, and datanode-1. The second machine hosts worker-2 and datanode-2.
In the Hadoop cluster there are two files under the /usr directory: NOTICE.txt on datanode-1 and README.txt on datanode-2.
I want to create an RDD from these two files and count the lines.
On the first machine I ran spark-shell with --master spark://masterIP:7077 (standalone mode).
Then on the Scala prompt I created the RDD with val rdd = sc.textFile("/usr/"), but when I ran the count operation rdd.count() it threw the following error:
(TID 2, masterIP, executor 1): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1313298757-masterIP-1499412323227:blk_1073741827_1003 file=/usr/README.txt
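To narrow down why the executor cannot obtain that block, it may help to ask HDFS itself where it thinks the block lives and whether datanode-2 is actually registered with the namenode. A sketch of the checks, assuming the hdfs CLI is on the PATH on the master machine (the file path /usr/README.txt is taken from the error above):

```shell
# Check file health and the datanodes HDFS believes hold each block
hdfs fsck /usr/README.txt -files -blocks -locations

# Confirm both datanodes are live (a dead or unregistered datanode-2
# would explain the BlockMissingException for its block)
hdfs dfsadmin -report
```

If fsck reports the block as missing or under-replicated, or dfsadmin shows only one live datanode, the problem is on the HDFS side (e.g. datanode-2 not reachable from the namenode, or its hostname not resolvable from the Spark executor) rather than in the Spark code.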
Worker-1 is picking up NOTICE.txt, but worker-2 is not picking up README.txt.
I don't understand what the problem is; any help will be appreciated. Thanks!