I was running Spark on Hadoop with 2 workers and 2 datanodes. The first machine hosts the Spark master, namenode, worker-1, and datanode-1. The second machine hosts worker-2 and datanode-2.
In the Hadoop cluster there are two files under the /usr directory: NOTICE.txt on datanode-1 and README.txt on datanode-2.
I want to create an RDD from these two files and count the lines.
On the first machine I ran spark-shell with --master spark://masterIP:7077 (standalone mode).
Then on the Scala prompt I created the RDD with val rdd = sc.textFile("/usr/"), but when I ran the count operation rdd.count() it threw the following error:
(TID 2, masterIP, executor 1): org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1313298757-masterIP-1499412323227:blk_1073741827_1003 file=/usr/README.txt
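To narrow down why the executor cannot obtain that block, it may help to ask HDFS itself where it thinks the block lives and whether datanode-2 is actually registered with the namenode. A sketch of the checks, assuming the hdfs CLI is on the PATH on the master machine (the file path /usr/README.txt is taken from the error above):

```shell
# Check file health and the datanodes HDFS believes hold each block
hdfs fsck /usr/README.txt -files -blocks -locations

# Confirm both datanodes are live (a dead or unregistered datanode-2
# would explain the BlockMissingException for its block)
hdfs dfsadmin -report
```

If fsck reports the block as missing or under-replicated, or dfsadmin shows only one live datanode, the problem is on the HDFS side (e.g. datanode-2 not reachable from the namenode, or its hostname not resolvable from the Spark executor) rather than in the Spark code.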
Worker-1 is picking up NOTICE.txt, but worker-2 is not picking up README.txt.
I don't understand what the problem is; any help will be appreciated. Thanks!