I have a Spark standalone cluster with 2 worker nodes and 1 master node.
Using spark-shell, I read data from a file on the local filesystem, applied some transformations, and saved the final RDD to /home/output (let's say). The RDD was saved successfully, but only on one worker node; on the master node only the _SUCCESS file was present.
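For reference, the save step looked roughly like this (a minimal sketch; the input path and the transformation are placeholders for what I actually ran):

// Sketch of the save step run in spark-shell (placeholder input path and transformation)
val input = sc.textFile("file:///home/data/input.txt")   // read from the local filesystem
val transformed = input.map(_.toUpperCase)                // some transformation
transformed.saveAsTextFile("/home/output")                // each executor writes its own partition files locally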
Now, when I try to read this output back from /home/output, I get no data: the count is 0 on the master, so I assume Spark is not checking the other worker nodes for the files.
It would be great if someone could shed some light on why Spark is not reading from all the worker nodes, or on the mechanism Spark uses to read data from the worker nodes.
scala> sc.wholeTextFiles("/home/output/")
res7: org.apache.spark.rdd.RDD[(String, String)] = /home/output/ MapPartitionsRDD[5] at wholeTextFiles at <console>:25
scala> res7.count
res8: Long = 0