Imagine I do some Spark operations on a file hosted in HDFS. Something like this:
val file = sc.textFile("hdfs://...")
val items = file.map(_.split('\t'))
...
Because in the Hadoop world the code should go where the data is, right?
So my question is: how do Spark workers know about the HDFS data nodes? How does Spark decide which data nodes to run the code on?
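For context, I noticed that an RDD exposes per-partition locality information, which seems related. A minimal sketch of what I mean (the SparkContext setup and the HDFS URL are placeholders, not my real config):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalityDemo {
  def main(args: Array[String]): Unit = {
    // Placeholder setup; in my real job the context comes from elsewhere.
    val conf = new SparkConf().setAppName("locality-demo")
    val sc   = new SparkContext(conf)

    // Placeholder path standing in for the real HDFS file.
    val file = sc.textFile("hdfs://namenode:8020/path/to/file")

    // Each partition carries "preferred locations" -- the hosts holding
    // the HDFS block replicas backing that partition.
    file.partitions.foreach { p =>
      val hosts = file.preferredLocations(p).mkString(", ")
      println(s"partition ${p.index} prefers: $hosts")
    }

    sc.stop()
  }
}
```

So apparently Spark already knows the hosts per partition; what I'm asking is where that information comes from and how the scheduler uses it.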