I read data from a file into an RDD and divide it into three partitions like this:
val rdd = sc.textFile("pathToFile", minPartitions = 3)
I run the application on a standalone cluster with three executors. My question is whether there is a way to send the first partition to the executor with ID = 0 (or, more generally, a given partition to a specific executor). For example, I want the stage to be executed as follows:
Task 0-Executor 0
Task 1-Executor 1
Task 2-Executor 2
Instead, because Spark sends partitions to "random" locations (I know it's not really random), it ends up like this:
Task 0-Executor 1
Task 1-Executor 2
Task 2-Executor 0
I know there is preferredLocations when using makeRDD, but I don't know how I could convert my code to match that.
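For reference, here is a minimal sketch of the makeRDD variant that accepts location preferences. The hostnames ("host0", "host1", "host2") are placeholders for the actual worker hosts in a cluster, and the data is hypothetical — in my case the lines would first have to be read from the file on the driver, which is what I don't know how to reconcile with sc.textFile:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PreferredLocationsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("preferred-locations-sketch")
    val sc = new SparkContext(conf)

    // makeRDD accepts (value, Seq[hostname]) pairs: each element becomes
    // one partition, annotated with the hosts Spark should *prefer* for
    // the corresponding task. This is a scheduling hint, not a guarantee.
    val partitionedLines: Seq[(Seq[String], Seq[String])] = Seq(
      (Seq("line1", "line2"), Seq("host0")), // partition 0 -> prefer host0
      (Seq("line3", "line4"), Seq("host1")), // partition 1 -> prefer host1
      (Seq("line5", "line6"), Seq("host2"))  // partition 2 -> prefer host2
    )

    val rdd = sc.makeRDD(partitionedLines)
    rdd.foreachPartition(part => part.foreach(lines => println(lines.mkString(","))))

    sc.stop()
  }
}
```

Note that even with this, Spark's locality preferences map to hosts, not executor IDs, so exact Task-N-to-Executor-N placement still isn't guaranteed.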