
I read data from a file into an RDD and divide it into three partitions like this:

```scala
val rdd = sc.textFile("pathToFile", minPartitions = 3)
```

I run the application on a standalone cluster with three executors. My question is whether there is a way to send the first partition to the executor with ID 0 (or, more generally, a specific partition to a specific executor). For example, I want the stage to be executed as follows:

Task 0 - Executor 0
Task 1 - Executor 1
Task 2 - Executor 2

Instead, because Spark sends partitions to "random" locations (I know it's not really random), it ends up like this:

Task 0 - Executor 1
Task 1 - Executor 2
Task 2 - Executor 0

I know there is preferredLocations when using makeRDD, but I don't know how I could convert my code to match that.
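Here is a rough sketch of what I imagine, assuming the file is small enough to read on the driver and using "worker0".."worker2" as placeholder hostnames for my three workers, but I am not sure it is the right approach:

```scala
import scala.io.Source

// Read the file on the driver (only sensible for small inputs) and split
// it into three chunks, one per intended partition.
val lines = Source.fromFile("pathToFile").getLines().toSeq
val chunkSize = math.max(1, (lines.size + 2) / 3) // ceiling division by 3
val chunks = lines.grouped(chunkSize).toSeq

// Placeholder hostnames; they must match the hostnames Spark registers
// for the executors. zip simply drops extra hosts if there are fewer chunks.
val hosts = Seq("worker0", "worker1", "worker2")
val withPrefs = chunks.zip(hosts).map { case (chunk, host) => (chunk, Seq(host)) }

// This makeRDD overload creates one partition per element, each with its
// preferred host; flatMap flattens the chunks back into an RDD of lines.
val rdd = sc.makeRDD(withPrefs).flatMap(identity)
```

As far as I understand, these locations are only scheduling preferences: if no free slot is available on the preferred host within spark.locality.wait, the task runs elsewhere anyway.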

Valentina
    Looks like an [XY problem](https://en.wikipedia.org/wiki/XY_problem) to me. @Valentina, can you please explain why you want to achieve this? Maybe we can figure out another simple solution for the actual problem which you are trying to resolve. – vatsal mevada Nov 02 '18 at 11:33
  • as stated by @thebluephantom, there is no such fine-grained control unless you use Jacek's approach from the link. – eliasah Nov 06 '18 at 13:44

1 Answer


There is no such fine-grained control over which executor gets which task, and given Spark's architecture it is not really required either: the scheduler assigns tasks based on data locality and available executor slots. With custom partitioning you do get some implicit control, however, as sketched below.
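To make the custom-partitioning point concrete, here is a minimal sketch; the ByKeyPartitioner class and the key scheme are invented for illustration. A Partitioner fixes which partition each record lands in, while which executor runs the task for that partition remains the scheduler's decision:

```scala
import org.apache.spark.Partitioner

// Hypothetical partitioner: sends key k to partition k % numPartitions.
// It controls data-to-partition placement, not task-to-executor placement.
class ByKeyPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    key.asInstanceOf[Int] % numPartitions
}

// Key each line with an invented partition id (0, 1 or 2), then repartition.
val pairs = sc.textFile("pathToFile", minPartitions = 3)
  .zipWithIndex()
  .map { case (line, idx) => ((idx % 3).toInt, line) }

val repartitioned = pairs.partitionBy(new ByKeyPartitioner(3))
```

Note that partitionBy introduces a shuffle; that is the price of explicit data placement.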

thebluephantom
  • Thank you for your answer. Do you know how Spark makes the choice for the task assignment? Is there really no way I can simulate that assignment? – Valentina Nov 02 '18 at 10:48