
I read data from a file into an RDD and divide it into three partitions like this:

```scala
val rdd = sc.textFile("pathToFile", minPartitions = 3)
```

I run the application on a standalone cluster with three executors. My question is whether there is a way to send the first partition to the executor with ID 0 (or, more generally, a specific partition to a specific executor). For example, I want the stage to be executed as follows:

Task 0 - Executor 0
Task 1 - Executor 1
Task 2 - Executor 2

Instead, because Spark sends partitions to "random" locations (I know it's not really random), it ends up like this:

Task 0 - Executor 1
Task 1 - Executor 2
Task 2 - Executor 0

I know there is preferredLocations when using makeRDD, but I don't know how I could convert my code to match that.
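Here is a rough sketch of what I imagine, assuming the file is small enough to read on the driver and using "worker0".."worker2" as placeholder hostnames for my three workers, but I am not sure it is the right approach:

```scala
import scala.io.Source

// Read the file on the driver (only sensible for small inputs) and split
// it into three chunks, one per intended partition.
val lines = Source.fromFile("pathToFile").getLines().toSeq
val chunkSize = math.max(1, (lines.size + 2) / 3) // ceiling division by 3
val chunks = lines.grouped(chunkSize).toSeq

// Placeholder hostnames; they must match the hostnames Spark registers
// for the executors. zip simply drops extra hosts if there are fewer chunks.
val hosts = Seq("worker0", "worker1", "worker2")
val withPrefs = chunks.zip(hosts).map { case (chunk, host) => (chunk, Seq(host)) }

// This makeRDD overload creates one partition per element, each with its
// preferred host; flatMap flattens the chunks back into an RDD of lines.
val rdd = sc.makeRDD(withPrefs).flatMap(identity)
```

As far as I understand, these locations are only scheduling preferences: if no free slot is available on the preferred host within spark.locality.wait, the task runs elsewhere anyway.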

Valentina
    Looks like an [XY problem](https://en.wikipedia.org/wiki/XY_problem) to me. @Valentina, can you please explain why you want to achieve this? Maybe we can figure out another simple solution for the actual problem which you are trying to resolve. – vatsal mevada Nov 02 '18 at 11:33
  • as stated by @thebluephantom, there is no such fine-grained control unless you use Jacek's approach from the link. – eliasah Nov 06 '18 at 13:44

1 Answer


There is no such fine-grained control over which executor gets which task, and given Spark's architecture it is not really required either: the scheduler assigns tasks based on data locality and available executor slots. With custom partitioning you do get some implicit control, however, as sketched below.
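To make the custom-partitioning point concrete, here is a minimal sketch; the ByKeyPartitioner class and the key scheme are invented for illustration. A Partitioner fixes which partition each record lands in, while which executor runs the task for that partition remains the scheduler's decision:

```scala
import org.apache.spark.Partitioner

// Hypothetical partitioner: sends key k to partition k % numPartitions.
// It controls data-to-partition placement, not task-to-executor placement.
class ByKeyPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    key.asInstanceOf[Int] % numPartitions
}

// Key each line with an invented partition id (0, 1 or 2), then repartition.
val pairs = sc.textFile("pathToFile", minPartitions = 3)
  .zipWithIndex()
  .map { case (line, idx) => ((idx % 3).toInt, line) }

val repartitioned = pairs.partitionBy(new ByKeyPartitioner(3))
```

Note that partitionBy introduces a shuffle; that is the price of explicit data placement.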

thebluephantom
  • Thank you for your answer. Do you know how Spark makes the choice for the task assignment? Is there really no way I can simulate that assignment? – Valentina Nov 02 '18 at 10:48