I have two kinds of tasks: A and B.
(By "task" I mean an RDD's whole pipeline; for example, RDD.map.reduce ... is one task. The RDD is defined by us, and its data is split into many partitions. Each partition does its map work separately, and the results are combined in reduce.)
A is a short task that takes less than 5 seconds, while B is a big task that takes more than 30 minutes to finish.
We need the result of A as quickly as possible. B is a background task, and we don't care even if B is queued for an hour or more.
Both A and B have many task partitions.
The problem is that if B is scheduled before A, A has to wait a long time for B. This is not acceptable.
I don't think FAIR scheduling is a good solution, because if B starts while A is executing, B will still start its task partitions, which will affect the execution of A.
Is there any way to give tasks a priority, so that A has a higher priority than B? Even if A is scheduled after B, once B's currently executing partitions have finished, A should be executed immediately while the rest of B waits.
Or is there any way to reserve certain resources for task A, so that whenever A is submitted it can always be scheduled immediately?
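To make the priority behaviour I am asking about concrete, here is a minimal plain-Java sketch. It is not Spark code and does not use any Spark API; the class name PrioritySketch, the Partition type and the single executor slot are all made up for illustration. It only models the ordering I want: B's running partition is allowed to finish, then all of A's partitions run before the rest of B.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model of the desired ordering; NOT Spark's scheduler.
public class PrioritySketch {

    // One pending partition of a job. A lower priority value runs first;
    // seq keeps submission order among partitions of equal priority.
    static final class Partition implements Comparable<Partition> {
        final String job;
        final int index;
        final int priority;
        final long seq;

        Partition(String job, int index, int priority, long seq) {
            this.job = job;
            this.index = index;
            this.priority = priority;
            this.seq = seq;
        }

        String label() {
            return job + index;
        }

        @Override
        public int compareTo(Partition other) {
            int byPriority = Integer.compare(priority, other.priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    // Simulates one executor slot and returns the order in which partitions run.
    static List<String> simulate() {
        PriorityQueue<Partition> pending = new PriorityQueue<>();
        List<String> order = new ArrayList<>();
        long seq = 0;

        // B (low priority = 1) is submitted first with 4 partitions.
        for (int i = 0; i < 4; i++) {
            pending.add(new Partition("B", i, 1, seq++));
        }

        // The slot is free, so B's first partition starts running.
        order.add(pending.poll().label());

        // While B0 is running, A (high priority = 0) arrives with 2 partitions.
        for (int i = 0; i < 2; i++) {
            pending.add(new Partition("A", i, 0, seq++));
        }

        // B0 is not interrupted; after it finishes, the slot always takes
        // the highest-priority pending partition next.
        while (!pending.isEmpty()) {
            order.add(pending.poll().label());
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [B0, A0, A1, B1, B2, B3]
    }
}
```

In other words, A only ever waits for the partition that is already executing, never for the whole of B.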
I found a way of using scheduler pools, but how do I assign A to a certain pool?
I am using Spark with Java in standalone mode. I submit the job like javaRDD.map(..).reduce... The javaRDD is a subclass extended from JavaRDD. Tasks A and B have different RDD classes, ARDD and BRDD. They run in the same Spark application.
The procedure is: the app starts up -> the Spark application is created, but no job runs -> I click "run A" in the app UI, and ARDD runs -> I click "run B" in the app UI, and BRDD runs in the same Spark application as A.