How to schedule a task with priority in Spark?

Question

I have two kinds of tasks: A and B.

(By a task I mean an RDD's whole procedure; for example, RDD.map.reduce ... is one task. The RDD is defined by us and holds data split into many partitions. Each partition does its map job separately, and the results are combined in reduce.)

A is a short task that takes less than 5 s, while B is a big task that takes more than 30 minutes to finish.

We need to get the result of A as quickly as possible. B is a background task; we don't care even if B stays queued for an hour or more.

Both A and B have many task partitions.

The problem is that if B is scheduled before A, A will wait a long time for B. This is not acceptable.

I don't think FAIR is a good fit, because if B starts while A is executing, B will still start its task partitions, which will affect the execution of A.

Is there any way to give priorities to tasks, so that A has a higher priority than B? Even if A is scheduled after B, once the currently executing partition finishes, A should be executed immediately and the rest of B should wait.
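To make the desired behavior concrete, here is a minimal plain-JDK sketch of non-preemptive priority scheduling at partition granularity. It does not use Spark and is not how Spark's scheduler is implemented; the class `PrioritySketch`, the `PartitionTask` type, and the labels `A-0`, `B-0`, etc. are hypothetical names used only for illustration. A single worker thread stands in for one executor slot.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class PrioritySketch {
    static final int HIGH = 0; // partitions of task A
    static final int LOW = 1;  // partitions of task B

    // Hypothetical unit of work: one partition, ordered by priority, then by submission order.
    static final class PartitionTask implements Runnable, Comparable<PartitionTask> {
        private static final AtomicLong SEQ = new AtomicLong();
        final int priority;
        final long seq = SEQ.getAndIncrement();
        final Runnable body;

        PartitionTask(int priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }

        @Override
        public void run() {
            body.run();
        }

        @Override
        public int compareTo(PartitionTask other) {
            int byPriority = Integer.compare(priority, other.priority);
            return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
        }
    }

    // Returns the order in which partitions finished.
    static List<String> runDemo() throws InterruptedException {
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        // One worker thread; waiting partitions sit in a priority queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());

        // B-0 is already running when A arrives; it is not interrupted.
        pool.execute(new PartitionTask(LOW, () -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            order.add("B-0");
        }));
        started.await();

        // The rest of B is queued first, then A is submitted later.
        pool.execute(new PartitionTask(LOW, () -> order.add("B-1")));
        pool.execute(new PartitionTask(LOW, () -> order.add("B-2")));
        pool.execute(new PartitionTask(HIGH, () -> order.add("A-0")));
        pool.execute(new PartitionTask(HIGH, () -> order.add("A-1")));

        release.countDown(); // let B-0 finish
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return new ArrayList<>(order);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo()); // prints [B-0, A-0, A-1, B-1, B-2]
    }
}
```

In this sketch the running partition of B completes, both partitions of A jump ahead of the queued B partitions, and B resumes only afterwards, which is the ordering I am asking for.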

Or is there any way to reserve certain resources for task A, so that every time A is executed it can be scheduled immediately?

I found a way using scheduler pools, but how do I assign A to a certain pool?
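For context, this is roughly what the pool setup looks like according to the Spark job scheduling documentation. It is a sketch, not a verified solution to my case: the pool names `poolA` and `poolB` and the `weight` and `minShare` values are assumptions chosen for illustration. The file is referenced through `spark.scheduler.allocation.file`, with `spark.scheduler.mode` set to `FAIR`. As far as I understand, a job is placed in a pool by calling `setLocalProperty("spark.scheduler.pool", "poolA")` on the Spark context from the thread that submits the job.

```xml
<?xml version="1.0"?>
<!-- Illustrative allocation file; pool names and numbers are assumptions. -->
<allocations>
  <!-- Pool for the short task A: a large weight and a guaranteed minimum share of cores. -->
  <pool name="poolA">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1000</weight>
    <minShare>2</minShare>
  </pool>
  <!-- Pool for the background task B: default weight, no guaranteed share. -->
  <pool name="poolB">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
```

The documentation notes that a very high weight lets one pool launch its tasks first whenever it has active jobs. Tasks of B that are already running would still occupy their slots until they finish.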

I am using Spark with Java in standalone mode. I submit the job like javaRDD.map(..).reduce... The javaRDD is a subclass extended from JavaRDD. Tasks A and B have different RDD classes, like ARDD and BRDD. They run in the same Spark application.

The procedure is like this: the app starts up -> the Spark application is created, but no job runs -> I click "run A" in the app UI, and ARDD runs -> I click "run B" in the app UI, and BRDD runs in the same Spark application as A.

The priority of tasks is not Spark's job. This is a job for the [Cluster Manager](https://spark.apache.org/docs/latest/cluster-overview.html). Depending on which one you have, you need to create a specific configuration. You can refer to [this](https://stackoverflow.com/questions/28664834/which-cluster-type-should-i-choose-for-spark) question for more information. — Vladislav Varslavans, Mar 08 '18 at 11:33
I think what you refer to as a task is actually a job, isn't it? How do you submit jobs for execution? Is this a multi-threaded Spark application? How do you know how long a job would take to produce a result? — Jacek Laskowski, Mar 08 '18 at 13:10
@Jacek Laskowski I submit jobs using Java. What do you mean by multi-threaded? We have three Spark nodes in standalone mode. How long would a job take? It is shown on the web UI, isn't it? — HalfLegend, Mar 09 '18 at 05:40
@Vladislav Varslavans No, I want to set the priority of each task myself, not have them managed automatically by YARN or Mesos. — HalfLegend, Mar 09 '18 at 05:43
I think you need to read through [this](https://spark.apache.org/docs/latest/job-scheduling.html) link. There you will find an answer to your question, if there is any. — Vladislav Varslavans, Mar 09 '18 at 08:32

0 Answers