
When creating a Spark context in PySpark, I typically use the following code:

from pyspark import SparkConf, SparkContext

conf = (SparkConf().setMaster("yarn-client").setAppName(appname)
        .set("spark.executor.memory", "10g")
        .set("spark.executor.instances", "7")
        .set("spark.driver.memory", "5g")
        .set("spark.shuffle.service.enabled","true")
        .set("spark.dynamicAllocation.enabled","true")
        .set("spark.dynamicAllocation.minExecutors","5")
        )
sc = SparkContext(conf=conf)

However, this submits the job to the default queue, which is almost always over capacity. We have several less busy queues available, so my question is: how do I set my Spark context to use a different queue?

Edit: To clarify - I'm looking to set the queue for interactive jobs (e.g., exploratory analysis in a Jupyter notebook), so I can't set the queue with spark-submit.

Tim

2 Answers


You can use the following argument in your spark-submit command.

--queue queue_name

Alternatively, you can set the spark.yarn.queue property in your code.
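For example, a minimal sketch of setting it while building your configuration (the queue name below is only a placeholder):

from pyspark import SparkConf, SparkContext

# "my_less_busy_queue" is a placeholder - use one of your cluster's actual queue names
conf = (SparkConf().setMaster("yarn-client").setAppName("my_app")
        .set("spark.yarn.queue", "my_less_busy_queue"))
sc = SparkContext(conf=conf)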

Hope this helps.

Thanks

Manu Gupta

Try to use spark.yarn.queue rather than queue.

import pyspark

conf = pyspark.SparkConf().set("spark.yarn.queue", "your_queue_name")
sc = pyspark.SparkContext(conf=conf)
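In an interactive session like the Jupyter notebook mentioned in the question, the same setting can be added to the existing configuration before the context is created; a rough sketch, with an illustrative app name and queue name:

from pyspark import SparkConf, SparkContext

conf = (SparkConf().setMaster("yarn-client").setAppName("notebook-session")
        .set("spark.executor.memory", "10g")
        .set("spark.yarn.queue", "root.lessbusy")  # illustrative queue name
        )
sc = SparkContext(conf=conf)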
vahlala
Bean Dog
    add some information why to use this yarn.spark queue and maybe a link to the documentation of it. – tung Apr 28 '18 at 10:28
    I was unable to find a reference to the `yarn.spark.queue` that you suggest. Instead, this worked for me: `SparkSession.builder.appName('myapp').config(conf=SparkConf().setAll([('spark.yarn.queue', 'root.myqueue')])).getOrCreate()` – 0_0 Jun 20 '18 at 11:46
  • Setting "spark.yarn.queue" to the queue name helped. – Naresh S Dec 05 '18 at 09:55