
I'm looking for a reliable way in Spark (v2+) to programmatically adjust the number of executors in a session.

I know about dynamic allocation and the ability to configure Spark executors on creation of a session (e.g. with --num-executors), but neither of these options is very useful to me because of the nature of my Spark job.

My Spark job

The job performs the following steps on a large amount of data:

  1. Perform some aggregations / checks on the data
  2. Load the data into Elasticsearch (ES cluster is typically much smaller than Spark cluster)

The problem

  • If I use the full set of available Spark resources, I will very quickly overload Elasticsearch and potentially even knock over the Elasticsearch nodes.
  • If I use a small enough number of Spark executors so as not to overwhelm Elasticsearch, step 1 takes a lot longer than it needs to (because it has a small % of the available Spark resources).

I appreciate that I can split this job into two jobs which are executed separately with different Spark resource profiles, but what I really want is to programmatically set the number of executors to X at a particular point in my Spark script (before the Elasticsearch load begins). This seems like a useful thing to be able to do generally.

My initial attempt

I played around a bit with changing settings and found something which sort of works, but it feels like a hacky way of doing something which should be doable in a more standardised and supported way.

My attempt (this is just me playing around):

import scala.util.Random

// Executors currently registered with the driver (excludes the driver's own block manager).
// Note: getExecutorStorageStatus is available on SparkContext in Spark 2.x.
def getExecutors = spark.sparkContext.getExecutorStorageStatus.toSeq.map(_.blockManagerId).collect {
  case bm if !bm.isDriver => bm
}

def reduceExecutors(totalNumber: Int): Unit = {
  //TODO throw error if totalNumber is more than current
  logger.info(s"""Attempting to reduce number of executors to $totalNumber""")
  // Tell the cluster manager we only want totalNumber executors from now on...
  spark.sparkContext.requestTotalExecutors(totalNumber, 0, Map.empty)
  val killedExecutors = scala.collection.mutable.ListBuffer[String]()
  // ...then actively kill executors until the registered count drops to the target.
  while (getExecutors.size > totalNumber) {
    val executorIds = getExecutors.map(_.executorId).filterNot(killedExecutors.contains(_))
    val executorsToKill = Random.shuffle(executorIds).take(executorIds.size - totalNumber)
    spark.sparkContext.killExecutors(executorsToKill)
    killedExecutors ++= executorsToKill
    Thread.sleep(1000)
  }
}

def increaseExecutors(totalNumber: Int): Unit = {
  //TODO throw error if totalNumber is less than current
  logger.info(s"""Attempting to increase number of executors to $totalNumber""")
  // Ask the cluster manager for more executors and wait until they have registered.
  spark.sparkContext.requestTotalExecutors(totalNumber, 0, Map.empty)
  while (getExecutors.size < totalNumber) {
    Thread.sleep(1000)
  }
}
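
For context, a rough sketch of how I'd call these helpers around the load (the executor counts and the two step functions are placeholders, not my real code):

// Step 1: aggregations / checks, using the full cluster
val checked = aggregateAndCheck(df)   // placeholder for step 1

// Shrink the cluster to something the ES nodes can cope with before step 2
reduceExecutors(10)                   // placeholder target size
loadIntoElasticsearch(checked)        // placeholder for step 2

// Grow the cluster back for anything that follows
increaseExecutors(100)                // placeholder target size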
Will Boulter
  • You can control ES bulk write size. You just need to find the appropriate calibration – eliasah Jul 18 '18 at 09:29
  • Yeah if you're referring to the `es.batch.write.entries` here: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#configuration-serialization then I'm already using that, but that just dictates how each executor batches up its partition into separate index requests. Setting it very low could potentially prevent overloading Elasticsearch but might slow down my load and also doesn't free up the unnecessary executors – Will Boulter Jul 18 '18 at 10:35

2 Answers


One thing you can try is to call

val dfForES = df.coalesce(numberOfParallelElasticSearchUploads) 

before step #2. This reduces the number of partitions without any shuffle overhead and ensures that at most numberOfParallelElasticSearchUploads executors are sending data to ES in parallel, while the rest sit idle.
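
For example, a minimal sketch of the idea (the partition count, ES endpoint and index name are placeholders, and the write assumes the elasticsearch-hadoop Spark SQL connector):

// Cap step 2's parallelism by capping the number of partitions rather than executors
val dfForES = df.coalesce(numberOfParallelElasticSearchUploads)

dfForES.write
  .format("org.elasticsearch.spark.sql")   // elasticsearch-hadoop connector
  .option("es.nodes", "es-host:9200")      // placeholder ES endpoint
  .save("my-index/doc")                    // placeholder index/type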

If you're running your job on a shared cluster, I'd still recommend enabling dynamic allocation to release these idle executors for a better resource utilization.
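
A sketch of the relevant settings, assuming YARN with the external shuffle service (the values are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("es-load")                                            // placeholder app name
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")               // required for dynamic allocation on YARN
  .config("spark.dynamicAllocation.minExecutors", "2")           // placeholder
  .config("spark.dynamicAllocation.maxExecutors", "100")         // placeholder
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")  // release executors idle this long
  .getOrCreate()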

Denis Makarenko
  • Yeah this is one thing I considered. It does solve the freeing of executors with dynamic allocation but it also introduces quite a large unnecessary step with lots of shuffle (the coalesce). Also, given the data is large and could be loading for a long time, I'd rather keep the executor tasks themselves small so they can be retried without needing to reindex lots of data and so that I can track progress of the load in a granular way – Will Boulter Jul 18 '18 at 15:58
  • coalesce doesn't cause shuffling (unlike repartition), see the coalesce method description (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala): "Similar to coalesce defined on an `RDD`, this operation results in a narrow dependency, e.g. if you go from 1000 partitions to 100 partitions, there will not be a shuffle, instead each of the 100 new partitions will claim 10 of the current partitions" – Denis Makarenko Jul 18 '18 at 16:10
  • Ah interesting, had never really thought about the difference before, thanks :) – Will Boulter Jul 18 '18 at 20:49

I was looking for a way to programmatically adjust the number of executors in pyspark and this was the top result. Here is what I've gathered from Will's question and from poking around with py4j:

# Create the spark session:
from pyspark.sql import SparkSession
spark = SparkSession.builder.config(... your configs ...).getOrCreate()
sc = spark.sparkContext  # the SparkContext; its _jvm attribute gives access to the JVM gateway used below

# Increase cluster to 5 executors:
spark._jsparkSession.sparkContext().requestTotalExecutors(5, 0, sc._jvm.PythonUtils.toScalaMap({}))

# Decrease cluster back to zero executors:
spark._jsparkSession.sparkContext().requestTotalExecutors(0, 0, sc._jvm.PythonUtils.toScalaMap({}))
javaExecutorIds = spark._jsparkSession.sparkContext().getExecutorIds()
executorIds = [javaExecutorIds.apply(i) for i in range(javaExecutorIds.length())]
print(f'Killing executors {executorIds}')
spark._jsparkSession.sparkContext().killExecutors(javaExecutorIds)

I hope that saves someone else from excessive googling.

Lou Zell