I am trying to configure Spark for an entire Azure Synapse pipeline. I found "Spark session config magic command" and "How to set Spark / Pyspark custom configs in Synapse Workspace spark pool". The %%configure
magic command works fine for a single notebook. Example:
Insert a cell with the content below at the beginning of the notebook:
%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "32g",
    "executorCores": 4,
    "numExecutors": 5
}
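(As an aside, %%configure also seems to accept a nested "conf" object for arbitrary Spark properties, following the Livy session request body; the spark.dynamicAllocation.enabled entry below is only an illustration, not something I have verified end to end:)

%%configure -f
{
    "driverMemory": "28g",
    "driverCores": 4,
    "executorMemory": "32g",
    "executorCores": 4,
    "numExecutors": 5,
    "conf": {
        "spark.dynamicAllocation.enabled": "false"
    }
}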
Then the following prints the expected values:
spark_executor_instances = spark.conf.get("spark.executor.instances")
print(f"spark.executor.instances {spark_executor_instances}")
spark_executor_memory = spark.conf.get("spark.executor.memory")
print(f"spark.executor.memory {spark_executor_memory}")
spark_driver_memory = spark.conf.get("spark.driver.memory")
print(f"spark.driver.memory {spark_driver_memory}")
However, if I add that notebook as the first activity in an Azure Synapse pipeline, the Apache Spark application that executes that notebook gets the correct configuration, but the rest of the notebooks in the pipeline fall back to the default configuration.
How can I configure Spark for the entire pipeline? Should I copy-paste the above %%configure cell
into each and every notebook in the pipeline, or is there a better way?