I submitted a Spark batch job through Livy to a remote cluster with the following request body:
REQUEST_BODY = {
    'file': '/spark/batch/job.py',
    'conf': {
        'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation': 'true',
        'spark.driver.cores': 1,
        'spark.driver.memory': '12g',
        'spark.executor.cores': 1,
        'spark.executor.memory': '8g',
        'spark.dynamicAllocation.maxExecutors': 4,
    },
}
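For reference, this is a minimal sketch of how I POST that body to Livy's /batches endpoint using only the standard library (the host and port here are placeholders, not my actual cluster address):

```python
# Sketch of submitting the batch to Livy's REST API; the host and
# port in LIVY_URL are assumptions, adjust them for your cluster.
import json
from urllib import request

LIVY_URL = 'http://livy-server:8998'  # hypothetical Livy endpoint

REQUEST_BODY = {
    'file': '/spark/batch/job.py',
    'conf': {
        'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation': 'true',
        'spark.driver.cores': 1,
        'spark.driver.memory': '12g',
        'spark.executor.cores': 1,
        'spark.executor.memory': '8g',
        'spark.dynamicAllocation.maxExecutors': 4,
    },
}

def submit_batch(url=LIVY_URL, body=REQUEST_BODY):
    """POST the batch definition to Livy's /batches endpoint and return the response JSON."""
    req = request.Request(
        f'{url}/batches',
        data=json.dumps(body).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# response = submit_batch()  # response contains the batch id and state
```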
In the Python file containing the application to be run, the SparkSession is created as follows:
# -- inside /spark/batch/job.py --
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Spark application code after this point uses the SparkSession created above
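To check whether the configuration from the request body actually reaches the session, I can dump the effective resource settings inside the job. A small sketch of that check (the filtering helper is my own; the inspection itself needs PySpark available at runtime):

```python
# Keys to look at when checking whether Livy's conf reached the session.
RESOURCE_PREFIXES = ('spark.driver.', 'spark.executor.', 'spark.dynamicAllocation.')

def resource_settings(conf_pairs):
    """Filter (key, value) pairs down to the resource-related entries."""
    return {k: v for k, v in conf_pairs if k.startswith(RESOURCE_PREFIXES)}

if __name__ == '__main__':
    # Requires PySpark; run inside the Livy batch job itself.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    for key, value in sorted(resource_settings(spark.sparkContext.getConf().getAll()).items()):
        print(key, '=', value)
```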
The application works just fine, but Spark acquires all of the resources in the cluster, ignoring the configuration set in the request body.
I suspect that /spark/batch/job.py creates another SparkSession apart from the one specified in the Livy request body, but I am not sure how to use the SparkSession provided by Livy. The documentation on this topic is sparse.
Has anyone faced the same issue? How can I solve this problem?
Thanks in advance!