I submitted a Spark batch job through Livy to a remote cluster with the following request body:
REQUEST_BODY = {
    'file': '/spark/batch/job.py',
    'conf': {
        'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation': 'true',
        'spark.driver.cores': 1,
        'spark.driver.memory': '12g',
        'spark.executor.cores': 1,
        'spark.executor.memory': '8g',
        'spark.dynamicAllocation.maxExecutors': 4,
    },
}
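For reference, this is a minimal sketch of how I POST that body to Livy's /batches endpoint using only the standard library (the host and port here are placeholders, not my actual cluster address):

```python
# Sketch of submitting the batch to Livy's REST API; the host and
# port in LIVY_URL are assumptions, adjust them for your cluster.
import json
from urllib import request

LIVY_URL = 'http://livy-server:8998'  # hypothetical Livy endpoint

REQUEST_BODY = {
    'file': '/spark/batch/job.py',
    'conf': {
        'spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation': 'true',
        'spark.driver.cores': 1,
        'spark.driver.memory': '12g',
        'spark.executor.cores': 1,
        'spark.executor.memory': '8g',
        'spark.dynamicAllocation.maxExecutors': 4,
    },
}

def submit_batch(url=LIVY_URL, body=REQUEST_BODY):
    """POST the batch definition to Livy's /batches endpoint and return the response JSON."""
    req = request.Request(
        f'{url}/batches',
        data=json.dumps(body).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# response = submit_batch()  # response contains the batch id and state
```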
In the Python file containing the application to be run, the SparkSession is created as follows:
# -- inside /spark/batch/job.py --
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Spark application code after this point uses the SparkSession created above
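To check whether the configuration from the request body actually reaches the session, I can dump the effective resource settings inside the job. A small sketch of that check (the filtering helper is my own; the inspection itself needs PySpark available at runtime):

```python
# Keys to look at when checking whether Livy's conf reached the session.
RESOURCE_PREFIXES = ('spark.driver.', 'spark.executor.', 'spark.dynamicAllocation.')

def resource_settings(conf_pairs):
    """Filter (key, value) pairs down to the resource-related entries."""
    return {k: v for k, v in conf_pairs if k.startswith(RESOURCE_PREFIXES)}

if __name__ == '__main__':
    # Requires PySpark; run inside the Livy batch job itself.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    for key, value in sorted(resource_settings(spark.sparkContext.getConf().getAll()).items()):
        print(key, '=', value)
```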
The application works just fine, but Spark acquires all of the resources in the cluster, ignoring the configuration set in the request body.
I suspect that /spark/batch/job.py creates another SparkSession apart from the one specified in the Livy request body, but I am not sure how to use the SparkSession provided by Livy. The documentation on this topic is sparse.
Has anyone faced the same issue? How can I solve this problem?
Thanks in advance!