I am using Google Dataproc to submit Spark jobs and Google Cloud Composer to schedule them. Unfortunately, I am running into difficulties.
I am relying on .conf files (Typesafe Config files) to pass arguments to my Spark jobs.
I am using the following Python code for the Airflow Dataproc operator:
t3 = dataproc_operator.DataProcSparkOperator(
    task_id='execute_spark_job_cluster_test',
    dataproc_spark_jars='gs://snapshots/jars/pubsub-assembly-0.1.14-SNAPSHOT.jar',
    cluster_name='cluster',
    main_class='com.organ.ingestion.Main',
    project_id='project',
    dataproc_spark_properties={'spark.driver.extraJavaOptions': 'gs://file-dev/fileConf/development.conf'},
    scopes='https://www.googleapis.com/auth/cloud-platform',
    dag=dag)
But this is not working and the task fails with errors. Could anyone help me with this?
Basically I want to be able to override the .conf files and pass them as arguments to my DataProcSparkOperator.
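My current guess, which I have not managed to confirm, is that spark.driver.extraJavaOptions expects JVM flags rather than a file path, so the .conf file would have to be shipped with the job and then referenced through Typesafe Config's -Dconfig.file option. A minimal sketch of that idea, trimmed to the relevant parameters (the files parameter and the -Dconfig.file flag are my assumptions, not something I have working):

from airflow.contrib.operators import dataproc_operator

t3 = dataproc_operator.DataProcSparkOperator(
    task_id='execute_spark_job_cluster_test',
    # the docstring suggests this takes a list of jar URIs, not a bare string
    dataproc_spark_jars=['gs://snapshots/jars/pubsub-assembly-0.1.14-SNAPSHOT.jar'],
    cluster_name='cluster',
    main_class='com.organ.ingestion.Main',
    # assumption: `files` copies the .conf into the job's working directory
    files=['gs://file-dev/fileConf/development.conf'],
    # assumption: Typesafe Config then loads the staged local copy
    dataproc_spark_properties={
        'spark.driver.extraJavaOptions': '-Dconfig.file=development.conf'},
    dag=dag)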
I also tried to do `arguments='gs://file-dev/fileConf/development.conf'`, but this didn't take into account the .conf file mentioned in the arguments.