I have a configuration and a cluster set up in GCP, and I can submit a Spark job from the console, but I am trying to run gcloud dataproc jobs submit spark
from my CLI for the same configuration.
I've set the service account locally; I'm just unable to build the command equivalent to the console configuration.
console config:
"sparkJob": {
"mainClass": "main.class",
"properties": {
"spark.executor.extraJavaOptions": "-DARGO_ENV_FILE=gs://file.properties",
"spark.driver.extraJavaOptions": "-DARGO_ENV_FILE=gs://file.properties"
},
"jarFileUris": [
"gs://my_jar.jar"
],
"args": [
"arg1",
"arg2",
"arg3"
]
}
And the equivalent command that I built is:
gcloud dataproc jobs submit spark \
  -t spark \
  -p spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties \
  -m main.class \
  -c my_cluster \
  -f gs://my_jar.jar \
  -a 'arg1','arg2','arg3'
It's not reading the file.properties file, and it gives this error:
error while opening file spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties: error: open spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties: no such file or directory
And when I run the command without the -p (properties) flag and those files, it runs but eventually fails because those properties files are missing.
I can't figure out where I'm going wrong.
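For reference, my understanding of the long-flag form (from gcloud dataproc jobs submit spark --help) is roughly the sketch below; the region value is a placeholder, and I'm not sure the --properties value is quoted/escaped the way gcloud expects, which may be exactly where it breaks:

gcloud dataproc jobs submit spark \
  --cluster=my_cluster \
  --region=my_region \
  --class=main.class \
  --jars=gs://my_jar.jar \
  --properties='spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties' \
  -- arg1 arg2 arg3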
PS: With the Dataproc command from the CLI I'm trying to do the equivalent of a spark-submit command like this:
spark-submit --conf "spark.driver.extraJavaOptions=-Dkafka.security.config.filename=file.properties" \
  --conf "spark.executor.extraJavaOptions=-Dkafka.security.config.filename=file.properties" \
  --class main.class my_jar.jar \
  --arg1 \
  --arg2 \
  --arg3