
I have a configuration and a cluster set up in GCP, and I can submit a Spark job from the console, but I am trying to run cloud dataproc job submit spark from my CLI with the same configuration. I've set up the service account locally; I'm just unable to build the CLI equivalent of the console configuration.

console config:

"sparkJob": {
      "mainClass": "main.class",
      "properties": {
        "spark.executor.extraJavaOptions": "-DARGO_ENV_FILE=gs://file.properties",
        "spark.driver.extraJavaOptions": "-DARGO_ENV_FILE=gs://file.properties"
      },
      "jarFileUris": [
        "gs://my_jar.jar"
      ],
      "args": [
        "arg1",
        "arg2",
        "arg3"
      ]
    }

And the equivalent command that I built is:

cloud dataproc job submit spark \
  -t spark \
  -p spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties \
  -m main.class \
  -c my_cluster \
  -f gs://my_jar.jar \
  -a 'arg1','arg2','arg3'

It's not reading the file.properties file and gives this error:

error while opening file spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties: error: open spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties: no such file or directory

And when I run the command without the -p (properties) flag and those files, it runs but eventually fails because of the missing properties files.

I can't figure out where I'm going wrong.
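
For reference, my reading of the gcloud reference docs is that the fully spelled-out flags for this configuration would look roughly like this (the region value is just a placeholder, and I'm not sure whether the --properties value needs extra quoting):

gcloud dataproc jobs submit spark \
  --cluster=my_cluster \
  --region=us-central1 \
  --class=main.class \
  --jars=gs://my_jar.jar \
  --properties='spark.executor.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties,spark.driver.extraJavaOptions=-DARGO_ENV_FILE=gs://file.properties' \
  -- arg1 arg2 arg3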

PS: I'm trying to run the Dataproc command from the CLI as the equivalent of a spark-submit command like this one:

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dkafka.security.config.filename=file.properties" \
  --conf "spark.executor.extraJavaOptions=-Dkafka.security.config.filename=file.properties" \
  --class main.class my_jar.jar \
  --arg1 \
  --arg2 \
  --arg3
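
For completeness, this is how I understand the same console config maps onto the Dataproc Python client (google-cloud-dataproc); the project ID and region below are placeholders, and this is only my sketch of the client API, not something I've verified against this exact job:

from google.cloud import dataproc_v1

# Point the client at the cluster's regional endpoint (region is a placeholder).
client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"}
)

# Same fields as the console's sparkJob config above.
job = {
    "placement": {"cluster_name": "my_cluster"},
    "spark_job": {
        "main_class": "main.class",
        "jar_file_uris": ["gs://my_jar.jar"],
        "args": ["arg1", "arg2", "arg3"],
        "properties": {
            "spark.executor.extraJavaOptions": "-DARGO_ENV_FILE=gs://file.properties",
            "spark.driver.extraJavaOptions": "-DARGO_ENV_FILE=gs://file.properties",
        },
    },
}

# Submit the job and print its ID ("my-project" is a placeholder project ID).
result = client.submit_job(project_id="my-project", region="us-central1", job=job)
print(result.reference.job_id)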
