
I need to add a config file to the Spark driver classpath on Google Dataproc.

I have tried using the --files option of gcloud dataproc jobs submit spark, but it does not work.

Is there a way to do this on Google Dataproc?

theShadow89
  • See https://stackoverflow.com/questions/58238269/add-conf-file-to-classpath-in-google-dataproc/58293749?noredirect=1#comment102962018_58293749 – Dagang Oct 09 '19 at 16:08

2 Answers


In Dataproc, anything listed under --jars is added to the classpath, and anything listed under --files is made available in each Spark executor's working directory. Even though the flag is named --jars, it should be safe to put non-jar entries in the list if you need the file to be on the classpath.
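For example, a minimal sketch of such an invocation (the cluster name, region, main class, jar, and config file below are all placeholders, not from the original answer):

gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=com.example.Main \
  --jars=gs://my-bucket/app.jar,gs://my-bucket/app.conf

Here app.conf rides along in --jars purely so that it lands on the classpath; Spark does not treat it as a jar.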

Angus Davis

I know I am answering late; posting for new visitors.

You can execute this using Cloud Shell. I have tested it:

gcloud dataproc jobs submit spark \
  --properties spark.dynamicAllocation.enabled=false \
  --cluster=<cluster_name> \
  --region=<CLUSTER_REGION> \
  --class com.test.PropertiesFileAccess \
  --files gs://<BUCKET>/prod.predleads.properties \
  --jars gs://<BUCKET>/snowflake-common-3.1.34.jar
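
To then read the shipped file inside the job (per the comment below): --files stages the file into the job's working directory, so it can be opened by its base name. A minimal Scala sketch, assuming the com.test package from the --class above; the property key "snowflake.url" is purely illustrative:

package com.test

import java.io.FileInputStream
import java.util.Properties

import org.apache.spark.sql.SparkSession

// Hypothetical implementation of the main class referenced above.
object PropertiesFileAccess {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PropertiesFileAccess").getOrCreate()

    // A file shipped with --files is staged into the job's working
    // directory, so the base name alone is enough to open it.
    val props = new Properties()
    val in = new FileInputStream("prod.predleads.properties")
    try props.load(in) finally in.close()

    // Read one key as a demonstration; "snowflake.url" is an assumed name.
    println(props.getProperty("snowflake.url"))

    spark.stop()
  }
}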
kalpesh
  • If I need to access the file and use it in the Spark program, how do I do that? Can I refer to the bucket directly, or just use the name of the file? – Karan Alang Feb 03 '22 at 20:31