
I am trying to submit a PySpark job to Google Cloud Dataproc via the command line. These are my arguments:

gcloud dataproc jobs submit pyspark --cluster mongo-load --properties org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 mongo_load.py

I am constantly getting an exception:

--properties: Bad syntax for dict arg: [org.mongodb.spark:mongo-spark-connector_2.11:2.2.0]

I tried some of the escaping options from Google's documentation, but nothing seems to work.


2 Answers


Figured out I just needed to pass:

spark.jars.packages=org.mongodb.spark:mongo-spark-connector_2.11:2.2.0
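The --properties flag expects KEY=VALUE pairs, which is why the bare Maven coordinate was rejected as a bad dict arg. Reusing the cluster name and script from the question, the full command would look something like this:

gcloud dataproc jobs submit pyspark --cluster mongo-load --properties spark.jars.packages=org.mongodb.spark:mongo-spark-connector_2.11:2.2.0 mongo_load.py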

In addition to the answer by @Marlon Gray, if you need to pass more than one package you need to escape the spark.jars.packages string, like:

--properties=^#^spark.jars.packages=mavencoordinate1,mavencoordinate2
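For example, combined with the command from the question, where mavencoordinate2 stands in for a second package's Maven coordinate:

gcloud dataproc jobs submit pyspark --cluster mongo-load --properties=^#^spark.jars.packages=org.mongodb.spark:mongo-spark-connector_2.11:2.2.0,mavencoordinate2 mongo_load.py

The ^#^ prefix changes the delimiter between properties from a comma to #, so the comma-separated package list is kept intact as a single value.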

Please check https://cloud.google.com/sdk/gcloud/reference/topic/escaping for further details.
