I am trying to create a Dataproc cluster that connects to Pub/Sub. I need to add multiple jars to the spark.jars property at cluster creation:
gcloud dataproc clusters create cluster-2c76 --region us-central1 --zone us-central1-f --master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 500 \
--image-version 1.4-debian10 \
--properties spark:spark.jars=gs://bucket/jars/spark-streaming-pubsub_2.11-2.4.0.jar,gs://bucket/jars/google-oauth-client-1.31.0.jar,gs://bucket/jars/google-cloud-datastore-2.2.0.jar,gs://bucket/jars/pubsublite-spark-sql-streaming-0.2.0.jar spark:spark.driver.memory=3000m \
--initialization-actions gs://goog-dataproc-initialization-actions-us-central1/connectors/connectors.sh \
--metadata spark-bigquery-connector-version=0.21.0 \
--scopes=pubsub,datastore
This throws the following error:
ERROR: (gcloud.dataproc.clusters.create) argument --properties: Bad syntax for dict arg: [gs://gregalr/jars/spark-streaming-pubsub_2.11-2.3.4.jar]. Please see `gcloud topic flags-file` or `gcloud topic escaping` for information on providing list or dictionary flag values with special characters.
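As far as I can tell, gcloud parses --properties as a comma-delimited list of KEY=VALUE pairs, so the commas between the jar URIs are read as pair separators, and a segment that is just a bare jar URI has no = in it, hence the "Bad syntax for dict arg" complaint. Following the `gcloud topic escaping` pointer in the error, my attempt was the alternate-delimiter form below, where the leading ^#^ tells gcloud to split pairs on # instead of , (the choice of # is mine and arbitrary; note the two properties then need a # between them, since the bare space in the command above would make the shell pass the second property as a separate argument):

--properties ^#^spark:spark.jars=gs://bucket/jars/spark-streaming-pubsub_2.11-2.4.0.jar,gs://bucket/jars/google-oauth-client-1.31.0.jar,gs://bucket/jars/google-cloud-datastore-2.2.0.jar,gs://bucket/jars/pubsublite-spark-sql-streaming-0.2.0.jar#spark:spark.driver.memory=3000m \

with the rest of the command unchanged.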
This looked promising, but it still fails for me.
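The error also mentions `gcloud topic flags-file`. As I read that topic, flag values can be supplied from a YAML file, and a dict-valued flag like --properties can be written as a mapping, which would sidestep the comma problem entirely. A sketch I have not verified (the file name cluster-flags.yaml is my own):

# cluster-flags.yaml -- keys and values quoted so YAML does not trip on the embedded : and ,
--properties:
  "spark:spark.jars": "gs://bucket/jars/spark-streaming-pubsub_2.11-2.4.0.jar,gs://bucket/jars/google-oauth-client-1.31.0.jar,gs://bucket/jars/google-cloud-datastore-2.2.0.jar,gs://bucket/jars/pubsublite-spark-sql-streaming-0.2.0.jar"
  "spark:spark.driver.memory": "3000m"

This would be passed as --flags-file=cluster-flags.yaml alongside the remaining flags on the same gcloud dataproc clusters create command line.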
If there is a better way to connect Dataproc to Pub/Sub, please share.