
I am trying to use gcloud to submit my Spark job from Airflow.

This is my gcloud command: `gcloud dataproc jobs submit spark --cluster=xxx --region=us-central1 --class=com.xxx --jars=gs://xxx/xxx/xxx.jar -- xxx -- xxx -- xxx -- gs://xxx/xxx/xxx`

I am getting this exception: `Exception in thread "main" org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://cluster-xxxx-m/user/root/--;`

Is anything wrong with my command?

Kriz
  • If you have it enabled, try disabling the flat glob algorithm in the GCS connector by setting these Hadoop properties during cluster creation: `core:fs.gs.glob.flatlist.enable=false core:fs.gs.glob.concurrent.enable=false`. Also upgrade the `GCS_CONNECTOR_VERSION` to the latest. – Andrés May 10 '22 at 16:42

1 Answer


This error could be solved by disabling the flat glob algorithm in the GCS connector. Set these Hadoop properties during cluster creation: `core:fs.gs.glob.flatlist.enable=false` and `core:fs.gs.glob.concurrent.enable=false`. Additionally, upgrade the GCS connector to the latest version by adding the flag `--metadata GCS_CONNECTOR_VERSION=2.2.6` when creating the cluster.
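For illustration, a minimal sketch of a cluster-creation command combining both settings (the cluster name `my-cluster` is a placeholder; the region is taken from the question):

```
# Sketch only: "my-cluster" is a placeholder name.
# --properties applies the two core-site Hadoop flags at cluster creation;
# --metadata pins the version of the GCS connector installed on the cluster.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --properties=core:fs.gs.glob.flatlist.enable=false,core:fs.gs.glob.concurrent.enable=false \
    --metadata=GCS_CONNECTOR_VERSION=2.2.6
```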

Andrés