2

I am trying to create a Dataproc cluster from a Cloud Composer DAG using the DataprocCreateClusterOperator of Airflow. I need to access Cloud SQL from my Dataproc cluster hence need to install the Cloud SQL proxy on the cluster as well. I am providing the initialization action for Cloud SQL Proxy as per docs in my cluster config for the Airflow operator as below:

"initialization_actions": [
        {
           "executable_file" : "gs://<<some_gcs_bucket>>/cloud-sql-proxy.sh"
        }
    ],

"gce_cluster_config": {
        "service_account_scopes": ["https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/sqlservice.admin"],
        "metadata" : {
             "enable-cloud-sql-hive-metastore" : "false",
             "additional-cloud-sql-instances" : "<<PROJECT_ID>>:<<REGION>>:<<INSTANCE_NAME>>"
        }
    }

The cluster creation fails with below message:

google.api_core.exceptions.InvalidArgument: 400 Initialization action failed. Failed action 'gs://<<some_gcs_bucket>>/cloud-sql-proxy.sh', see output in: gs://<<some_gcs_bucket>>/dataproc-initialization-script-0_output

I might be putting the parameters in the cluster config wrongly like the metadata arguments required by the initialization script, if anybody has implemented this use-case previously then would appreciate your inputs.

Dagang
  • 24,586
  • 26
  • 88
  • 133
Rajnil Guha
  • 425
  • 1
  • 4
  • 15
  • What's the error message in gs://<>/dataproc-initialization-script-0_output? – Dagang Jul 09 '22 at 18:07
  • Below is the message in dataproc-initialization-script-0_output: 2022-07-09 10:25:38 URL:https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 [16903544/16903544] -> "cloud_sql_proxy.linux.amd64" [1] Created symlink /etc/systemd/system/multi-user.target.wants/cloud-sql-proxy.service → /usr/lib/systemd/system/cloud-sql-proxy.service. Cloud SQL Proxy installation succeeded Logs can be found in /var/log/cloud-sql-proxy/cloud-sql-proxy.log /etc/google-dataproc/startup-scripts/dataproc-initialization-script-0: line 311: /etc/mysql/conf.d/cloud-sql-proxy.cnf: No such file or directory – Rajnil Guha Jul 10 '22 at 10:46
  • Which image version are you using? – Dagang Jul 10 '22 at 17:25
  • We are using 1.5.53-centos8 to create this cluster. – Rajnil Guha Jul 10 '22 at 18:09
  • 1
    The issue might be specific to CentOS. Can you try Debian/Ubuntu? If that's confirmed, I'll investigate and fix it. – Dagang Jul 10 '22 at 18:10
  • I tried with 1.5.71-debian10 and 1.5.71-ubuntu18 and the cluster was successfully created with successful installation of cloud sql proxy. This seems to be an issue specific to centOS. – Rajnil Guha Jul 11 '22 at 10:02

1 Answers1

0

The problem was specific to CentOS and Rocky. The support for CentOS and Rocky has been added in this PR. To use it, you can copy the source code from Github and upload it to your own GCS bucket.

Dagang
  • 24,586
  • 26
  • 88
  • 133