I am trying to create a Dataproc cluster from a Cloud Composer DAG using Airflow's DataprocCreateClusterOperator. I need to access Cloud SQL from my Dataproc cluster, so I also need to install the Cloud SQL Proxy on the cluster. Following the docs, I am providing the Cloud SQL Proxy initialization action in the cluster config for the Airflow operator, as below:
"initialization_actions": [
    {
        "executable_file": "gs://<<some_gcs_bucket>>/cloud-sql-proxy.sh"
    }
],
"gce_cluster_config": {
    "service_account_scopes": [
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/sqlservice.admin"
    ],
    "metadata": {
        "enable-cloud-sql-hive-metastore": "false",
        "additional-cloud-sql-instances": "<<PROJECT_ID>>:<<REGION>>:<<INSTANCE_NAME>>"
    }
}
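For context, here is a minimal sketch of how I assemble that fragment in Python before handing it to the operator. The function name `build_cluster_config` and the example bucket and instance values are hypothetical, introduced only for illustration; the keys and values are the same as in the fragment above.

```python
from typing import Any, Dict


def build_cluster_config(init_script_uri: str, sql_instance: str) -> Dict[str, Any]:
    """Return the part of the Dataproc cluster config shown above as a dict.

    init_script_uri: GCS URI of the cloud-sql-proxy.sh initialization action.
    sql_instance: value passed verbatim as 'additional-cloud-sql-instances'.
    """
    return {
        "initialization_actions": [
            {"executable_file": init_script_uri},
        ],
        "gce_cluster_config": {
            "service_account_scopes": [
                "https://www.googleapis.com/auth/cloud-platform",
                "https://www.googleapis.com/auth/sqlservice.admin",
            ],
            # Metadata values are passed as strings, hence "false" and not False.
            "metadata": {
                "enable-cloud-sql-hive-metastore": "false",
                "additional-cloud-sql-instances": sql_instance,
            },
        },
    }


# Hypothetical values, for illustration only.
config = build_cluster_config(
    "gs://my-bucket/cloud-sql-proxy.sh",
    "my-project:us-central1:my-instance",
)
```

The resulting dict is what I pass as the `cluster_config` argument of DataprocCreateClusterOperator, alongside `project_id`, `region` and `cluster_name`.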
The cluster creation fails with the following message:
google.api_core.exceptions.InvalidArgument: 400 Initialization action failed. Failed action 'gs://<<some_gcs_bucket>>/cloud-sql-proxy.sh', see output in: gs://<<some_gcs_bucket>>/dataproc-initialization-script-0_output
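The error points at an output object in GCS, which should contain the script's own log. A rough sketch of how that output could be read from Python, assuming the `google-cloud-storage` client library is installed and default credentials are available; the helper names `parse_gcs_uri` and `read_init_action_output` are hypothetical.

```python
from typing import Tuple


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a GCS URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must include a bucket and an object name: {uri!r}")
    return bucket, object_name


def read_init_action_output(uri: str) -> str:
    """Download the initialization action output object and return it as text."""
    # Imported lazily so parse_gcs_uri can be used without the client library.
    from google.cloud import storage

    bucket_name, object_name = parse_gcs_uri(uri)
    client = storage.Client()
    return client.bucket(bucket_name).blob(object_name).download_as_text()
```

Calling `read_init_action_output` with the `gs://.../dataproc-initialization-script-0_output` path from the error message should return the log of the failed script, which is where the actual cause is likely to be reported.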
I suspect I am specifying something incorrectly in the cluster config, for example the metadata arguments required by the initialization script. If anybody has implemented this use case before, I would appreciate your input.