
According to the Cloud SQL federated queries documentation, a multi-region BigQuery job can query a Cloud SQL instance in any region encompassed by the specified multi-region:

A BigQuery multi-region can query any Cloud SQL region in the same location (US, EU). For example:

  • The BigQuery US multi-region can query Cloud SQL us-central1, us-east4, us-west2, and so on.

  • The BigQuery EU multi-region can query Cloud SQL europe-north1, europe-west2, and so on.

But when I try to run a query on a Cloud SQL instance in us-east4 with a processing location of US, it throws a cryptic error:

SELECT id FROM EXTERNAL_QUERY("project.us-east4.external_conn_name", "SELECT id FROM users")

Access Denied: URI: projects/project/locations/us-east4/connections/external_conn_name: APPLICATION_ERROR;google.cloud.bigquery.connection.v1beta1/ConnectionService.GetPrivateConnectionInfo;RPC error;AppErrorCode=7;

This same query completes successfully when run with a processing location of us-east4.
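For reference, here is a minimal sketch of the two invocations with the bq CLI, which sets the processing location via the `--location` flag. The commands are printed rather than executed, since a real run needs an authenticated GCP project; the project and connection names are the placeholders from the query above.

```shell
#!/bin/sh
# Dry-run sketch: echo each bq invocation instead of executing it,
# since real runs require an authenticated GCP project.
run() { echo "+ $*"; }

QUERY='SELECT id FROM EXTERNAL_QUERY("project.us-east4.external_conn_name", "SELECT id FROM users")'

# Fails at the time of writing: US multi-region processing location.
run bq query --location=US --nouse_legacy_sql "$QUERY"

# Succeeds: processing location matching the Cloud SQL region.
run bq query --location=us-east4 --nouse_legacy_sql "$QUERY"
```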

I have been running multi-region federated queries successfully for about two months to perform ETL jobs that load Cloud SQL data in us-east4 into BigQuery datasets in US, in a fashion similar to the answer described here. My cloud team has audited IAM changes and there do not appear to be any changes that could have caused this issue. It seems as if Google Cloud Platform silently released changes last night that broke about 100 of my ETL jobs. I am scrambling to come up with a quick fix so my company can continue to use data reliant on these ETL jobs. Any ideas on how I can work around this issue?

pistolpete
  • Encountered a similar issue. We have Cloud SQL deployed in us-west2, and our BigQuery datasets are US multi-region. Suddenly, our BigQuery scheduled queries which join the two broke, and our Dataflow jobs which use BigQuerySource to fetch data from the us-west2 Cloud SQL instance stopped working. There is no easy way to move BigQuery datasets from one region to another, and Dataflow is not available in us-west2. I know Cloud SQL federated query is a beta product, but this is such a basic and essential feature... – hufeng03 Feb 07 '20 at 22:36
  • @hufeng03 what have you done to fix your broken jobs? – pistolpete Feb 11 '20 at 15:03
  • To solve this issue, we added a new read replica in us-central1 of our master Cloud SQL DB in us-west2. – hufeng03 Feb 23 '20 at 02:18
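The read-replica workaround described in the comment above can be sketched with the gcloud CLI; the instance names are placeholders, and the command is echoed rather than executed since it requires an authenticated project.

```shell
#!/bin/sh
# Dry-run sketch: echo the gcloud command instead of executing it,
# since a real run requires an authenticated GCP project.
run() { echo "+ $*"; }

# Create a cross-region read replica in us-central1 of a master
# instance that lives in us-west2 (instance names are placeholders).
run gcloud sql instances create my-replica-us-central1 \
  --master-instance-name=my-master-us-west2 \
  --region=us-central1
```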

1 Answer


I've submitted several issues to GCP about this and still have not heard back. Hopefully this will be fixed soon.

Here's what I'm doing to hotfix this issue:

  1. Run the extraction query above with a destination dataset in the same region (us-east4) as the Cloud SQL instance
  2. Export the data above to a GCS bucket in the same region as step 1 (us-east4)
  3. Run a Cloud Storage Transfer Service job to copy the data in the GCS bucket to a GCS bucket in the same multi-region as the final destination BigQuery dataset (US)
  4. Load the data in the multi-region GCS bucket into the multi-region BigQuery dataset
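The four steps above can be sketched with the bq and gsutil CLIs. The dataset and bucket names are placeholders, gsutil cp stands in for the Storage Transfer Service job in step 3, and the commands are echoed rather than executed since real runs need an authenticated project.

```shell
#!/bin/sh
# Dry-run sketch of the four-step hotfix: echo each command instead of
# executing it (real runs require an authenticated GCP project).
run() { echo "+ $*"; }

# 1. Extract into a dataset co-located with the Cloud SQL instance (us-east4).
run bq query --location=us-east4 --nouse_legacy_sql \
  --destination_table=project:staging_us_east4.users \
  'SELECT id FROM EXTERNAL_QUERY("project.us-east4.external_conn_name", "SELECT id FROM users")'

# 2. Export that table to a GCS bucket in the same region (us-east4).
run bq extract --location=us-east4 --destination_format=AVRO \
  project:staging_us_east4.users "gs://staging-bucket-us-east4/users/*.avro"

# 3. Copy the files to a bucket in the US multi-region.
#    (The answer uses Cloud Storage Transfer Service; gsutil cp is a
#    simpler stand-in for illustration.)
run gsutil cp "gs://staging-bucket-us-east4/users/*.avro" \
  "gs://staging-bucket-us/users/"

# 4. Load into the final US multi-region BigQuery dataset.
run bq load --location=US --source_format=AVRO \
  project:final_us.users "gs://staging-bucket-us/users/*.avro"
```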

This has turned a quick and reliable one-step ETL into an expensive four-step ETL with tons of intermediate storage involved. Google please fix this!

pistolpete
  • Google has silently fixed this issue. No response was ever given to me, but this workaround is no longer needed – pistolpete Feb 27 '20 at 16:00
  • Hi, I've stumbled on the same issue. All my existing operations and datasets in BQ are in the EU multi-region, and I'm trying to add a new source from a Cloud SQL instance in the europe-west1 location, but I can't do it. When I try to use a federated query on the Cloud SQL instance from 'europe-west1' with the processing location set to 'EU', I get the `Error running query Not found: Connection ` error. What do you mean when you say that Google has silently fixed this issue? Should it work, or did I misunderstand something here? – m2gikbb Jun 14 '21 at 08:05