What is the best way to transfer all records from a BigQuery table to a Cloud SQL table on a daily basis? The expected daily volume is more than 255,801,312 records (roughly 255 million). I know we can create Dataflow pipelines from BigQuery to Cloud SQL, but with this amount of data the pipeline runs for hours. Is there a better solution within Google Cloud?
- Did you try to export the data as CSV from BigQuery and import it into Cloud SQL? It's the most efficient way. – guillaume blaquiere Nov 10 '21 at 21:37
- Exporting data to CSV and importing it into Cloud SQL is a manual process; I'm not sure we can follow this approach for a daily job that reads 255 million records. – Srinivas B Nov 11 '21 at 04:14
- Manual? Or a [workflow](https://cloud.google.com/workflows) ;) – guillaume blaquiere Nov 11 '21 at 14:11
- @guillaumeblaquiere, I see your post https://medium.com/google-cloud/merge-clean-transform-your-csvs-easily-with-bigquery-3a73c0c26d57. EXPORT DATA OPTIONS might be the right choice, but please guide me on the best way to run this through Workflows (similar to Dataflow jobs) so that it can also import the files from a Cloud Storage bucket into Cloud SQL. – Srinivas B Nov 11 '21 at 21:41
- Workflows orchestrates API calls. First perform a call to BigQuery to export the data to a bucket, then call the Cloud SQL API to import the files. – guillaume blaquiere Nov 11 '21 at 22:31
- I built a working example, though not with the size of your dataset, so I can't tell you whether it will be more efficient. I will write an article on it. Your feedback will be welcome (I can mention you if you have a Medium account). – guillaume blaquiere Nov 12 '21 at 21:48
- @guillaumeblaquiere This is an absolutely great answer and I would like to take it to production. Please share your article soon; it might help with additional information, perhaps including best practices for production. I am following you on Medium. – Srinivas B Nov 14 '21 at 22:18
- Here is the article: https://medium.com/google-cloud/replicate-data-from-bigquery-to-cloud-sql-2b23a08c52b1 – guillaume blaquiere Nov 23 '21 at 08:54
1 Answer
Here is a working example of a Workflow. You need to grant enough permissions to the workflow's service account (Cloud SQL Admin, BigQuery Data Viewer + Job User, Cloud Storage Admin), and the target table must already exist in your Cloud SQL instance (I tested with MySQL).
The article with more detail is in the works. Replace the bucket, the project ID, the Cloud SQL instance name (mysql in my case), the query, the table name, and the database schema:
```yaml
main:
  steps:
    - assignStep:
        assign:
          - bucket: "TODO"
          - projectid: "TODO"
          - prefix: "workflow-import/export"
          - listResult:
              nextPageToken: ""
    # Export the BigQuery table to CSV files in Cloud Storage.
    - export-query:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${projectid}
          body:
            query: ${"EXPORT DATA OPTIONS( uri='gs://" + bucket + "/" + prefix + "*.csv', format='CSV', overwrite=true, header=false) AS SELECT id, email FROM `copy_dataset.name_test`"}
            useLegacySql: false
    # Import the exported files, one Cloud Storage listing page at a time.
    - importfiles:
        call: import_files
        args:
          pagetoken: ${listResult.nextPageToken}
          bucket: ${bucket}
          prefix: ${prefix}
          projectid: ${projectid}
        result: listResult
    - missing-files:
        switch:
          - condition: ${"nextPageToken" in listResult}
            next: importfiles

import_files:
  params:
    - pagetoken
    - bucket
    - prefix
    - projectid
  steps:
    - list-files:
        call: googleapis.storage.v1.objects.list
        args:
          bucket: ${bucket}
          pageToken: ${pagetoken}
          prefix: ${prefix}
        result: listResult
    - process-files:
        for:
          value: file
          in: ${listResult.items}
          steps:
            - wait-import:
                call: load_file
                args:
                  projectid: ${projectid}
                  importrequest:
                    importContext:
                      uri: ${"gs://" + bucket + "/" + file.name}
                      database: "test_schema"
                      fileType: CSV
                      csvImportOptions:
                        table: "workflowimport"
    - return-step:
        return: ${listResult}

load_file:
  params: [importrequest, projectid]
  steps:
    # Start the Cloud SQL import, then poll the operation until it is DONE.
    - callImport:
        call: http.post
        args:
          url: ${"https://sqladmin.googleapis.com/v1/projects/" + projectid + "/instances/mysql/import"}
          auth:
            type: OAuth2
          body: ${importrequest}
        result: operation
    - checkoperation:
        switch:
          - condition: ${operation.body.status != "DONE"}
            next: wait
        next: completed
    - completed:
        return: "done"
    - wait:
        call: sys.sleep
        args:
          seconds: 5
        next: getoperation
    - getoperation:
        call: http.get
        args:
          url: ${operation.body.selfLink}
          auth:
            type: OAuth2
        result: operation
        next: checkoperation
```
More detail in my Medium article: https://medium.com/google-cloud/replicate-data-from-bigquery-to-cloud-sql-2b23a08c52b1

– guillaume blaquiere
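One possible way to productionize this for the daily run asked about in the question (not part of the original answer): grant the roles listed at the top of the answer to the workflow's service account, deploy the workflow, and trigger it with Cloud Scheduler. A minimal sketch, assuming a hypothetical project, region, workflow name, and service account:

```sh
# Hypothetical project, region, workflow name, and service account.
SA=workflow-sa@my-project.iam.gserviceaccount.com

# Roles listed in the answer: Cloud SQL Admin, BigQuery Data Viewer + Job User, Storage Admin.
for role in roles/cloudsql.admin roles/bigquery.dataViewer roles/bigquery.jobUser roles/storage.admin; do
  gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:${SA}" --role="${role}"
done

# Deploy the workflow definition above and run it once to validate end to end.
gcloud workflows deploy bq-to-cloudsql --source=workflow.yaml \
  --location=europe-west1 --service-account="${SA}"
gcloud workflows run bq-to-cloudsql --location=europe-west1

# Trigger an execution every day at 01:00 via the Workflow Executions API.
gcloud scheduler jobs create http bq-to-cloudsql-daily \
  --schedule="0 1 * * *" \
  --uri="https://workflowexecutions.googleapis.com/v1/projects/my-project/locations/europe-west1/workflows/bq-to-cloudsql/executions" \
  --http-method=POST \
  --oauth-service-account-email="${SA}" \
  --location=europe-west1
```

Cloud Scheduler keeps the daily run serverless, so there is no cluster to maintain between executions.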