
I've been reading the GCP documentation trying to find a way to ingest data from a Microsoft SQL Server database passively (e.g., using Cloud SQL). The problem is that Cloud SQL sits idle most of the time (the data is updated once a week), and I could not figure out how to start/stop the instance using Cloud Functions or Cloud Scheduler. In addition, I also need to automate the data pipeline sending data to Dataprep, which is not possible with Cloud SQL (SQL Server instance).
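One approach I'm considering for the start/stop part: patch the instance's `activationPolicy` through the Cloud SQL Admin API from a Cloud Function triggered by Cloud Scheduler. A rough sketch (the project and instance names are placeholders, and it assumes the function's service account has the Cloud SQL Admin role):

```python
def activation_body(policy: str) -> dict:
    """Build the PATCH body: 'ALWAYS' starts the instance, 'NEVER' stops it."""
    if policy not in ("ALWAYS", "NEVER"):
        raise ValueError("policy must be 'ALWAYS' or 'NEVER'")
    return {"settings": {"activationPolicy": policy}}

def set_instance_state(project: str, instance: str, policy: str) -> None:
    """Patch a Cloud SQL instance's activation policy via the Admin API."""
    # Imported inside the function so the body builder stays testable
    # without the google-api-python-client package installed.
    from googleapiclient import discovery

    service = discovery.build("sqladmin", "v1beta4")
    service.instances().patch(
        project=project,
        instance=instance,
        body=activation_body(policy),
    ).execute()

def stop_instance(request):
    """HTTP entry point for a Cloud Function wired to a Scheduler job."""
    set_instance_state("my-project", "my-sqlserver-instance", "NEVER")
    return "stopped"
```

A second Scheduler job could call a twin function with `"ALWAYS"` to bring the instance back up before the weekly load.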

To sum up, I need to ingest data -> clean/enrich it -> make it available for analysis

*I would like to avoid using external paid APIs

These are the sources I've read:

Cloud Scheduler - how to schedule start/stop

Cloud SQL Admin API Stack Overflow - schedule-start-stop-on-gcp-sql-instance

Issue tracker gcp - start/stop cloud sql

Cloud Storage - upload objects API docs

This is a brief image of the cloud architecture:

Brief architecture introduction

Eduardo Humberto
  • Why is the data being ingested into SQL Server? Or rather, what injects the data into SQL Server? – guillaume blaquiere May 07 '20 at 18:49
  • @guillaumeblaquiere The data is ingested into SQL Server because it was more straightforward to connect MS SQL Server to Cloud SQL... but I'm able to change it. The problem is how to ingest it from MS SQL Server. – Eduardo Humberto May 07 '20 at 19:20
  • If you connect your own MS SQL Server to Cloud SQL only to extract the data to Cloud Storage just afterwards, why not extract the data from your MS SQL Server and store it in Cloud Storage directly? Why do you have to copy the data into Cloud SQL? – guillaume blaquiere May 07 '20 at 19:31
  • @guillaumeblaquiere That's the problem: how to extract data from MS SQL Server and store it in Cloud Storage? I've made a Python script to take the data as CSV from a folder and upload it to Cloud Storage, but I couldn't figure out a way to connect Cloud SQL to Cloud Storage (like a gateway, something like that). – Eduardo Humberto May 08 '20 at 16:52
  • You can't connect Cloud SQL to Cloud Storage! You need a process in between that performs the query, formats the data, and stores the file. Writing this, I'm thinking of another Google product: Cloud Data Fusion. I think you can make a connection to your MS SQL Server, even outside GCP, then run your transforms and sink the result into BigQuery. Have a look at it; it may save you from extracting files to Cloud Storage (but be careful about the pricing too; a CDAP instance is expensive!). – guillaume blaquiere May 08 '20 at 19:31
  • Cloud Data Fusion is still costly (Basic edition: $1.80 per instance per hour, so we are talking roughly $1,100 per month, plus Cloud SQL costs; this is by far a bad choice). The straightforward solution I found is to schedule tasks in the client's MS SQL Server Management Studio to export the needed tables to CSV, and then schedule a script through Windows Task Scheduler to push those files to Cloud Storage (like a data lake). This is more feasible, and I can use Storage to version the incoming data. I will write an article on Medium and share the script for those who run into the same problem. – Eduardo Humberto May 09 '20 at 20:20
  • Don't hesitate to share your article link. I will be happy to read it!! – guillaume blaquiere May 09 '20 at 20:26
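The approach settled on in the comments (scheduled CSV exports pushed to Cloud Storage as a versioned data lake) can be sketched roughly like this. The bucket and folder names are placeholders, and it assumes the `google-cloud-storage` client library with default credentials on the machine running the scheduled task:

```python
import glob
import os
from datetime import date

def versioned_blob_name(local_path: str, run_date: date) -> str:
    """Prefix each file with the export date so each weekly load is versioned."""
    return f"exports/{run_date:%Y-%m-%d}/{os.path.basename(local_path)}"

def upload_csv_folder(bucket_name: str, folder: str) -> None:
    """Push every CSV exported by SQL Server Agent to the bucket.

    Intended to run from Windows Task Scheduler after the export job.
    """
    # Imported inside the function so the path helper stays testable
    # without the google-cloud-storage package installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)
    for path in glob.glob(os.path.join(folder, "*.csv")):
        blob = bucket.blob(versioned_blob_name(path, date.today()))
        blob.upload_from_filename(path)
```

From there, Dataprep (or a BigQuery load job) can pick the files up from the dated Cloud Storage prefixes.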

0 Answers