I have a bunch of files in Azure Blob storage and it's constantly getting new ones. I was wondering if there is a way for me to first take all the data I have in Blob and move it over to BigQuery and then keep a script or some job running so that all new data in there gets sent over to BigQuery?
-
Maybe [this](https://www.stitchdata.com/integrations/microsoft-azure/google-bigquery/) article could help you. – Jeroen Heier Jun 28 '17 at 15:55
-
Thanks! I did notice that, but for their long-term integration, their subscriptions (for the amount of data I need to transfer) run about 500 bucks a month. I'm aiming for a consistently free solution (though I'll look a little more into whether that's feasible with this). – Michael Jun 28 '17 at 16:00
-
Unfortunately, the data also isn't in a database. It's in Azure Blob Storage, which Stitch doesn't seem to offer an integration for. – Michael Jun 28 '17 at 16:08
2 Answers
I'm not aware of anything out-of-the-box (on Google's infrastructure) that can accomplish this.
I'd probably set up a tiny VM (rough sketch below) to:
- Scan your Azure blob storage looking for new content.
- Copy new content into GCS (or local disk).
- Kick off a LOAD job periodically to add the new data to BigQuery.
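For reference, here's a minimal sketch of that polling loop, assuming Python with the `azure-storage-blob`, `google-cloud-storage`, and `google-cloud-bigquery` client libraries; the connection string, container, bucket, and table names are all placeholders, and the files are assumed to be newline-delimited JSON:

```python
# poll_and_load.py -- runs on the tiny VM; placeholder names throughout.
import time
from azure.storage.blob import ContainerClient
from google.cloud import bigquery, storage

AZURE_CONN_STR = "<azure-storage-connection-string>"  # placeholder
AZURE_CONTAINER = "incoming"                          # placeholder
GCS_BUCKET = "my-staging-bucket"                      # placeholder
BQ_TABLE = "my_dataset.my_table"                      # placeholder


def sync_once(seen: set) -> None:
    container = ContainerClient.from_connection_string(AZURE_CONN_STR, AZURE_CONTAINER)
    bucket = storage.Client().bucket(GCS_BUCKET)

    new_uris = []
    for blob in container.list_blobs():
        if blob.name in seen:
            continue
        # Steps 1-2: new content found, copy the Azure blob's bytes into GCS.
        data = container.download_blob(blob.name).readall()
        bucket.blob(blob.name).upload_from_string(data)
        new_uris.append(f"gs://{GCS_BUCKET}/{blob.name}")
        seen.add(blob.name)

    if new_uris:
        # Step 3: batch-load the newly staged files into BigQuery.
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,  # adjust to your format
        )
        bigquery.Client().load_table_from_uri(new_uris, BQ_TABLE, job_config=job_config).result()


if __name__ == "__main__":
    seen_names = set()   # in production you'd persist this (or compare against what's already in GCS)
    while True:
        sync_once(seen_names)
        time.sleep(300)  # poll every 5 minutes
```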
If you used GCS instead of Azure Blob Storage, you could eliminate the VM and just have a Cloud Function that is triggered on new items being added to your GCS bucket (assuming your blob is in a form that BigQuery knows how to read). I presume this is part of an existing solution that you'd prefer not to modify though.
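And a hedged sketch of that Cloud Function alternative: a background function (Python runtime assumed) triggered when an object is finalized in the bucket, which loads it straight into BigQuery. The table name and entry point are placeholders:

```python
# main.py -- background Cloud Function triggered by new objects in a GCS bucket.
# Assumes newline-delimited JSON files; adjust source_format otherwise.
from google.cloud import bigquery

BQ_TABLE = "my_dataset.my_table"  # placeholder

def load_new_object(event, context):
    """Loads the newly finalized GCS object into BigQuery."""
    uri = f"gs://{event['bucket']}/{event['name']}"
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    bigquery.Client().load_table_from_uri(uri, BQ_TABLE, job_config=job_config).result()
```

This would be deployed with something like `gcloud functions deploy load_new_object --runtime python39 --trigger-resource YOUR_BUCKET --trigger-event google.storage.object.finalize`, with the bucket name filled in.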

-
Thanks, Adam! I think that's what I'll wind up going with. Yes, this is part of an existing solution that constantly adds these files to Blob and there's no way I can change it to add directly to GCS. I'll get started on that right away :) – Michael Jun 29 '17 at 15:36
BigQuery supports querying data directly from these external data sources: Google Cloud Bigtable, Google Cloud Storage, and Google Drive. Azure Blob Storage is not included. As Adam Lydick mentioned, as a workaround you could copy data/files from Azure Blob Storage to Google Cloud Storage (or another BigQuery-supported external data source).
To copy data from Azure Blob Storage to Google Cloud Storage, you can run WebJobs (or Azure Functions). A blob-triggered WebJob fires a function when a blob is created or updated, and in that function you can read the blob content and write/upload it to Google Cloud Storage.
Note: you can install the Google.Cloud.Storage library to perform common operations in client code, and this blog explains how to use the Google.Cloud.Storage SDK in Azure Functions.
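That library is the .NET Google.Cloud.Storage SDK. Purely as an illustration (and to stay consistent with the sketches in the other answer), here is roughly what the equivalent looks like with the Azure Functions Python worker and the `google-cloud-storage` Python client; it assumes a `function.json` that binds `myblob` to the monitored container, that `GOOGLE_APPLICATION_CREDENTIALS` points at a service-account key, and that the bucket name is a placeholder:

```python
# __init__.py -- blob-triggered Azure Function (Python worker) that copies each
# new or updated Azure blob into a Google Cloud Storage bucket.
import azure.functions as func
from google.cloud import storage

GCS_BUCKET = "my-staging-bucket"  # placeholder

def main(myblob: func.InputStream):
    # myblob.name looks like "container/path/to/file"; keep the path as the object name.
    object_name = myblob.name.split("/", 1)[-1]
    storage.Client().bucket(GCS_BUCKET).blob(object_name).upload_from_string(myblob.read())
```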

-
Thanks, Fred! I'll take a look into this to see if this might be a better way to go than the VM! It all depends on cost and speed :) – Michael Jun 29 '17 at 15:37
-
Good idea! Note that BigQuery has a limit on total load operations per day, so if you have a very high rate of writes you may need to batch up your loads or use stream insert. – Adam Lydick Jun 29 '17 at 16:55
-
I'll only be uploading around 100 files a day at most, so that doesn't seem to exceed their limit. – Michael Jun 29 '17 at 18:12
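For completeness, a minimal sketch of the streaming-insert alternative Adam mentions, again assuming the Python `google-cloud-bigquery` client; the table name and row contents are placeholders that would need to match your schema:

```python
from google.cloud import bigquery

client = bigquery.Client()
rows = [{"id": 1, "payload": "example"}]  # placeholder rows matching your table schema
# Streaming inserts are not subject to the daily load-job quota.
errors = client.insert_rows_json("my_dataset.my_table", rows)
if errors:
    raise RuntimeError(f"Streaming insert failed: {errors}")
```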