
When using Google Data Prep, I am able to create automated schedules to run jobs that update my BigQuery tables.

However, this seems pointless when considering that the data used in Prep is updated by manually dragging and dropping CSVs (or JSON, xlsx, whatever) into the data storage bucket.

I have searched for a definitive way to update this bucket automatically with files that are regularly updated on my PC, but I can't find a best-practice solution.

How should one go about doing this efficiently and effectively?

  • So, you want to automate uploading files from your PC to GCS, is that right? – Graham Polley Jun 27 '18 at 10:31
  • Yeah, that's correct – Henry Sumner Jun 27 '18 at 10:40
  • There are several ways, you could have a script running on your local machine that pushes the contents of a particular folder into cloud storage at a set time each day. Where is the data coming from? It would probably be preferable to create a direct connection between the source and BQ, cutting out the .csv step? – Ben P Jun 27 '18 at 10:45
  • 1
    You don't have a lot of options apart from just using the `gsutil` tool and calling it on a cron. Are you sure these files should be getting uploaded from your local PC? Usually, data/files are generated by remote servers somewhere. – Graham Polley Jun 27 '18 at 10:46
  • Essentially, the files are dropped into a shared directory folder at about 2am every day by a batch process scheduled by another department in the company, which is why I can't directly connect the data to big query; I don't personally have access to it. At the moment, these files are automatically loaded into SAS libraries by our schedule, but I want to also load them into BigQuery as it's faster. – Henry Sumner Jun 27 '18 at 11:20
  • So, what about the possibility of having a script running on your local machine that pushes the contents of a particular folder into Cloud Storage, as @BenP suggested? – Temu Jul 10 '18 at 15:09
  • If that's possible, then I'd gladly take that on as a solution. But I wouldn't even know which language to use or how exactly to do that, which was the purpose of asking this question in the first place – Henry Sumner Jul 12 '18 at 07:48
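The `gsutil`-on-a-cron approach mentioned in the comments can be sketched as a single crontab entry. The folder path and bucket name below are placeholders; the idea is to mirror the shared folder into the bucket shortly after the 2am batch drop:

```shell
# Hypothetical paths - substitute your shared folder and bucket.
# At 03:00 every day, mirror the folder into Cloud Storage.
# -m runs the copy in parallel; rsync only transfers new/changed files.
0 3 * * * gsutil -m rsync -r /mnt/shared/exports gs://my-dataprep-bucket/imports
```

`gsutil rsync` is preferable to plain `cp` here because the batch process rewrites the same files daily, and rsync skips anything unchanged.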

1 Answer


So, in order to upload files from your computer to Google Cloud Storage, there are a few possibilities. If you run a daemon process that watches that shared directory for changes, you can code the automatic upload in any of these languages: C#, Go, Java, Node.js, PHP, Python or Ruby.

You have here some code examples for uploading objects, but be aware that there are also detailed Cloud Storage Client Libraries references, and you can also find the GitHub links under "Additional Resources".
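A minimal sketch of such a daemon in Python, using only the standard library and shelling out to the `gsutil` CLI for the actual upload. The directory, bucket path, and function names here are illustrative placeholders, not part of any official example; it polls for new or modified CSVs by mtime rather than using a filesystem-watch library:

```python
import subprocess
import time
from pathlib import Path

WATCH_DIR = Path("/mnt/shared/exports")     # hypothetical shared directory
BUCKET = "gs://my-dataprep-bucket/imports"  # hypothetical bucket path


def find_changed_files(directory, seen):
    """Return files that are new or modified since the last scan.

    `seen` maps Path -> last known mtime and is updated in place,
    so calling this repeatedly only reports each change once.
    """
    changed = []
    for path in Path(directory).glob("*.csv"):
        mtime = path.stat().st_mtime
        if seen.get(path) != mtime:
            seen[path] = mtime
            changed.append(path)
    return changed


def upload(path):
    """Copy one file into the bucket with the gsutil CLI."""
    subprocess.run(["gsutil", "cp", str(path), BUCKET], check=True)


def watch_and_upload(poll_seconds=300):
    """Simple polling daemon: scan the folder and upload anything new."""
    seen = {}
    while True:
        for path in find_changed_files(WATCH_DIR, seen):
            upload(path)
        time.sleep(poll_seconds)


# watch_and_upload()  # uncomment to run as a long-lived process
```

If a long-running process is undesirable, the same `find_changed_files` + `upload` pair can instead be invoked once from a scheduled task (cron or Windows Task Scheduler) after the 2am batch drop. The official `google-cloud-storage` client library could replace the `gsutil` subprocess call if you'd rather avoid the external tool.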

Temu