
I'm new to BigQuery and programming in general (though fluent with SQL), as I come from a data analytics background. I'm using BigQuery to analyze the events data of my Android app. I have the data flowing into BQ tables nicely and am able to write queries in the BQ Web UI, saving their results in new tables to analyze further in Tableau. The issue is that I have to run ~10 queries every single day, as my events tables get updated every day. What is the best process to automate (and schedule) running the saved queries (saved in the BQ Web UI) and periodically updating the tables with their results?

I have explored the bq command-line tool, jobs, and the BigQuery APIs, but I'm lost in the plethora of information (I blame my lack of programming experience here). Could someone please help?

1 Answer


This might be considered too broad a question, but I'll try to give my perspective.

At our company, we automated everything using the Python API client library.
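To give a concrete idea of what that looks like, here is a minimal sketch using the google-cloud-bigquery Python library: it runs a query and writes the result into a destination table, which covers your "save query results to a table" use case. The project, dataset, and table names are made up for illustration.

    from google.cloud import bigquery

    # Hypothetical project id and table names, just for illustration.
    client = bigquery.Client(project="my-project")

    job_config = bigquery.QueryJobConfig(
        destination="my-project.analytics.daily_events_summary",
        write_disposition="WRITE_TRUNCATE",  # overwrite the result table on each run
    )

    query = """
        SELECT event_name, COUNT(*) AS event_count
        FROM `my-project.analytics.events`
        GROUP BY event_name
    """

    job = client.query(query, job_config=job_config)  # starts the query job
    job.result()  # blocks until the job finishes

You would have one such query (or a list of them) per script run; each job replaces its destination table, so Tableau always sees the latest results.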

This has some advantages for us: not only do we need to run dozens of queries every day, but they also need to be adapted for each store that we have. We do so using Jinja templates and some config files, roughly as in the sketch below. We found that Python served us quite well for these tasks (doing the same with the command-line tool would be a bit more challenging, I suppose).
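Here is a rough sketch of that per-store templating with Jinja2. The store ids and the per-store table naming scheme are assumptions for the example, not our actual setup:

    from jinja2 import Template

    # One query template, rendered once per store. The table naming
    # scheme and store ids here are invented for illustration.
    query_template = Template("""
        SELECT event_name, COUNT(*) AS event_count
        FROM `my-project.analytics.events_{{ store_id }}`
        GROUP BY event_name
    """)

    for store_id in ["store_a", "store_b"]:  # in practice, read from a config file
        sql = query_template.render(store_id=store_id)
        # submit sql with client.query(...) as in the previous snippet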

For the scheduling, we just set up cron jobs on some EC2 instances in our stack, and they run the whole operation at the scheduled time (we plan to migrate the whole stack to GCP, but that will take a while).
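The scheduling part can be as simple as one crontab entry on the instance; the time, paths, and log file below are placeholders:

    # crontab -e: run the query script every day at 03:00 (placeholder paths)
    0 3 * * * /usr/bin/python3 /home/ubuntu/run_daily_queries.py >> /home/ubuntu/bq_queries.log 2>&1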

In the end, this is just one way of solving the task; there are many others. As far as simplicity and maintenance go, though, I think you will find this approach quite appropriate as well.

Willian Fuks
    Thanks for the pointers Will! I'll try the solutions you have listed above and update here based on whether they worked for me or not. – Ankit Mittal Feb 06 '17 at 09:32