
We currently save events to BigQuery by uploading files to Google Cloud Storage and then loading those files into BigQuery.

We have a very active application running on roughly 300 nodes and saving around 1 billion events per day.

We now plan to change this to use the "new" streaming API.

My concern is that our current solution creates the table if it does not exist, which the streaming API does not do. (Our event tables are sharded on game + month to reduce the amount of data we have to query.)

What is the best way to solve this, i.e. to have 300+ nodes streaming data to BigQuery while new tables get created when needed?

Thanks in advance!

/Gunnar Eketrapp


1 Answer


Speaking from our experience: we created scripts that manage our tables at deploy time rather than in real time when the day changes. The script creates each sharded table one year in advance.

If the structure changes, we can issue a patch call for the older tables. The new tables are still empty, so we simply delete and recreate them.

When you have nodes running in parallel, it's hard to know which one should act as the primary node that creates the tables. So we use a deploy phase for this: the tables are created when we developers run the deploy.

You can anticipate game IDs and create the tables for them in advance. It's much easier to run a script that creates and updates tables in batches than to write a properly synchronized way of doing this from all available nodes. If you cannot anticipate the game ID, then you can call the synchronized API that will create the tables in advance, once the game ID is available.
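As a rough illustration of the "create each sharded table one year in advance" step, here is a minimal sketch of how a deploy script might enumerate the table names. The `events_<game>_<YYYYMM>` naming scheme, the game IDs, and the function names are assumptions made for this example, not what we actually use.

```python
from datetime import date


def month_suffixes(start, months):
    """Yield YYYYMM suffixes for `months` consecutive months from `start`."""
    year, month = start.year, start.month
    for _ in range(months):
        yield f"{year:04d}{month:02d}"
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1


def shard_table_ids(game_ids, start, months=12):
    """Return one table ID per (game, month) shard.

    The "events_<game>_<YYYYMM>" pattern is a hypothetical naming scheme.
    """
    return [
        f"events_{game}_{suffix}"
        for game in game_ids
        for suffix in month_suffixes(start, months)
    ]


# Hypothetical game IDs; a real deploy script would read them from config.
tables = shard_table_ids(["chess", "poker"], date(2015, 7, 1))
```

The deploy script would then loop over `tables` and create each one that does not exist yet, so the streaming nodes never have to create tables themselves.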

Pentium10
  • Do you insert dummy data when the table is created? There seems to be no way to create a table without also inserting data into it. – Gunnar Eketrapp Jun 24 '15 at 13:05
  • Using Tables::insert you can create an empty table: https://cloud.google.com/bigquery/docs/reference/v2/tables/insert – Pentium10 Jun 24 '15 at 13:12
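
To make the point about empty tables concrete, here is a sketch of the request that a `tables.insert` call would send. It only builds the URL and JSON body: the body carries a table reference and a schema but no rows, so the resulting table is empty. The project, dataset, table, and field names are placeholders, and authentication and actually sending the request are left out.

```python
import json


def empty_table_request(project_id, dataset_id, table_id, fields):
    """Build the URL and JSON body for a BigQuery v2 tables.insert call.

    Only the table reference and schema are sent, so no data is inserted.
    """
    url = (
        "https://www.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/datasets/{dataset_id}/tables"
    )
    body = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "schema": {"fields": fields},
    }
    return url, body


# Hypothetical schema and identifiers, for illustration only.
event_fields = [
    {"name": "event_time", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "player_id", "type": "STRING", "mode": "NULLABLE"},
]
url, body = empty_table_request(
    "my-project", "events", "events_chess_201507", event_fields
)
payload = json.dumps(body)  # would be POSTed to `url` with an OAuth token
```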