This is probably incorrect use case for BigQuery but I have following problem: I need to periodically update Big Query table. Update should be "atomic" in a sense that clients which read data should either use only old version of data or completely new version of data. The only solution I have now is to use date partitions. The problem with this solution is that clients which just need to read up to date data should know about partitions and get data only from certain partitions. Every time I want to make a query I would have first to figure out which partition to use and only then select from the table. Is there any way to improve this? Ideally I would like solution to be easy and transparent for clients who read data.
Asked
Active
Viewed 389 times
1
-
may be trying materialized view? – Sharmanand Mishra May 08 '20 at 13:53
1 Answers
1
You didn't mention the size of your update, I can only give some general guideline.
- Most BigQuery updates, including single DML (INSERT/UPDATE/DELETE/MERGE) and single load job, are atomic. Your reader reads either old data or new data.
- Lacking multi-statement transaction right now, if you do have updates which doesn't fit into single load job, the solution is:
- Load update into a staging table, after all loads finished
- Use single INSERT or MERGE to merge updates from staging table to primary data table
- The drawback: scanning staging table is not for free
Update: since you have multiple tables to update atomically, there is a tiny trick which may be helpful.
Assuming for each table that you need an update, there is a ActivePartition
column as partition key, you may have a table with only one row.
CREATE TABLE ActivePartition (active DATE);
Each time after loading, you set ActivePartition.active to a new active date, then your user use a script:
DECLARE active DATE DEFAULT (SELECT active FROM ActivePartition);
-- Actual query
SELECT ... FROM dataTable WHERE ActivePartition = active

Yun Zhang
- 5,185
- 2
- 10
- 29
-
Update is basically replacing entire data. Idea with staging table is nice but unfortunately I have multiple connected tables which need to be updated together. For now I decided to stick with partitioned tables but keep number of partitions to two at most. Kind of staging partition plus production partition. – Alex May 10 '20 at 12:05
-