We currently have a proposed architecture for dealing with the append-only nature of BigQuery, to support a dashboard and reporting function in our system. We have a number of customer tables (tenants) - let's say one table per customer. Each table contains a BigQuery 'mirror' of a primary NoSQL data store - a list of records. We want to use BigQuery as the tool that drives the reporting and dashboards of these records for customers.
Customers may make updates to these records, and those updates need to be reflected in BigQuery. We are looking at maintaining a secondary "Reconcile" table for each primary table that stores the updates (or deletions) from the last, say, 10 minutes. Then, every 10 minutes, we would select all the data from the "Reconcile" table plus all the data from the primary table minus the rows that appear in the Reconcile table, to create a new, up-to-date version of the table:
(Old Table - 'Reconcile' Rows) + Reconcile => New Table
Drop Old Table, Truncate Reconcile Table
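For concreteness, a minimal sketch of what that 10-minute rematerialization cycle might look like using the google-cloud-bigquery Python client. The table names, the `id` key column, and the `is_deleted` flag on the Reconcile table are assumptions for illustration, not our actual schema:

```python
# Sketch of the 10-minute rematerialization cycle for one tenant table.
# Assumes: Reconcile has the same columns as the primary table plus an
# `is_deleted` flag, and `id` is never NULL (NOT IN breaks on NULLs).
from google.cloud import bigquery

client = bigquery.Client()

PRIMARY = "my_project.tenant_123.records"              # hypothetical primary mirror table
RECONCILE = "my_project.tenant_123.records_reconcile"  # hypothetical reconcile table

# (Old Table - 'Reconcile' rows) + Reconcile rows => replacement table.
# The SELECT is evaluated before the destination is replaced, so the
# statement can read from the table it overwrites.
rebuild_sql = f"""
CREATE OR REPLACE TABLE `{PRIMARY}` AS
SELECT * FROM `{PRIMARY}`
WHERE id NOT IN (SELECT id FROM `{RECONCILE}`)
UNION ALL
SELECT * EXCEPT (is_deleted) FROM `{RECONCILE}`
WHERE NOT is_deleted
"""
client.query(rebuild_sql).result()

# Empty the reconcile table for the next 10-minute window.
client.query(f"TRUNCATE TABLE `{RECONCILE}`").result()
```

Note that the rebuild step scans the full primary table on every run, which is exactly the cost we are worried about below.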
Our challenge here is that some customer tables can be 10, 20, or 30 GB in size. The rematerialization process, run every 10 minutes, could quickly become expensive.
Is there a more cost-efficient or technically efficient way to copy tables in BigQuery without incurring this large operation (and its associated cost)? We don't really want to use DML, and we don't want to have to scope every query to search for the 'latest' row for each record (although this answer to "Does Google BigQuery/ Amazon Redshift use column-based relational database or NoSQL database?" does a good job of explaining that approach).
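For reference, the 'latest row per record' scoping we would like to avoid baking into every query would look roughly like the view below. This assumes (hypothetically) that every update is appended straight to the mirror table with an `updated_at` timestamp and that deletions are appended as tombstone rows carrying an `is_deleted` flag:

```python
# Sketch of the 'latest row per record' view we are trying to avoid having to
# scope every dashboard query through (column and table names are assumptions).
from google.cloud import bigquery

client = bigquery.Client()

latest_view_sql = """
CREATE OR REPLACE VIEW `my_project.tenant_123.records_latest` AS
SELECT * EXCEPT (rn)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM `my_project.tenant_123.records`
)
WHERE rn = 1
  AND NOT is_deleted  -- deletions kept as tombstone rows in this sketch
"""
client.query(latest_view_sql).result()
```

Every query against that view still scans all historical versions of each record, which is why we were leaning towards the periodic rematerialization instead.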