I am looking for guidance on how to manage incremental loads into BigQuery. Here is our process:
- We receive CSV files in GCS. As soon as a file arrives, we load it into the corresponding table in the staging area, along with an ingestion timestamp.
- From the staging area we join the tables together and push the result to a denormalized table in the Reporting layer (we use SQL for ETL; an illustrative version is sketched right after this list).
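
For context, the reporting load is essentially the following (table and column names here are illustrative, not our real schema):

```sql
-- Illustrative version of our staging-to-reporting SQL.
-- header/detail/ref are the staging tables; real names/columns differ.
INSERT INTO `myproject.reporting.denormalized`
SELECT
  h.file_date,
  h.order_id,
  d.line_item,
  d.amount,
  r.ref_description,
  h.ingestion_ts
FROM `myproject.staging.header` AS h
JOIN `myproject.staging.detail` AS d
  ON h.order_id = d.order_id AND h.file_date = d.file_date
LEFT JOIN `myproject.staging.ref` AS r
  ON d.ref_code = r.ref_code;
```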
My question is: how do I make sure my SQL picks up only the unprocessed data from the staging area?
One thought I have is to create an etl_control_table that tracks which header/detail/ref data has already been processed. However, I am looking for suggestions in case there is some cleverer way to accomplish the same thing (based on the ingestion timestamp, etc.).
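
Roughly what I have in mind for the control table (schema and names are just a sketch):

```sql
-- Sketch of the etl_control_table idea; schema and names are illustrative.
CREATE TABLE IF NOT EXISTS `myproject.etl.etl_control_table` (
  table_name   STRING,    -- 'header', 'detail', or 'ref'
  file_date    DATE,      -- staging partition that has been processed
  processed_at TIMESTAMP  -- when the reporting load consumed it
);

-- The reporting SQL would then restrict each staging table to
-- partitions not yet marked as processed, e.g.:
SELECT h.*
FROM `myproject.staging.header` AS h
WHERE h.file_date NOT IN (
  SELECT file_date
  FROM `myproject.etl.etl_control_table`
  WHERE table_name = 'header'
);
```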
Note: all my tables are partitioned on file_date.
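
The kind of "clever" alternative I am wondering about: since the reporting table carries the same file_date, could a simple high-water mark replace the control table? Something like (again, names are illustrative):

```sql
-- Possible watermark approach instead of a control table (illustrative).
-- Use the latest file_date already in reporting as the high-water mark;
-- because staging is partitioned on file_date, the filter below should
-- also let BigQuery prune staging partitions.
DECLARE last_loaded DATE DEFAULT (
  SELECT IFNULL(MAX(file_date), DATE '1900-01-01')
  FROM `myproject.reporting.denormalized`
);

INSERT INTO `myproject.reporting.denormalized`
SELECT
  h.file_date,
  h.order_id,
  d.line_item,
  d.amount,
  r.ref_description,
  h.ingestion_ts
FROM `myproject.staging.header` AS h
JOIN `myproject.staging.detail` AS d
  ON h.order_id = d.order_id AND h.file_date = d.file_date
LEFT JOIN `myproject.staging.ref` AS r
  ON d.ref_code = r.ref_code
WHERE h.file_date > last_loaded;
```

The obvious risk I see with a pure watermark is a late-arriving file for a file_date that has already been loaded, which is partly why I am considering the control table. Is there a standard pattern for this?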