
I am looking for guidance on how to manage incremental loads into BigQuery. Here is our process:

  • We receive CSV files in GCS. As soon as a file arrives, we load it into the corresponding table in the staging area, along with an ingestion timestamp.
  • From the staging area we join the tables together and push the result to a denormalized table in the reporting layer (our ETL is written in SQL).

My question is how do I make sure my SQL picks up only unprocessed data from the staging area?

One thought I have is to create an etl_control_table that tracks which header/detail/reference data has been processed. However, I'd welcome suggestions if there is a cleverer way to accomplish the same thing (e.g. based on the ingestion timestamp).
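The control-table idea can be sketched as a watermark per staging table: each run reads the last processed ingestion timestamp, loads only newer rows, and then advances the watermark. This is a minimal BigQuery SQL sketch; the dataset, table, and column names (etl.etl_control_table, staging.header, staging.detail, staging.ref, ingestion_ts, order_id, etc.) are all hypothetical placeholders, not your actual schema.

```sql
-- One-time setup (run separately): one watermark row per staging table
CREATE TABLE IF NOT EXISTS etl.etl_control_table (
  table_name        STRING,
  last_processed_ts TIMESTAMP
);

-- Per-run script (DECLARE must come first in a BigQuery script):
-- 1. Read the current watermark for the staging table being processed
DECLARE watermark TIMESTAMP DEFAULT (
  SELECT last_processed_ts
  FROM etl.etl_control_table
  WHERE table_name = 'staging.header'
);

-- 2. Load only rows ingested after the watermark (column names are illustrative)
INSERT INTO reporting.denormalized
SELECT h.order_id, h.file_date, d.line_amount, r.region_name
FROM staging.header AS h
JOIN staging.detail AS d USING (order_id)
JOIN staging.ref    AS r ON r.region_id = h.region_id
WHERE h.ingestion_ts > watermark;

-- 3. Advance the watermark only after the load succeeds
UPDATE etl.etl_control_table
SET last_processed_ts = (SELECT MAX(ingestion_ts) FROM staging.header)
WHERE table_name = 'staging.header';
```

One design note: advancing the watermark to MAX(ingestion_ts) from the staging table (rather than, say, CURRENT_TIMESTAMP) avoids skipping rows that land between the load and the update, but steps 2 and 3 should still run in the same orchestrated job so a failed load never moves the watermark.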

Note: All my tables are partitioned on file_date
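Since everything is partitioned on file_date, one possible alternative that avoids a separate control table is to derive the watermark from the reporting table itself. Again a sketch with hypothetical names; note that BigQuery may not prune partitions when the filter is a correlated subquery, so computing the watermark into a script variable first is the safer pattern:

```sql
-- Watermark = newest file_date already present in the reporting layer
DECLARE max_loaded DATE DEFAULT (
  SELECT MAX(file_date) FROM reporting.denormalized
);

-- Load only newer partitions; filtering on the partition column keeps the scan cheap
INSERT INTO reporting.denormalized
SELECT h.order_id, h.file_date, d.line_amount
FROM staging.header AS h
JOIN staging.detail AS d USING (order_id, file_date)
WHERE h.file_date > max_loaded;
```

The caveat with this approach is late-arriving files: if a file for an already-loaded file_date shows up afterwards, the strict `>` filter will skip it, which is exactly the case an ingestion-timestamp watermark or control table handles better.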

Thanks a lot for your replies.

Dinesh
