> Now I want to backfill the data in the existing 2M rows, but only for those three newly added columns, with the actual data from the JSON file.
Since loading data is free of charge, I'd simply reload the whole table with the WRITE_TRUNCATE option to overwrite the existing data.
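
A minimal sketch of such a reload with the Python client library, assuming the file is newline-delimited JSON; the project, dataset, table, and file names are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Reload the whole table: WRITE_TRUNCATE replaces all existing rows,
# so the three new columns are populated straight from the JSON file.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    autodetect=True,  # or pass the explicit schema including the new columns
)

with open("data.json", "rb") as f:  # placeholder file name
    load_job = client.load_table_from_file(
        f, "my_project.my_dataset.my_table", job_config=job_config
    )
load_job.result()  # wait for the load job to finish
```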
What you said confuses me because:
- If the 2M rows in your BQ table have the same data as what's in the JSON file, why do you care whether they are touched or not?
- If the 2M rows in your BQ table have been altered in some way, how do you expect the rows in the JSON file to match the altered data on a per-row basis (to backfill the missing columns)?
--
Update: based on the comment, it seems that the loaded rows have been altered in some way. Then:
- For your existing data, if there is no (logical) primary key you can use to match the rows, then it is technically impossible to "match and update".
- If your existing data does have a logical primary key, and you don't mind the cost, you could load the full file into a temporary table and then use DML to backfill the missing columns, as sketched below.
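
A rough sketch of that temporary-table plus DML approach, assuming a hypothetical primary key column `id` and hypothetical new columns `col_a`, `col_b`, `col_c` (the UPDATE step is the part that incurs query cost):

```python
from google.cloud import bigquery

client = bigquery.Client()

# 1. Load the full JSON file into a staging table (all names are placeholders).
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    autodetect=True,
)
with open("data.json", "rb") as f:
    client.load_table_from_file(
        f, "my_project.my_dataset.staging", job_config=job_config
    ).result()

# 2. Backfill only the three new columns, matching rows on the logical primary key.
backfill_sql = """
UPDATE `my_project.my_dataset.my_table` AS t
SET t.col_a = s.col_a,
    t.col_b = s.col_b,
    t.col_c = s.col_c
FROM `my_project.my_dataset.staging` AS s
WHERE t.id = s.id
"""
client.query(backfill_sql).result()
```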
For your future data loading, if you want the loading to be incremental (either in rows or in columns), it is better to leave your loaded table untouched so that it represents the 'full facts', and to keep the 'altered rows' in a separate table, assuming you have a logical primary key to match them. A sketch of that layout follows.
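
One way to make that layout queryable, assuming hypothetical tables `raw_table` (untouched loads) and `altered_rows` (manual edits) that share a logical primary key `id`, is a view that prefers the altered row when one exists:

```python
from google.cloud import bigquery

client = bigquery.Client()

# All project, dataset, table, and column names below are placeholders.
view_sql = """
CREATE OR REPLACE VIEW `my_project.my_dataset.current_rows` AS
SELECT
  r.id,
  COALESCE(a.col_a, r.col_a) AS col_a,
  COALESCE(a.col_b, r.col_b) AS col_b
FROM `my_project.my_dataset.raw_table` AS r
LEFT JOIN `my_project.my_dataset.altered_rows` AS a
  ON a.id = r.id
"""
client.query(view_sql).result()
```

This keeps `raw_table` reloadable at any time without losing the edits stored in `altered_rows`.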