I have around 50 tables in a BigQuery project, created by an ETL tool from different platforms. The tables have no primary key, and a few of them have over 100 columns. Each platform has different fields and different data types, and the team's requirement is to combine all the tables into one big master table.
I am accomplishing this manually: I list all the columns of every table in Excel, null out the fields that are not present in a given table to build a SELECT for it, and UNION ALL the selects together into the master table. I then scheduled the query in BigQuery to refresh every day.
Ex:
Tbl A   Tbl B   Tbl C
char1   char1   char1
num1    num3    num1
char2   char2   char2
        char5   num2
        num2    num3
                char3
                char4
Manual Query:
WITH mast_tbl AS (
  SELECT CONCAT(CAST(ROW_NUMBER() OVER (PARTITION BY date) AS STRING), ' | ', 'TblA') AS pk_client,
    char1,
    num1,
    char2,
    NULL AS num3,
    CAST(NULL AS STRING) AS char5,
    NULL AS num2,
    CAST(NULL AS STRING) AS char3,
    CAST(NULL AS STRING) AS char4
  FROM `bigquery-project-XXXX.export_TblA.all_data_view`
  UNION ALL
  SELECT CONCAT(CAST(ROW_NUMBER() OVER (PARTITION BY date) AS STRING), ' | ', 'TblB') AS pk_client,
    char1,
    NULL AS num1,
    char2,
    num3,
    char5,
    num2,
    CAST(NULL AS STRING) AS char3,
    CAST(NULL AS STRING) AS char4
  FROM `bigquery-project-XXXX.export_TblB.all_data_view`
  UNION ALL
  SELECT CONCAT(CAST(ROW_NUMBER() OVER (PARTITION BY date) AS STRING), ' | ', 'TblC') AS pk_client,
    char1,
    num1,
    char2,
    num3,
    CAST(NULL AS STRING) AS char5,
    num2,
    char3,
    char4
  FROM `bigquery-project-XXXX.export_TblC.all_data_view`
)
SELECT * FROM mast_tbl
This works, but the manual effort is huge: whenever a table is added, or a column is added or removed, I have to edit every branch of the query to keep the UNION ALL aligned. Is there a way to automate this with a script, or a better way to approach the task?
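One direction I've considered is generating the UNION ALL query from the table schemas instead of maintaining it by hand. Below is a rough sketch of the idea: the `schemas` dict is a hypothetical stand-in for metadata that would really come from each dataset's `INFORMATION_SCHEMA.COLUMNS` or the `google-cloud-bigquery` client, and `build_master_query` is a made-up helper name, not an existing API.

```python
# Sketch: build the UNION ALL master query from per-table column metadata.
# The `schemas` dict is hypothetical sample data; in practice it would be
# filled from INFORMATION_SCHEMA.COLUMNS or the BigQuery client library.
schemas = {
    "TblA": {"char1": "STRING", "num1": "INT64", "char2": "STRING"},
    "TblB": {"char1": "STRING", "num3": "INT64", "char2": "STRING",
             "char5": "STRING", "num2": "INT64"},
    "TblC": {"char1": "STRING", "num1": "INT64", "char2": "STRING",
             "num2": "INT64", "num3": "INT64", "char3": "STRING",
             "char4": "STRING"},
}

def build_master_query(schemas, project="bigquery-project-XXXX"):
    # Union of all column names across tables, keeping first-seen order
    # and first-seen type (assumes a name maps to one type everywhere;
    # conflicting types across platforms would need reconciliation).
    all_cols = {}
    for cols in schemas.values():
        for name, typ in cols.items():
            all_cols.setdefault(name, typ)

    selects = []
    for tbl, cols in schemas.items():
        # Emit the column if the table has it, else a typed NULL placeholder.
        lines = [name if name in cols else f"CAST(NULL AS {typ}) AS {name}"
                 for name, typ in all_cols.items()]
        selects.append(
            "SELECT CONCAT(CAST(ROW_NUMBER() OVER (PARTITION BY date) AS STRING),"
            f" ' | ', '{tbl}') AS pk_client,\n  "
            + ",\n  ".join(lines)
            + f"\nFROM `{project}.export_{tbl}.all_data_view`"
        )
    return ("WITH mast_tbl AS (\n"
            + "\nUNION ALL\n".join(selects)
            + "\n)\nSELECT * FROM mast_tbl")

print(build_master_query(schemas))
```

The generated text could be pasted into the scheduled query, or the same idea could presumably be kept entirely inside BigQuery by driving `EXECUTE IMMEDIATE` from `INFORMATION_SCHEMA.COLUMNS` — I haven't verified that end to end, which is part of what I'm asking about.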