2
with
  table1 as (
    select 'joe' as name, 17 as age, 25 as speed
  ),
  table2 as (
    select 'nick' as name, 21 as speed, 23 as strength
  )

select * from table1
union all
select * from table2

In Google BigQuery, this union all does not throw an error because both tables have the same number of columns (3 each). However I receive bad data output because the columns do not match. Rather than outputting a new table with 4 columns name, age, speed, strength with correct values + nulls for missing values (which would probably be preferred), the union all keeps the 3 columns from the top row.

Is there a good way to catch that the columns do not match, rather than the query silently returning bad data? Is there any way for this to return an error perhaps, as opposed to a successful table? I'm not sure how to check in SQL that the columns in the 2 tables match.

Edit: in this example it is clear to see that the columns do not match, however in our data we have 100+ columns and we want to avoid a situation where we make an error in a UNION ALL

Canovice
  • 9,012
  • 22
  • 93
  • 211
  • you always need to specify the column name in case of using union all and also column order should be same for all the datasets while using union all – Fahmi Aug 28 '20 at 17:30
  • Yes i know the column order should be the same when using `union all`. My question is whether or not there is a way to compute that the column names + orders are, in fact, the same. – Canovice Aug 28 '20 at 17:31
  • The point is: Don't use `*` as projection when you do a `UNION ALL` but explicitly list the columns. – sticky bit Aug 28 '20 at 19:45
  • I understand the best practice. I am simply asking if it's possible to catch / handle an error in the instance that the column names do not match – Canovice Aug 28 '20 at 21:38
  • 1
    are you looking for making this in "run-time" or "design-time"? in another word - is it ok, just to first compare the schema and if all kosher than in separate query to run your union all stuff? - oh, Ok - that is exactly what you just answered – Mikhail Berlyant Aug 28 '20 at 22:12
  • Initially we were looking for run-time, but then with `INFORMATION_SCHEMA` making this quite easy, we chose to save the tables and write a test that compares the two tables. – Canovice Aug 29 '20 at 21:51

2 Answers2

2

Below is for BigQuery Standard SQL and using scripting feature of BQ

DECLARE statement STRING;
SET statement = (
  WITH  table1_columns AS (
    SELECT column FROM (SELECT * FROM `project.dataset.table1` LIMIT 1) t,
    UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
  ), table2_columns AS (
    SELECT column FROM (SELECT * FROM `project.dataset.table2` LIMIT 1) t,
    UNNEST(REGEXP_EXTRACT_ALL(TRIM(TO_JSON_STRING(t), '{}'), r'"([^"]*)":')) column
  ), all_columns AS (
    SELECT column FROM table1_columns UNION DISTINCT SELECT column FROM table2_columns
  )
  SELECT (
      SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table1` UNION ALL '
      FROM all_columns a LEFT JOIN table1_columns t USING(column)
    ) || (
      SELECT 'SELECT ' || STRING_AGG(IF(t.column IS NULL, 'NULL as ', '') || a.column, ', ') || ' FROM `project.dataset.table2`'
      FROM all_columns a LEFT JOIN table2_columns t USING(column)
    ) 
);
EXECUTE IMMEDIATE statement;   

when applied to sample data from your question - output is

Row name    age     speed   strength     
1   joe     17      25      null     
2   nick    null    21      23   
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
1

After saving table1 and table2 as 2 tables in a dataset in BigQuery, I then used the metadata using INFORMATION_SCHEMA to check that the columns matched.

SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table1'

SELECT *
FROM models.INFORMATION_SCHEMA.COLUMNS
where table_name = 'table2'

INFORMATION_SCHEMA.COLUMNS returns information including the column names and their positioning. I can join these 2 tables together then to check that the names match...

Canovice
  • 9,012
  • 22
  • 93
  • 211