2

I have a SQL table (actually a BigQuery table) that has a huge number of columns (over a thousand). I want to quickly find the min and max value of each column. Is there a way to do that?

It is impossible for me to list all the columns. Looking for ways to do something like

SELECT MAX(*) FROM mytable;

and then running

SELECT MIN(*) FROM mytable;

I have been unable to Google a way of doing that. Not sure that's even possible.

For example, if my table has the following schema:

col1  col2  col3  .... col1000

the (say, max) query should return

Row  col1  col2 col3  ... col1000
1    3     18   0.6   ... 45

and the min query should return (say)

Row  col1  col2 col3  ... col1000
1    -5     4   0.1   ... -5

The numbers are just for illustration. The column names could be different strings and not easily scriptable.

Nahuel Varela
  • 1,022
  • 7
  • 17
Nik
  • 5,515
  • 14
  • 49
  • 75
  • are they all numeric? or string? or mixed? – Mikhail Berlyant Jan 28 '19 at 23:48
  • Does it need to be dynamic or would static sql work? I've never used bigquery, but in something like SQL Server of DB2 I'd look at a query against SYSCOLUMNS for the tablename where I concatenate strings to the column names like `, "MIN(" || NAME || "), MAX(" || NAME || ") "` or something like that and then just copy/paste the output into the query. Does BigQuery give you access to that kind of table definition data? – Ben Jan 28 '19 at 23:54
  • The columns are all numeric @MikhailBerlyant – Nik Jan 28 '19 at 23:58
  • got it - see the answer – Mikhail Berlyant Jan 28 '19 at 23:59
  • The columns names are dynamic in nature, although they don't change very often. Once every week I would say. I could generate the SQL using a script and run it but it might run into trouble if the column names change. A dynamic way of handling this would have been perfect. @Ben – Nik Jan 29 '19 at 00:00
  • 1
    Are you able to us the API? If so you could write a simple script that will output all the columns for your table. You could then iterate through each column, query for the MAX() and MIN() save your output and load it to BQ. The advantage here woud be that this solution would work for any count of columns. – Teddy Jan 29 '19 at 00:44

1 Answers1

4

See below example for BigQuery Standard SQL - it works for any number of columns and does not require explicit calling/use of columns names

#standardSQL
WITH `project.dataset.mytable` AS (
  SELECT 1 AS col1, 2 AS col2, 3 AS col3, 4 AS col4 UNION ALL
  SELECT 7,6,5,4 UNION ALL
  SELECT -1, 11, 5, 8
)
SELECT 
  MIN(CAST(value AS INT64)) AS min_value, 
  MAX(CAST(value AS INT64)) AS max_value
FROM `project.dataset.mytable` t, 
UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value  

with result

Row min_value   max_value    
1   -1          11    

Note: if your columns are of STRING data type - you should remove CAST ... AS INT64
Or if they are of FLOAT64 - replace INT64 with FLOAT64 in the CAST function

Update

Below is option to get MIN/Max for each column and present result as array of respective values as list of respective values in the order of the columns

#standardSQL
WITH `project.dataset.mytable` AS (
  SELECT 1 AS col1, 2 AS col2, 3 AS col3, 14 AS col4 UNION ALL
  SELECT 7,6,5,4 UNION ALL
  SELECT -1, 11, 5, 8
), temp AS (
  SELECT pos, MIN(CAST(value AS INT64)) min_value, MAX(CAST(value AS INT64)) max_value
  FROM `project.dataset.mytable` t, 
  UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'":(.*?)(?:,"|})')) value WITH OFFSET pos
  GROUP BY  pos
)
SELECT 'min_values' stats, TO_JSON_STRING(ARRAY_AGG(min_value ORDER BY pos)) vals FROM temp UNION ALL
SELECT 'max_values', TO_JSON_STRING(ARRAY_AGG(max_value ORDER BY pos))  FROM temp 

with result as

Row stats       vals     
1   min_values  [-1,2,3,4]   
2   max_values  [7,11,5,14]    

Hope this is something you can still apply to whatever your final goal

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
  • But this doesn't give min max of each column. Sorry if the question wasn't clear. I need to get min and max value of each column. So, the query should return result like `Row col1 col2 col3 ...` and under each column it should display the min value (assuming we ran query to get only the min value). – Nik Jan 29 '19 at 00:06
  • based om your `SELECT MAX(*) FROM mytable;` I thought you are looking for MAX and MIN across all column. But looks like you are looking for max, min for each and every column. I see now - i will look further and will update my answer for this – Mikhail Berlyant Jan 29 '19 at 00:08
  • Sorry if that was misleading. I have posted examples to illustrate what I am looking for. Please let me know if that doesn't even make sense or if there isn't a dynamic way of doing that. – Nik Jan 29 '19 at 00:10
  • your goal is clear to me now - i will look into it more and get back to this - hope someone else from SO community also will contribute with their answers :o – Mikhail Berlyant Jan 29 '19 at 00:24
  • 1
    @Teddy - hi nice to see you :o) You meant to make this comment for OP, not for me - right? – Mikhail Berlyant Jan 29 '19 at 00:41
  • 1
    @MikhailBerlyant nice to see you too. :) Correct, my bad. Just posted it to the OP. – Teddy Jan 29 '19 at 00:45
  • 1
    @Teddy - :o) client scripting is too obvious here and hopefully someone will post the respective answer - I am more interested in finding pure bq script solution - as you can see from most of my answers / posts – Mikhail Berlyant Jan 29 '19 at 00:47
  • Pure bq script would be an interesting solution. :) – Teddy Jan 29 '19 at 01:04
  • @Nik - glad it worked for you. consider also voting up – Mikhail Berlyant Jan 29 '19 at 01:23