-1

I need to write a script in Python using which I run a query on all the columns of the table uploaded on bigquery. I understand that I can select individual columns by their names and process them like : SELECT DISTINCT column_name FROM table_name and then write a UDF in javascript to do my processing.

But, my requirement is to obtain all the column names of the schema(suppose as a list) and then one by one pass the elements of the list in the select statement like suppose the list of column names is l = [col1,col2,col3...] I want to do something like:

for i in range(0,len(l):
    SELECT DISTINCT l[i] from table_name

so that I do not need to hard code the column names of the schema in my code. How can I do this bigquery using standard sql? or is it not possible and I need to pass my entire dataset into javascript UDF and then do all my processing there?

Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99

1 Answers1

0

There is no way currently to apply a user-defined function to each column in a table separately without enumerating all of them. One idea is to do row-based processing instead, though that still requires listing the column names and types as part of the function definition. For example:

#standardSQL
CREATE TEMP FUNCTION ProcessRow(t STRUCT<x FLOAT64, y STRING, z BOOL>)
RETURNS STRUCT<x FLOAT64, y STRING, z BOOL> LANGUAGE js AS """
  function ProcessColumn(x) {
    // (Some processing here)
    return x;
  }

  var new_t = new Object();
  for (var property in t) {
    if (t.hasOwnProperty(property)) {
      new_t[property] = ProcessColumn(t[property]);
    }
  }
  return new_t;
""";

WITH YourTable AS (
  SELECT 1 AS x, 'foo' AS y, true AS z
)
SELECT ProcessRow(t).*
FROM YourTable t;

This would be simpler if BigQuery supported templated functions, which you can star on the issue tracker to register your interest.

Elliott Brossard
  • 32,095
  • 2
  • 67
  • 99