
I am running a FOR loop to gather a table expression from a dataset's INFORMATION_SCHEMA that contains the names of columns of ARRAY type in a table. I am able to generate this table expression as intended. From there, I am trying to loop through the columns and evaluate each column's contents. The columns all contain an array of strings, with "empty" items represented by a single whitespace character.

FOR field IN (
  SELECT
    field_path, data_type
  FROM
    `project.dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
  WHERE
    table_name = 'table_1'
    AND data_type LIKE 'ARRAY%' ) DO
  SELECT
    ARRAY_LENGTH(field.field_path) AS s3,
    IF(REGEXP_CONTAINS(ARRAY_TO_STRING(field.field_path, ','), r'.*?\,\s\,.*?'), 1, 0) AS b1,
    IF(REGEXP_CONTAINS(ARRAY_TO_STRING(field.field_path, ','), r'^\s\,.*?'), 1, 0) AS b2,
    IF(REGEXP_CONTAINS(ARRAY_TO_STRING(field.field_path, ','), r'.*?\,\s$'), 1, 0) AS b3
  FROM
    `project.dataset.table_1`;
END FOR;

When I run this query I receive the error:

Query error: No matching signature for function ARRAY_LENGTH for argument types: STRING. Supported signature: ARRAY_LENGTH(ARRAY)

If I isolate the query after the DO, the query runs without this error and processes as intended.

I have run a similar loop to produce null counts on columns and had success, so it seems to be an issue specific to evaluating arrays, but I am unsure what is causing it.

I have also tried running the same query while treating field.field_path as a STRING value, but I do not get the same results as when I run the DO query stand-alone. I believe it might be evaluating field.field_path as a string value and not a variable.
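In other words, my suspicion is that inside the loop the body is effectively rewritten with the variable's string value substituted in. A sketch of what I think is happening (using a hypothetical column name my_array_col):

```sql
-- field.field_path holds the STRING value 'my_array_col',
-- so the loop body is evaluated as if it were:
SELECT
  ARRAY_LENGTH('my_array_col') AS s3  -- fails: ARRAY_LENGTH(STRING)
FROM
  `project.dataset.table_1`;
```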

Paul T.
1 Answer


"I believe it might be evaluating the field.field_path as a string value and not a variable."

Correct: since field is a loop variable, field.field_path is a STRING holding the column's name, so it cannot be used as a column reference in a query. Instead, you can build the query with dynamic SQL (EXECUTE IMMEDIATE), like below.

For example,

FOR field IN (
  SELECT
    field_path, data_type
  FROM
    `project.dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
  WHERE
    table_name = 'table_1'
    AND data_type LIKE 'ARRAY%' 
) DO
EXECUTE IMMEDIATE FORMAT("""
  SELECT
    ARRAY_LENGTH(%s) AS s3
  FROM
    `project.dataset.table_1`
""", field.field_path);
END FOR;
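The same approach extends to the full query from the question. A sketch (column and table names taken from the question; BigQuery's FORMAT accepts the same argument repeated once per %s placeholder):

```sql
FOR field IN (
  SELECT field_path
  FROM `project.dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
  WHERE table_name = 'table_1'
    AND data_type LIKE 'ARRAY%'
) DO
  -- field.field_path is spliced into the query text, so ARRAY_LENGTH
  -- and ARRAY_TO_STRING now operate on the actual ARRAY column.
  EXECUTE IMMEDIATE FORMAT("""
    SELECT
      ARRAY_LENGTH(%s) AS s3,
      IF(REGEXP_CONTAINS(ARRAY_TO_STRING(%s, ','), r'.*?\,\s\,.*?'), 1, 0) AS b1,
      IF(REGEXP_CONTAINS(ARRAY_TO_STRING(%s, ','), r'^\s\,.*?'), 1, 0) AS b2,
      IF(REGEXP_CONTAINS(ARRAY_TO_STRING(%s, ','), r'.*?\,\s$'), 1, 0) AS b3
    FROM `project.dataset.table_1`
  """, field.field_path, field.field_path, field.field_path, field.field_path);
END FOR;
```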
Jaytiger
  • Can you explain why I was able to run this similar query with it evaluating the variable as a column name? – de_gcp_generic Mar 06 '23 at 15:45
  • BEGIN FOR field IN ( SELECT field_path FROM `project.dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS` WHERE table_name = 'table_name' AND data_type NOT LIKE 'STRUCT%' ) DO INSERT INTO dataset.profiling_table (field_path,count_not_null,count_null,percent_null,table_name) SELECT field.field_path, COUNTIF(field.field_path IS NOT NULL), COUNTIF(field.field_path IS NULL),(COUNTIF(field.field_path IS NULL)) / (COUNTIF(field.field_path IS NOT NULL) + COUNTIF(field.field_path IS NULL)), "table_name" FROM dataset.table_name; END FOR; END; – de_gcp_generic Mar 06 '23 at 15:45
  • *it evaluating the variable as a column name* --> it will show you the column name as a string, not the value in the column of the table. – Jaytiger Mar 06 '23 at 16:00
  • It appears you are confusing `select 'column_name' from table` with `select column_name from table`. – Jaytiger Mar 06 '23 at 16:05
  • I'm still a bit confused on the difference. In both cases the variable value acts as a value for an operation. In my example in the comments, when I am trying to evaluate field.field_path IS NOT NULL it evaluates for the value in the column, not the column name. When I do the same on my initial example it evaluates for the column name as a string. – de_gcp_generic Mar 06 '23 at 19:23
  • `field.field_path IS NOT NULL` --> if *field_path* comes from INFORMATION_SCHEMA, you're comparing a column name (a string literal) with `NULL`. Would you try `FOR field IN (...) DO SELECT field.field_path FROM your_table LIMIT 10; END FOR;` in your script and show the result? I think it will return the same column name 10 times if `your_table` has more than 10 rows. – Jaytiger Mar 06 '23 at 22:57