1

I have been trying to make a UDF in BigQuery to compress multiple rows into a single row

CREATE OR REPLACE FUNCTION function_name(to_compress_column INT64, order_by_column INT64) AS (
    TO_JSON_STRING(
           ARRAY_AGG(
                IFNULL(to_compress_column,-1) RESPECT NULLS
                ORDER BY order_by_column
                    )
                  )
    );

But this causes the error "ARRAY_AGG not allowed in SQL function body". I tried using SELECT and UNNEST but since my input is not an ARRAY, it did not work. How should I proceed?

Edit:

My input tables have this format:

visits place_id date hour
23 abc123 2022-01-01 4
20 abc123 2022-01-01 2
19 abc123 2022-01-01 3
24 abc123 2022-01-01 1
26 abc123 2022-01-01 5
18 abc456 2022-01-01 2
20 abc456 2022-01-01 3
17 abc456 2022-01-01 1

I want my output tables to have this format:

visits place_id date
[24,20,19,23,26] abc123 2022-01-01
[17,18,20] abc456 2022-01-01

I understand that I can do this by

SELECT TO_JSON_STRING(ARRAY_AGG(IFNULL(visits,-1) RESPECT NULLS
  ORDER BY hour)) visits, 
  place_id, 
  date 
  from input_table
  group by place_id, date

But in order to avoid repetition of the first line of the select query as there a lot of tables I have to do this in, I want to create a UDF so that the query changes to:

SELECT function_name(visits,hour) visits, 
      place_id, 
      date 
      from input_table
      group by place_id, date
  • 1
    [What is the XY problem?](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378) - I think yo are in that trap – Mikhail Berlyant Sep 02 '22 at 16:59
  • @MikhailBerlyant I see, I have edited my question to add perspective on what I am trying to achieve through the function approach – Kartikey Mullick Sep 02 '22 at 17:40
  • 1
    what you want is actually called aggregate UDF and BigQuery does not support such (at least yet)! so udf direction looks like dead end here . will think more on this ... – Mikhail Berlyant Sep 02 '22 at 21:15

1 Answers1

2

what you want is actually called aggregate UDF and BigQuery does not support such (at least yet)! so aggregate udf direction looks like dead end here

What I see as an option is described in below example

create temp function func_name (arr any type) as (( 
  select to_json_string(array_agg(ifnull(visits,-1) respect nulls order by hour))
  from unnest(arr) 
));
select func_name(array_agg(struct(visits, hour))) visits, 
  place_id, 
  date 
  from input_table
  group by place_id, date      

if applied to sample data in your question - output is

enter image description here

So, it is not 100% of what you wanted , but very close in my mind
So, instead of desired line

function_name(visits,hour) visits,    

you will have to use

function_name(array_agg(struct(visits, hour))) visits,

I feel like this is reasonable trade-off :o)

P.S. And, obviously, instead of temp function you will create permanent function ...

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230