How to Group By all columns of table A when joining two tables to create one big nested table on BigQuery?

Question

I'm trying to create a big nested table that would be composed of many tables, such as my Clients table, Phone Numbers, Emails... They all have in common the client_id field. For the moment I have to following query that works well ("join" the Clients table fields and the according Phone Numbers fields):

SELECT Clients.*, ARRAY_AGG( STRUCT(Timestamp, Country_Code, Local_Number, Phone_Number, Whatsapp)) as Phones
FROM Clients LEFT JOIN Phones USING(client_id)
GROUP BY Client.client_id, Clients.Timestamp, Clients.First_Name, Clients.Last_Name, Clients.DOB

Client.client_id, Clients.Timestamp, Clients.First_Name, Clients.Last_Name, Clients.DOB are all my fields in Clients table. I would like to use this query as subquery to "join" it to the Emails table in a similar way (using with and renaming the result of the subquery). The Thing is that I would like to GROUP BY all the fields of Clients table without writing them all every time. Neither GROUP BY Clients.* nor GROUP BY ALL work...

What can I do to shorten this ?

Gordon Linoff · Accepted Answer · 2021-01-21T14:38:31.953

0

If client_id is unique, then you can just aggregate by that. What you want is to get all the columns when you do that. A very BigQuery'ish approach is:

SELECT ANY_VALUE(c).*, 
       ARRAY_AGG( STRUCT(p.Timestamp, p.Country_Code, p.Local_Number, p.Phone_Number, p.Whatsapp)) as Phones
FROM Clients c LEFT JOIN
     Phones p
     USING (client_id)
GROUP BY c.client_id;

This works fine when I run it:

WITH clients as (
      select 'x' as name, 1 as client_id union all 
      select 'y' as name, 2 as client_id
    ),
    phones as (
        select current_timestamp as timestamp, 1 as client_id, 'abc' as country_code, 111 as local_number, 222 as phone_number, null as whatsapp
    )
SELECT ANY_VALUE(c).*, 
       ARRAY_AGG( STRUCT(p.Timestamp, p.Country_Code, p.Local_Number, p.Phone_Number, p.Whatsapp)) as Phones
FROM Clients c LEFT JOIN
     Phones p
     USING (client_id)
GROUP BY c.client_id;

edited Jan 21 '21 at 14:38

answered Jan 21 '21 at 12:35

Gordon Linoff

1,242,037
58
646
786

The problem is that it doesn't let me group by only by the client id. In theory it is unique but I don't see an option to define it as unique in BigQuery. – Harvey Jan 21 '21 at 13:03
@Harvey . . . Huh? I've never encountered an issue in BigQuery saying that it cannot aggregate by certain columns, whether they are unique or not . . . hmmm, can you try `group by c.client_id`? – Gordon Linoff Jan 21 '21 at 13:48
Of course I tried and in theory this would be perfect. It just doesn't let me do so... It then says _Star expansion expression references column Timestamp which is neither grouped nor aggregated at [19:8]_. And when I add this to the GROUP BY then it's the next field turn and so on... until I add all the fields to the GROUP BY – Harvey Jan 21 '21 at 14:15
@Harvey . . . It does require `c.client_id`, but the code works fine when I run it. I added example code that works. – Gordon Linoff Jan 21 '21 at 14:38
For some reasons now it works... Thank you !! – Harvey Jan 24 '21 at 07:45
The ANY_VALUE aggregation function is important. – Steve Lancashire Aug 16 '23 at 05:16

score 0 · Answer 2 · answered Jan 21 '21 at 14:51

0

Try below

SELECT ANY_VALUE(c).*, 
  ARRAY_AGG((SELECT AS STRUCT p.* EXCEPT(client_id))) as Phones
FROM Clients c 
LEFT JOIN Phones p
USING (client_id)
GROUP BY c.client_id

answered Jan 21 '21 at 14:51

Mikhail Berlyant

165,386
8
154
230

How to Group By all columns of table A when joining two tables to create one big nested table on BigQuery?

2 Answers2