1

Is there a simple way to delete a STRUCT from the nested and repeated field in the BigQuery (BQ table column Type: RECORD, Mode: REPEATED).

Let's say I have the following tables:

  1. wishlist
name    toy.id  toy.priority
Alice   1       high
        2       medium
        3       low
Kazik   3       high
        1       medium
  1. toys
id  name    available
1   car     0
2   doll    1
3   bike    1

I'd like to DELETE from wishlist toys that are not available (toys.available==0). In this case, it's toy.id==1.

As a result, the wishlist would look like this:

name    toy.id  toy.priority
Alice   2       medium
        3       low
Kazik   3       high

I know how to select it:

WITH `project.dataset.wishlist` AS
(
  SELECT 'Alice' name, [STRUCT<id INT64, priority STRING>(1, 'high'), (2, 'medium'), (3, 'low')] toy UNION ALL
  SELECT 'Kazik' name, [STRUCT<id INT64, priority STRING>(3, 'high'), (1, 'medium')]
), toys AS (
  SELECT 1 id, 'car' name, 0 available UNION ALL
  SELECT 2 id, 'doll' name, 1 available UNION ALL
  SELECT 3 id, 'bike' name, 1 available
)
SELECT wl.name, ARRAY_AGG(STRUCT(unnested_toy.id, unnested_toy.priority)) as toy
FROM `project.dataset.wishlist` wl, UNNEST (toy) as unnested_toy
LEFT JOIN toys t ON unnested_toy.id=t.id
WHERE t.available != 0
GROUP BY name

But I don't know how to remove structs <toy.id, toy.priority> from wishlist when toys.available==0.

There are very similar questions like How to delete/update nested data in bigquery or How to Delete rows from Structure in bigquery but the answers are either unclear to me in terms of deletion or suggest copying the whole wishlist to the new table using the selection statement. My 'wishlist' is huge and 'toys.availabililty' changes often. Copying it seems to me very inefficient.

Could you please suggest a solution aligned with BQ best practices?

Thank you!

Lukiz
  • 175
  • 1
  • 9
  • 1
    BigQuery is not very efficient with frequently changing data. It is mostly designed for append-only data tables. Maybe a better fit would be Bigtable instead. – Sergey Geron Dec 24 '20 at 13:21
  • the query in the question does exactly what you ask for - what the problem here? please clarify!! If you meant physically delete those structs in source table - then it does not make much sense what if next time that removed from wishlist toy will become available?! anyway - please clarify what exactly you want to accomplish – Mikhail Berlyant Dec 24 '20 at 15:46
  • @MikhailBerlyant please do not hold onto the data that much. Knowing how to delete certain structs would help me a lot. It's because in the real scenario in the `wishlist` I have predictions based on different ML models. Structs contain references to prediction groups/descriptions (that are in the `toys` list) that a sample belongs to. But some models are refreshed in certain conditions and a as result some of those ID's assignments are invalid. So certain Id should be removed from structs beforehand. They are incorrect, slowing down queries, and has a non-zero carbon footprint :) Thanks – Lukiz Dec 26 '20 at 17:36
  • Thanks @SergeyGeron. I was afraid that it will be the conclusion since indeed It's one of the BQ design principles. However since row Deletion was implemented in BQ, I thought that STRUCT deletion inside a row is also possible. – Lukiz Dec 26 '20 at 17:53

2 Answers2

2

... since row Deletion was implemented in BQ, I thought that STRUCT deletion inside a row is also possible.

You can use UPDATE DML for this (not DELETE as it is used for deletion of whole row(s), while UPDATE can be used to modify the row)

update `project.dataset.wishlist` wl
set toy = ((
  select array_agg(struct(unnested_toy.id, unnested_toy.priority)) 
  from unnest(toy) as unnested_toy
  left join `project.dataset.toys` t on unnested_toy.id=t.id
  where t.available != 0
))
where true;   
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

You can UNNEST() and reaggregate:

SELECT wl.name, 
       (SELECT ARRAY_AGG(t)
        FROM UNNEST(wl.toy) t JOIN
             toys
             ON toys.id = t.id
        WHERE toys.available <> 0
       ) as available_toys
FROM `project.dataset.wishlist` wl;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks Gordon but this is not a deletion operation. In my scenario I'd like to delete some STRUCTs because at certain point they can get erroneous. If DELETION is not feasible the most logical solution to me is probably UPDATING those rows with a modified ARRAY of STRUCTS. What do yoh think? – Lukiz Dec 26 '20 at 18:24
  • @Lukiz . . . I don't understand your comment. This is not "deleting" anything. It is returning a new result set with only the available toys in the results. – Gordon Linoff Dec 26 '20 at 19:00
  • Yes, that's true. But please read the title and the question itself. – Lukiz Dec 26 '20 at 20:02