I'm running out of memory with my query on Google BigQuery. I have to calculate multiple window functions like running sums over multiple different time frames. My data mainly consists of an id (string), a value (number), a type ('in' or 'out', could be converted to bool is needed) and a timestamp.
I read that there is no way to increase memory per slot, so the only way to be able to execute the query is to cut it into smaller pieces that can be sent to different slots. A way to do this is to use GROUP BY
or OVER (PARTITION BY ...)
but I have no idea how I could rewrite my query to make use of it.
I have some calculations that need to use PARTITION BY
but for others, I want to calculate the total overall, for example:
Imagine a have a large table (> 1 billion rows) where I want to calculate a rolling sum over all values for different time frames, independent of id.
WITH data AS (
SELECT *
FROM UNNEST([
STRUCT
('A' as id,1 as value, 'out' as type, 1 as time),
('A', -1, 'in', 2),
('B', 2, 'out', 2),
('C', 1, 'out', 3),
('B', -1, 'in', 4),
('A', 2, 'out', 4),
('C', 5, 'out', 5),
('B', 3, 'out', 6),
('A', 1, 'out', 6),
('A', -4, 'in', 6),
('C', -3, 'in', 7)
])
)
SELECT
id
, value
, type
, time
, SUM(value) OVER (ORDER BY time RANGE UNBOUNDED PRECEDING) as total
, SUM(value) OVER (ORDER BY time RANGE BETWEEN 1 PRECEDING AND CURRENT ROW) as total_last_day
, SUM(value) OVER (ORDER BY time RANGE BETWEEN 3 PRECEDING AND 2 PRECEDING) as total_prev_day
FROM data
How could I split this query to make use of PARTITION BY
or GROUP BY
in order to fit within the memory limits?