1

By using window functions in Big Query, I want to calculate the average of a column based on some conditions. I use a window function in stead of a join because my actual dataset is very large and a window function speeds up my query.

I manage to take the average within a window, however I want to be able to exclude the current row and the rows in which the ID is equal to the current row from this average.

create temp table base (Date date, Id string, Input int64, Output int64);

insert base
values('2020-12-05', '123', 4, 12),
      ('2020-12-06', '456', 3, 20),
      ('2020-12-07', '657', 4, 45),
      ('2020-12-08', '789', 4, 25),
      ('2020-12-07', '789', 4, 35)
;


select  *,
        avg(Output) over (ReferenceWindow) OutputAvg,
from    base
window  ReferenceWindow as    (partition by Input 
                              order by unix_date(Date)
                              range between 1 preceding and 1 following     
                              )
order by Id
;

What I get now:
OutputAvg
12
32.5
32.5
30
30

What I want:
OutputAvg
null
null
30
45
45

Yoni
  • 11
  • 1
  • 1
    Maybe you can try the array_agg() trick then use a JS UDF to code any logic that you have in mind. https://stackoverflow.com/questions/65775512/its-possible-to-create-a-rule-in-preceding-rows-in-sum/65788758#65788758 – Yun Zhang Jan 22 '21 at 01:15
  • Thank you! This seems to be the solution indeed. However I cannot get the UDF to work properly in such a way that an average is calculated, while excluding rows where the ID is equal to the ID of the current row. Could you please provide an example of how this could be done? – Yoni Jan 22 '21 at 13:40
  • You're always fetching 3 rows right? Could you always treat the 2nd element in the array as "current row"? – Yun Zhang Jan 22 '21 at 17:59
  • You can also put your half-cooked query in the question so people could help to improve it – Yun Zhang Jan 22 '21 at 18:46

0 Answers0