0

I have a big query result which shows the weight of each person over many weeks, and I want to find the names of the people whose weight always increased over the weeks. Below is the a sample data.

    name        week        weight
    tom         2020W10     76
    tom         2020W09     75
    tom         2020W08     74
    jane        2020W10     65
    jane        2020W09     65
    jane        2020W08     64

So in the case of the above sample data, I would only want to see Tom in my result since his weight always increased over the dates. In my original data set, I have 10,000 names and more than 10 weeks, so it is really hard do to do with a more brute force approach. I was thinking of using the the regression function in Big query, but then I would have to do as many regressions as I have unique names. I would really appreciate any help. Thanks!

1 Answers1

1

One method uses lag() and aggregation:

with t AS (
      SELECT 'tom' AS name, '2020W10' AS week, 76 AS weight UNION ALL
      select 'tom' , '2020W09' , 75 UNION ALL
      select 'tom' , '2020W08' , 74 UNION ALL
      select 'jane' , '2020W10' , 65 UNION ALL
      select 'jane' , '2020W09' , 65 UNION ALL
      select 'jane' , '2020W08' , 64
    )
select t.name
from (select t.*, lag(weight) over (partition by name order by week) as prev_weight
      from t
     ) t
group by t.name
having countif(prev_weight >= weight) = 0 ;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • hmm, this seems to select both tom and jane while I only want tom to be in the result. do you have any idea why? Also can this work if we dont always have the same week numbers for each name , and also we there are leaps in the weeks numbers such as moving from 2020W08 to 2020W10 ( skipping the 9th week) ? I have put the code in the following comment for you to be able to reproduce it . – Savybossman Apr 20 '20 at 13:10
  • with t AS ( SELECT 'tom' AS name, '2020W10' AS week, 76 AS weight UNION ALL select 'tom' , '2020W09' , 75 UNION ALL select 'tom' , '2020W08' , 74 UNION ALL select 'jane' , '2020W10' , 65 UNION ALL select 'jane' , '2020W09' , 65 UNION ALL select 'jane' , '2020W08' , 64 ) select t.name from (select t.*, lag(weight) over (partition by name order by week) as prev_weight from t ) t group by t.name having countif(prev_weight <= 0) = 0 – Savybossman Apr 20 '20 at 13:13