0

I am working to enhance a dataset by creating a column that would allow me to track how many active quarters a given company has had for a given row. A company is "active" if they recognize revenue within that quarter. Each row of my dataset represents one month's performance for a single company.

I have been able to use a WINDOW function to create a running sum for active months successfully:

COUNTIF(Revenue IS NOT NULL) OVER 
(partition by Company_Name ORDER BY month_end ASC ROWS BETWEEN unbounded preceding and current row) AS cumulative_active_months

I am now struggling to convert my logic to count the quarters rather than the months.

This is a rough idea of what my table currently looks like.

  Row   Month   Month_end    Fiscal_Quarter   Company_Name   Revenue   Active month count  
 ----- ------- ------------ ---------------- -------------- --------- -------------------- 
  1     Jul     2016-07-31   FY17-Q2          Foo            x,xxx     1                   
  2     Jul     2016-07-31   FY17-Q2          Bar            xxx,xxx   1                   
  3     Aug     2016-08-31   FY17-Q2          Foo            xx,xxx    2                   
  4     Aug     2016-08-31   FY17-Q2          Bar            xxx       2                   
  5     Sep     2016-09-30   FY17-Q2          Foo            xx        3                   
  6     Sep     2016-09-30   FY17-Q2          Bar            x,xxx     3                   
  7     Oct     2016-10-31   FY17-Q3          Foo            xx        4                   
  8     Oct     2016-10-31   FY17-Q3          Bar            Null      3                 

This what ideally I'd like for my table to look like.

  Row   Month   Month_end    Fiscal_Quarter   Company_Name   Revenue   Active month count   Active quarter count  
 ----- ------- ------------ ---------------- -------------- --------- -------------------- ---------------------- 
  1     Jul     2016-07-31   FY17-Q2          Foo            x,xxx     1                    1                     
  2     Jul     2016-07-31   FY17-Q2          Bar            xxx,xxx   1                    1                     
  3     Aug     2016-08-31   FY17-Q2          Foo            xx,xxx    2                    1                     
  4     Aug     2016-08-31   FY17-Q2          Bar            xxx       2                    1                     
  5     Sep     2016-09-30   FY17-Q2          Foo            xx        3                    1                     
  6     Sep     2016-09-30   FY17-Q2          Bar            x,xxx     3                    1                     
  7     Oct     2016-10-31   FY17-Q3          Foo            xx        4                    2                     
  8     Oct     2016-10-31   FY17-Q3          Bar            Null      3                    1                     
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786

1 Answers1

0

If this is counting active months:

COUNTIF(Revenue IS NOT NULL) OVER (PARTITION BY Company_Name  ORDER BY month_end ASC) AS cumulative_active_months

Then this is the corresponding count for quarters would use COUNT(DISTINCT):

COUNT(DISTINCT CASE WHEN Revenue IS NOT NULL THEN Fiscal_Quarter END)  OVER (PARTITION BY Company_Name ORDER BY month_end ASC) AS cumulative_active_quarters

Unfortunately, BigQuery does not support this, so you can use a subquery and cumulative sum:

select t.* except (seqnum),
       countif(seqnum = 1) over (partition by company_name order by month_end) as cnt
from (select t.*,
             (case when revenue is not null
                   then row_number() over (partition by Company_Name, Fiscal_Quarter order by month_end)
                   else 0
              end) as seqnum
      from t
     ) t;

Note: This does not count the current quarter until there is revenue, which I think makes sense.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you! For my understanding, what is the purpose of seqnum? When I run your code in my larger BigQuery dataset, two columns are generated, one named seqnum and another called "f0_", on initial examination it seems that this f0_ column actually has the count that I was trying to solve for in my question. Where is this new column being generated? Is there a way to rename it? – Daniel Moll Sep 20 '20 at 18:24
  • @DanielMoll . . . It is easy to remove columns in BigQuery using `except`. `seqnum` is needed to avoid counting duplicates. – Gordon Linoff Sep 20 '20 at 21:33