0

I have two tables, and would like to use them to help analyze churn.

The first table has subscription data per customer:

ACCT    INVOICE_DATE  REVENUE   FIRST_INVOICE   LAST_INVOICE
1234    2021-09-01    10        2021-09-01      2021-12-01
1234    2021-12-01    10        2021-09-01      2021-12-01
5678    2021-06-01    20        2021-06-01      2021-08-01
5678    2021-07-01    20        2021-06-01      2021-08-01
5678    2021-08-01    20        2021-06-01      2021-08-01

The other is a convenience table with all months since 1970, e.g.,

MONTH
1970-01-01
1970-02-01
[ ... ]
2023-02-01
2023-03-01

Note one customer in this example skipped a month, and both customers did not survive to the current year. To accurately deal with these churn events (there can be multiple over the lifetime of the account), I'd like to end up with something like this:

ACCT    INVOICE_DATE  REVENUE   FIRST_INVOICE   LAST_INVOICE
1234    2021-09-01    10        2021-09-01      2021-12-01
1234    2021-10-01    NULL      2021-09-01      2021-12-01
1234    2021-11-01    NULL      2021-09-01      2021-12-01
1234    2021-12-01    10        2021-09-01      2021-12-01
1234    2022-01-01    NULL      2021-09-01      2021-12-01
5678    2021-06-01    20        2021-06-01      2021-08-01
5678    2021-07-01    20        2021-06-01      2021-08-01
5678    2021-08-01    20        2021-06-01      2021-08-01
5678    2021-09-01    NULL      2021-06-01      2021-08-01

You can assume the date types for the months in each table are equivalent.

There are other options that I'm exploring with window functions, but I've used this approach in Pandas before and curious how I might translate that to SQL.

measureallthethings
  • 1,102
  • 10
  • 26

1 Answers1

0

This seems to work. I created a months table like yours, first. Then, using your sample data in a CTE called customers, I first create a table that just has a distinct set of customers with your first and last invoice date, do a left join on first/last invoice date + 1 extra month. Then, take that result and left join it back to the original customer data.

with customers as (
    select *
    from values(1234,    '2021-09-01'::date,    10,        '2021-09-01'::date,      '2021-12-01'::date),
    (1234,    '2021-12-01'::date,    10,        '2021-09-01'::date,      '2021-12-01'::date),
    (5678,    '2021-06-01'::date,    20,        '2021-06-01'::date,      '2021-08-01'::date),
    (5678,    '2021-07-01'::date,    20,        '2021-06-01'::date,      '2021-08-01'::date),
    (5678,    '2021-08-01'::date,    20,        '2021-06-01'::date,      '2021-08-01'::date) x (ACCT,INVOICE_DATE,REVENUE,FIRST_INVOICE,LAST_INVOICE)
), x as (
    select distinct acct, first_invoice, last_invoice
    from customers
), y as (
    select *
    from testing.months m
    left join x
     on m.month between x.first_invoice and dateadd(month,1,x.last_invoice)
    where x.acct is not null
    )
select y.acct, y.month as invoice_date, c.revenue, y.first_invoice, y.last_invoice
from y
left join customers c
  on y.acct = c.acct
 and y.month = c.invoice_date
order by 1,2
;
Mike Walton
  • 6,595
  • 2
  • 11
  • 22