0

I want to create a query to get the cumulative sum by month of our active customers. The tricky thing here is that (unfortunately) some customers churn and so I need to remove them from the cumulative sum on the month they leave us.

Here is a sample of my customers table :

customer_id |   begin_date  |   end_date
-----------------------------------------
1           |   15/09/2017  |   
2           |   15/09/2017  |   
3           |   19/09/2017  |   
4           |   23/09/2017  |   
5           |   27/09/2017  |   
6           |   28/09/2017  |   15/10/2017
7           |   29/09/2017  |   16/10/2017
8           |   04/10/2017  |   
9           |   04/10/2017  |   
10          |   05/10/2017  |   
11          |   07/10/2017  |   
12          |   09/10/2017  |   
13          |   11/10/2017  |   
14          |   12/10/2017  |   
15          |   14/10/2017  |

Here is what I am looking to achieve :

month   |   active customers
-----------------------------------------       
2017-09 |   7
2017-10 |   6

I've managed to achieve it with the following query ... However, I'd like to know if there are a better way.

select 
    "begin_date" as "date",
    sum((new_customers.new_customers-COALESCE(churn_customers.churn_customers,0))) OVER (ORDER BY new_customers."begin_date") as active_customers
FROM (
    select 
        date_trunc('month',begin_date)::date as "begin_date",
        count(id) as new_customers
    from customers
    group by 1
) as new_customers
LEFT JOIN(
    select 
        date_trunc('month',end_date)::date as "end_date",
        count(id) as churn_customers
    from customers
    where
        end_date is not null
    group by 1
) as churn_customers on new_customers."begin_date" = churn_customers."end_date"
order by 1
;
Kaushik Nayak
  • 30,772
  • 5
  • 32
  • 45

1 Answers1

0

You may use a CTE to compute the total end_dates and then subtract it from the counts of start dates by using a left join

SQL Fiddle

Query 1:

WITH edt
AS (
    SELECT to_char(end_date, 'yyyy-mm') AS mon
        ,count(*) AS ct
    FROM customers
    WHERE end_date IS NOT NULL
    GROUP BY to_char(end_date, 'yyyy-mm')
    )
SELECT to_char(c.begin_date, 'yyyy-mm') as month
    ,COUNT(*) - MAX(COALESCE(ct, 0)) AS active_customers
FROM customers c
LEFT JOIN edt ON to_char(c.begin_date, 'yyyy-mm') = edt.mon
GROUP BY to_char(begin_date, 'yyyy-mm')
ORDER BY month;

Results:

|   month | active_customers |
|---------|------------------|
| 2017-09 |                7 |
| 2017-10 |                6 |
Kaushik Nayak
  • 30,772
  • 5
  • 32
  • 45
  • Thanks! It's exactly the way I found (see update after your answer). I am wondering if there is a better way to achieve this. – user10026035 Jul 03 '18 at 11:49
  • @user10026035 : Although the logic is same, your query has an unnecessary construct with sum analytic function. The approach however doesn't look bad either as the design of your table and rules of SQL aggregation prevent any further optimisation or improvement in readability. I would be really interested to see if someone can come up with a better alternative. – Kaushik Nayak Jul 03 '18 at 12:35