I am working with a subscription based data set of which this is an exemplar:
import pandas as pd
import numpy as np
from datetime import timedelta
start_date = pd.date_range(start = "2015-01-09", end = "2022-09-11", freq = "6D")
cancel_date = [start_date + timedelta(days = np.random.exponential(scale = 100)) for start_date in start_date]
churned = [random.randint(0, 1) for i in range(len(start_date))]; churned = [bool(x) for x in churned]
df = pd.DataFrame(
{"start_date":start_date,
"cancel_date":cancel_date,
"churned":churned}
)
df["cancel_date"] = df["cancel_date"].dt.date
df["cancel_date"] = df["cancel_date"].astype("datetime64[ns]")
I need a way to calculate monthly customer churn in python using the following steps:
- Firstly, I need to obtain the number of subscriptions that started before the 1st of each month that are still active
- Secondly, I need to obtain the number of subscriptions that started before the 1st of each month and which were cancelled after the 1st of each month
These two steps constitute the denominator of the monthly calculation
- Finally, I need to obtain the number of subscriptions that cancelled in each month
This step produces the numerator of the monthly calculation.
The numerator and the denominator are divided and multiplied by 100 to obtain the percentage of customers that churn each month
I am really really lost with this problem can someone please point me in the right direction - I have been working on this problem for so long