I have a huge dataset of subscriptions (both ongoing and terminated ones) and I want to use the KaplanMeierFitter to predict, for each cohort (= month in which the subscriptions started), how many subscriptions will still be ongoing/active in the next month.
As a first step I used all data, regardless of the month in which a subscription started, to plot the overall Kaplan-Meier survival curve, which shows the likelihood of a subscription still being active (left chart). Next, I plotted the survival curve for one specific cohort that is currently at month 18 (right chart).
Now I somehow want to use the data from the main survival curve (left chart) to figure out how many subscriptions will still be active in month 19 for this specific cohort, but I can't really get my head around it ...
There are two approaches I considered so far:
- My initial idea was to take the survival likelihood for month 19 from the main chart/curve and multiply it by the total number of subscriptions in this specific cohort. But what if the overall likelihood for month 19 is 88%, while I know this specific cohort is already at 70% at the end of month 18?
- I take the event_table and divide "observed" (= ended subscriptions) by "at_risk" (= subscriptions that were active at the beginning of the month). This should give me the likelihood of a subscription terminating in each month for this specific cohort?
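
To make the two ideas concrete, here is a rough sketch in plain Python. Only the 88% and 70% come from my description above; the cohort size of 1000 and the "observed"/"at_risk" counts are made up, and the function names are just mine for illustration, not part of any library. The last step assumes that the overall month-19 termination rate also applies to this cohort, which is exactly the part I am unsure about.

```python
def naive_forecast(cohort_size, overall_survival_next):
    """Approach 1: overall survival likelihood times the cohort's starting size."""
    return cohort_size * overall_survival_next


def monthly_termination_rate(observed, at_risk):
    """Approach 2: share of subscriptions active at the start of a month that ended in it."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return observed / at_risk


def forecast_from_rate(active_now, termination_rate):
    """Expected active subscriptions after one more month at the given rate."""
    return active_now * (1.0 - termination_rate)


# Made-up cohort: 1000 subscriptions started, 70% still active after month 18.
cohort_size = 1000
active_now = 700

# Approach 1 with the overall 88% likelihood for month 19.
naive = naive_forecast(cohort_size, 0.88)

# Approach 2 with a hypothetical event_table row for month 19.
rate = monthly_termination_rate(observed=20, at_risk=1000)
forecast = forecast_from_rate(active_now, rate)

print(naive, active_now, forecast)
```

With these numbers the first approach predicts more active subscriptions for month 19 than the cohort actually has at the end of month 18, which is the contradiction I describe above. The second approach starts from the subscriptions that are still active, so its forecast can never exceed that number.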
I'm really curious for feedback and I hope all this somehow makes sense to you guys (and girls) ...