I have already seen all the related posts, but none have been able to help me.
I Have the following fields:
Where:
- SOLD_AT is the date of each transaction
- CUSTOMER_ID is a unique ID for each customer
- COHORT is the date (Year-Month) of the first purchase of the user in that row
- ORDER_MONTH is the date of (Year-Month) of the purchase in that row
- PERIOD_NUMBER is the date difference in months between COHORT and ORDER_MONTH
- N_CUSTOMERS is the number of customers in each PERIOD_NUMBER in each COHORT
In case is useful, I have the querys with which I have obtained these fields, but I think that including them would only add noise since the definition of each variable is more useful.
What I need to do and am not able to do is add an additional field for the retention of each period number of each cohort (not a pivot table by adding the period numbers of each cohort).
Specifically, I need the retention of each period number to be the division of the number of users of that period by the number of users of the previous period, in this way:
To do this in Python, I simply do:
cohort_pivot = df_cohort.pivot_table(index = 'cohort',
columns = 'period_number',
values = 'n_customers')
cohort_size = cohort_pivot.iloc[:,0]
retention_matrix1 = cohort_pivot.divide(cohort_size, axis = 0)
and I can then unpivot and take out the retention for each period of each cohort to create an additional column with this value.
One of the answers that I tried because it was the closest thing I saw was the answer chosen in this post, but I am not able to know the number of periods_numbers or historical months that I am going to have since the code has to be dynamic for any company that is loaded (For example, in DBT, which is the tool I'm using, you can create dynamic pivot tables instead of static ones that require to know this information, but as I say I need to create the field, not the pivot table)
Any ideas will be more than welcome, thank you very much