
This is my first time asking a question here, so if I'm doing something wrong, please point me to the right place. I have a large, clean dataset (29,000+ rows, 24 columns). I have to calculate the churn rate based on 4 different categorical columns, and I'm given just 1 column that contains the subs for a given period. I also have a date column. My idea for calculating the churn is:

churn_rate = (Sub_start_period - Sub_end_period) / Sub_start_period * 100

The Problem

I don't know how to group the data using these 4 different categorical variables. Also, if I manage to do so, I would end up with more than 200 different tables, so I don't believe this would be a good approach.

My goal is to be able to predict the churn rate using the information in the table, but I should be able to determine the churn rate based on these variables. The churn is not given; it has to be calculated, and I can't think of a way to work through this.
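For illustration, here is a minimal sketch of how the grouping could be done so that every combination of the 4 categories ends up as a row in one table rather than in 200+ separate tables. It assumes pandas, and the column names (`cat_a` to `cat_d`, `date`, `subs`) and the function `churn_by_group` are hypothetical placeholders, since the real names are not shown here. It also assumes that the first and last rows of each group, ordered by date, are the start and end of the period.

```python
import pandas as pd

# Hypothetical column names; replace them with the real ones from the dataset.
CATEGORY_COLS = ["cat_a", "cat_b", "cat_c", "cat_d"]


def churn_by_group(df, category_cols=CATEGORY_COLS, date_col="date", subs_col="subs"):
    """Return one row per category combination with its churn rate in percent."""
    # Sort by date so "first" and "last" mean start and end of the period.
    ordered = df.sort_values(date_col)
    grouped = ordered.groupby(category_cols)[subs_col]
    out = grouped.agg(sub_start="first", sub_end="last").reset_index()
    # Same formula as above: (start - end) / start * 100
    out["churn_rate"] = (out["sub_start"] - out["sub_end"]) / out["sub_start"] * 100
    return out


# Tiny made-up example, only to show the shape of the result.
example = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-01-01", "2021-02-01"]),
    "cat_a": ["x", "x", "y", "y"],
    "cat_b": ["p", "p", "q", "q"],
    "cat_c": ["m", "m", "m", "m"],
    "cat_d": ["u", "u", "u", "u"],
    "subs": [100, 90, 200, 150],
})
result = churn_by_group(example)
print(result)
```

The result is a single DataFrame with the 4 category columns plus `sub_start`, `sub_end`, and `churn_rate`, which could then serve as the target table for a prediction step. If the churn is needed per month or per quarter instead of over the whole date range, a period column derived from the date would have to be added to the grouping keys.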

  • Without more details, the question is hard to follow. What does every row represent in your dataset? Which are the categorical columns? – simon May 15 '21 at 23:59
  • Welcome to Stack Overflow! Please [edit](https://stackoverflow.com/posts/67551944/edit) your post and make it focused on one specific programming-related question. It seems like your main issue is "I don't know how to group the data using these 4 different categorical variables," but we don't know what your end result needs to be, and there seems to be information in your post not directly tied to that question. Please provide a summary of these variables and show what your desired output is. See [How to Ask](https://stackoverflow.com/questions/how-to-ask) if you want more tips. – AlexK May 16 '21 at 01:03

0 Answers