I am developing a combined-algorithm model in MS Azure Machine Learning Studio that should predict whether a telco customer will churn. Given that I have 19 features, e.g. monthly fee, usage length etc., how do I combine two useful algorithms? How do I know that the combination gives the highest possible Accuracy (I need the highest Accuracy I can get), i.e. which elements do I still need to add, and how should I use the model to predict churn behaviour on another dataset of "fresh" customers?
I have:
1) Edit Metadata: Having excluded the User_ID variable, I used the Edit Metadata element to mark the Attrition variable as the label (Attrition indicates whether a customer has churned, i.e. Yes or No). At the same time I made Attrition a categorical variable with two categories, Yes and No.
2) Normalize Data: Since the three identified numerical variables (Usage_Length, Monthly_Fee and Total_Fee) are quite different in scale, e.g. Max(Monthly_Fee) is 78.80 while Max(Total_Fee) is 5,789.87, I have normalized Monthly_Fee and Total_Fee using the LogNormal transformation method.
3) Edit Metadata (2nd use): Having normalized two of the numerical variables, I made all non-numerical features, e.g. User_Gender, Is_Senior etc., categorical so that they can be used in the analysis that follows.
4) Split Data: Once the above steps were carried out, I split the data into 0.8 for training and 0.2 for testing, and I run the models on this split.
5) Choice of algorithm: I have selected Two-Class Boosted Decision Tree and Two-Class Decision Forest, as they give the highest individual Accuracy of the algorithms I tried: 0.963 and 0.967, respectively.
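To make "combining" concrete, here is a minimal sketch outside Studio of one common approach, soft voting: average the churn probabilities the two models give each test customer and threshold the mean. This is only an illustration of the idea, not a Studio feature I have set up. The function names `split_rows` and `soft_vote`, the 0.5 threshold and all the probabilities are hypothetical.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and return (train, test), 0.8/0.2 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def soft_vote(probs_a, probs_b, threshold=0.5):
    """Average two models' churn probabilities per customer, then threshold."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same customers")
    return ["Yes" if (a + b) / 2 >= threshold else "No"
            for a, b in zip(probs_a, probs_b)]

# Hypothetical (made-up) churn probabilities for four test customers.
boosted_tree_probs = [0.90, 0.40, 0.20, 0.55]
decision_forest_probs = [0.80, 0.70, 0.10, 0.35]

combined = soft_vote(boosted_tree_probs, decision_forest_probs)
print(combined)
```

The second and fourth customers show the effect: the two models disagree there, and the averaged probability decides the final label.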
No code is used in the experiment; I have only added elements.
I expect the highest possible Accuracy, which is currently 0.967 when both models are fed into an Evaluate Model element.
(Current model screenshot.)
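To check whether a combination really beats 0.967, the combined predictions would need to be scored on the same test split as the two individual models, since a combination is not guaranteed to be more accurate than its parts. A minimal sketch of that comparison, with a hypothetical `accuracy` helper and made-up labels:

```python
def accuracy(actual, predicted):
    """Fraction of customers whose predicted label matches the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)

# Made-up Attrition labels for five test customers (illustration only).
actual         = ["Yes", "No", "No",  "Yes", "No"]
boosted_tree   = ["Yes", "No", "Yes", "Yes", "No"]
decision_forest = ["Yes", "No", "No",  "No",  "No"]
combined       = ["Yes", "No", "No",  "Yes", "No"]

for name, preds in [("boosted tree", boosted_tree),
                    ("decision forest", decision_forest),
                    ("combined", combined)]:
    print(name, accuracy(actual, preds))
```

With real scored outputs in place of the made-up lists, the same three numbers show directly whether the combination adds anything over the better single model.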