Hello again, Stack Overflow-ers! Hope you are well.
I am working on a project and am essentially trying to create a decision tree. The data is for a bank's campaign and concerns how well the campaign incentivized customers to open a term deposit.
Anyhow, I've worked through the coding with some assistance from online resources and have hit a wall on one part.
One of the columns is the term deposit amount for all customers, and I plotted the data to visualize it (please see the attached plot).
Since the data is so dispersed, I wanted to discretize it. I used the following code:
BankTraining$TDepositAMTD <- cut(BankTraining$TermDepositAMT, right = FALSE,
    breaks = c(0, 5000, 10000, 15000, 20000, max(BankTraining$TermDepositAMT)))
In the plot, the Y axis is the number of observations and the X axis is the dollar amount of term deposits.
However, viewing the column after this step, I see:
table(BankTraining$TDepositAMTD)
      [0,5e+03)   [5e+03,1e+04) [1e+04,1.5e+04) [1.5e+04,2e+04)   [2e+04,3e+04)
           5213            8631            8367            1698            3121
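Now, clearly this is no good. The bin labels come out in scientific notation (for example [5e+03,1e+04) instead of 5000 to 10000), and once the decision tree is created it shows these same categories, which I cannot easily interpret.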
Could someone shed light on this issue please? Much gratitude for your help.
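
For reference, here is a minimal sketch of two ways the labels might be made readable. It runs on made-up values rather than the real BankTraining data, and the names amounts, bin_labels, binned and binned_intervals are hypothetical. It relies on the labels and dig.lab arguments of cut(). The include.lowest = TRUE argument is an addition that is not in the code above: with right = FALSE, the maximum value would otherwise fall outside the last interval and become NA.

```r
# Made-up stand-in for BankTraining$TermDepositAMT (hypothetical values).
amounts <- c(250, 4999, 5000, 7500, 12000, 19999, 20000, 30000)

breaks <- c(0, 5000, 10000, 15000, 20000, max(amounts))

# Option 1: supply explicit labels, one per interval (length(breaks) - 1).
bin_labels <- c("0-5k", "5k-10k", "10k-15k", "15k-20k", "20k+")
binned <- cut(amounts, breaks = breaks, labels = bin_labels,
              right = FALSE, include.lowest = TRUE)

# Option 2: keep the interval-style labels, but let cut() use up to
# 5 significant digits when formatting the break values.
binned_intervals <- cut(amounts, breaks = breaks,
                        right = FALSE, include.lowest = TRUE, dig.lab = 5)

print(table(binned))
print(levels(binned_intervals))
```

Explicit labels give full control over what the tree displays, while dig.lab keeps the interval notation and only changes how the break values are printed.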