
Hello again, Stack Overflow users! I hope you are well.

I am working on a project and am essentially trying to build a decision tree. The data comes from a bank's campaign and concerns how well the campaign encouraged customers to open a term deposit.

Anyhow, I've worked through most of the code with some help from online resources, but I have hit a wall on one part.

One of the columns holds the term deposit amount for each customer, and I plotted it to visualize the distribution (see the attached plot).

Since the data is so dispersed, I wanted to discretize it. I used the following code:

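# note: with right = FALSE every bin is [a, b), so a value equal to the last break (the max) becomes NA
BankTraining$TDepositAMTD <- cut(BankTraining$TermDepositAMT, right = FALSE,
                                 breaks = c(0, 5000, 10000, 15000, 20000, max(BankTraining$TermDepositAMT)))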
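A minimal sketch of why these breaks can silently lose data, using a small hypothetical vector `x` in place of `BankTraining$TermDepositAMT`. With `right = FALSE` every bin is half-open, `[a, b)`, so a value equal to the last break (here the maximum) falls outside all bins and is coded `NA`. Adding `include.lowest = TRUE` (which, for `right = FALSE`, closes the *last* bin on the right) keeps it. Because `table()` drops `NA` by default, such a loss is easy to miss; `table(..., useNA = "ifany")` shows it.

```r
# Hypothetical toy vector standing in for BankTraining$TermDepositAMT
x <- c(0, 4999, 5000, 12000, 30000)
brks <- c(0, 5000, 10000, 15000, 20000, max(x))

# right = FALSE builds [a, b) bins, so the maximum (30000) is not in any bin
a <- cut(x, breaks = brks, right = FALSE)
print(a)
print(sum(is.na(a)))  # 1

# include.lowest = TRUE closes the last bin: "[2e+04,3e+04]"
b <- cut(x, breaks = brks, right = FALSE, include.lowest = TRUE)
print(b)
print(sum(is.na(b)))  # 0

# useNA = "ifany" makes the lost value visible in the counts
print(table(a, useNA = "ifany"))
```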


In the plot, the y-axis is the number of observations and the x-axis is the dollar amount of the term deposits.

However, when I look at the new column after this step, I see:

    table(BankTraining$TDepositAMTD)

          [0,5e+03)   [5e+03,1e+04) [1e+04,1.5e+04) [1.5e+04,2e+04)   [2e+04,3e+04) 
               5213            8631            8367            1698            3121 

Now, clearly this is no good. Once the decision tree is created, it shows these weird categories, which I cannot interpret.

[Image: the resulting decision tree]

Could someone shed light on this issue, please? Many thanks for your help.

Freewill

1 Answer


Since it seems you are not happy with the cuts you are producing, have a go at it with:

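library(Hmisc)  # install.packages("Hmisc") if it is not installed yet
# g = 5 requests 5 quantile groups with roughly equal numbers of observations
Groups <- cut2(BankTraining$TermDepositAMT, g = 5)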

The Hmisc package is available on CRAN.
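If installing Hmisc is not an option, here is a base-R sketch of roughly the same idea: use `quantile()` to place the break points at the 0%, 20%, ..., 100% quantiles, so each of the 5 groups holds about the same number of observations. This is an assumption-level stand-in, not a reimplementation of `cut2()`, whose interval conventions and labels differ. The data here is a hypothetical skewed vector standing in for `BankTraining$TermDepositAMT`.

```r
# Hypothetical skewed toy data standing in for BankTraining$TermDepositAMT
set.seed(1)
x <- round(rexp(1000, rate = 1 / 8000))

# Quintile break points: 0%, 20%, ..., 100%
brks <- quantile(x, probs = seq(0, 1, by = 0.2))
brks <- unique(brks)  # tied quantiles would make cut() stop with an error

# include.lowest = TRUE keeps the minimum in the first (a, b] bin;
# dig.lab = 6 avoids scientific-notation labels
grp <- cut(x, breaks = brks, include.lowest = TRUE, dig.lab = 6)
print(table(grp))  # roughly 200 observations per group
```

Equal-count bins like these adapt to a dispersed distribution, whereas fixed-width breaks such as 0, 5000, 10000, ... can leave some bins nearly empty.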

As for your weird categories, we would need to see which packages/algorithms you are using and how you call them, since these categories may be a product of your binning and a consequence of some default behavior. Happy to edit when more information is available.
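One default worth checking (an assumption, since the tree code is not shown): `cut()` formats its break points with `dig.lab = 3` significant digits by default, which is what produces labels such as `[5e+03,1e+04)`. A minimal sketch on a hypothetical toy vector, showing the default labels, fuller labels via a larger `dig.lab`, and explicit labels via `labels =` (the label strings themselves are made up for illustration):

```r
# Hypothetical toy vector standing in for BankTraining$TermDepositAMT
x <- c(0, 4999, 5000, 12000, 18000, 30000)
brks <- c(0, 5000, 10000, 15000, 20000, 30000)

# Default dig.lab = 3: break points get 3 significant digits, e.g. "[5e+03,1e+04)"
lv_default <- levels(cut(x, brks, right = FALSE, include.lowest = TRUE))
print(lv_default)

# A larger dig.lab prints the break points in full, e.g. "[5000,10000)"
lv_full <- levels(cut(x, brks, right = FALSE, include.lowest = TRUE, dig.lab = 6))
print(lv_full)

# Or supply one readable label per interval (hypothetical names)
lab <- c("0-4999", "5000-9999", "10000-14999", "15000-19999", "20000+")
lv_custom <- levels(cut(x, brks, right = FALSE, include.lowest = TRUE, labels = lab))
print(lv_custom)
```

Whatever labels the factor carries are the ones a tree-fitting function will typically print in its splits, so readable labels here usually mean a readable tree.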

theWanderer4865
  • Thank you, theWanderer4865. This worked perfectly. I'm still working on the code, and it looks like the tree I got earlier (the one I posted) was very likely inaccurate to begin with. I'm trying a few different things at the moment, which will hopefully resolve the issue altogether. I'll post back if the issue persists! Thanks a lot – Freewill May 03 '14 at 01:41