
Consider a bank dataset (for predicting loan) that contains the following attributes.

> names(univ2)
[1] "age" "inc" "family" "edu" "mortgage" "ccavg" "cc" "cd"  "online" "securities" "infoReq" "loan"

I have converted most attributes directly to factors. The remaining numeric ones (age, inc, ccavg and mortgage) are first binned with the discretize function and then converted to factors so that they can be passed to the decision tree algorithm.

library(infotheo)  # provides discretize()
age <- discretize(univ2$age, disc = "equalfreq", nbins = 10)  # equal-frequency bins
age <- as.factor(age$X)

Similarly for inc, ccavg and mortgage. Suppose the number of bins passed to discretize ranges from 5 to 12, i.e. 8 possible values for each attribute, so the possible arrangements might number 8P4 = 1680. Each time, I can pass the train, test and evaluation data to the decision tree and get predictions with their accuracy in the following way.

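library(C50)  # provides C5.0()
dtC50 <- C5.0(loan ~ ., data = train, rules = TRUE)
# Confusion table: rows are actual classes, columns are predicted classes
a <- table(train$loan, predict(dtC50, newdata = train, type = "class"))
# Recall (%) for the class in row/column 2
rcTrain <- a[2, 2] / (a[2, 1] + a[2, 2]) * 100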

Similarly for the test and eval data, to create rcTest and rcEval. Suppose the resulting recall values are:

Recall in Training 91.26027 
Recall in Testing 94.11765 
Recall in Evaluation 93.37209
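
The recall computation above indexes the confusion table by position, which assumes the positive class is the second level of loan. A minimal sketch of a helper that names the positive class explicitly is shown here; the function name `recall_pct` and the default label "1" are assumptions for illustration, since the coding of loan is not given.

```r
# Recall (%) for one class, computed directly from the label vectors.
# `positive` is an assumed label; set it to however loan is coded.
recall_pct <- function(actual, predicted, positive = "1") {
  actual <- as.character(actual)
  predicted <- as.character(predicted)
  tp <- sum(actual == positive & predicted == positive)  # true positives
  fn <- sum(actual == positive & predicted != positive)  # false negatives
  100 * tp / (tp + fn)
}

# Toy check with made-up labels: 2 of the 3 actual positives are recovered.
recall_pct(actual = c(1, 1, 1, 0, 0), predicted = c(1, 1, 0, 0, 1))
```

The same call can then be reused for the train, test and eval predictions without depending on the order of the factor levels.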

My question: is there a way to use a function (or some other approach) that fits the model on the train data, predicts on the train, test and eval data for each of the 8P4 bin arrangements above, and stores the output in a data frame with the following 6 attributes?

1 ID                 : 1:1680
2 Bin Arrangement on (Age,Inc,CCavg,Mortgage) : (5,5,5,5)...........(10,11,12,5)
3 TrainAccuracy      : %'s
4 TestAccuracy       : %'s
5 EvaluationAccuracy : %'s
6 Is Test>Train      : 1 if satisfied, 0 otherwise
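
The ID and bin-arrangement columns can be built up front. The example arrangement (5,5,5,5) repeats a bin value, so if repeats are intended, each of the four attributes independently takes one of 8 values and the count is 8^4 = 4096 rather than 8P4 = 1680 (which counts only arrangements with no repeated value). A small sketch using base R's expand.grid, with column names borrowed from the attribute names above:

```r
# Candidate bin counts for each discretized attribute.
bins <- 5:12

# Every combination with repeats allowed: one row per arrangement.
grid <- expand.grid(age = bins, inc = bins, ccavg = bins, mortgage = bins)
grid$ID <- seq_len(nrow(grid))

nrow(grid)                    # 4096, i.e. 8^4
factorial(8) / factorial(4)   # 1680, i.e. 8P4 (no repeated bin value)
```

Each row of `grid` then supplies the four nbins values for one model run.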

Please correct me if I'm wrong about the number of arrangements or have made other mistakes.

Is there any method to solve this problem?

lmo
Nikhil Kumar
  • You can use a simple for loop for this and keep binding the results of each loop run into, say, a `result` data frame. – abhiieor Feb 25 '17 at 08:55
  • As mentioned by @abhiieor, you can use a loop to save each run in a data frame, and all the data frames can be added to a list. By the way, it would be better to leave the numeric variables as they are, so that the tree method can decide the best split itself. – Sixiang.Hu Feb 25 '17 at 13:04
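
The loop suggested in these comments could be sketched roughly as follows. This is only an illustration under several assumptions: `bin_equalfreq`, `recall_pct`, `run_arrangements` and the `fit_fun` hook are hypothetical names; the binning uses base R's quantile and cut as a stand-in for discretize, with breakpoints taken from the training data; `grid` must have exactly one column per attribute to be binned; and the result columns hold recall, since that is what the question computes.

```r
# Equal-frequency binning in base R, used here as a stand-in for
# infotheo::discretize(). Breakpoints come from `ref`, so the train, test
# and eval sets share the bins learned on the training data. Tied
# breakpoints are collapsed, which can yield fewer bins than requested.
bin_equalfreq <- function(x, nbins, ref = x) {
  probs <- seq(0, 1, length.out = nbins + 1)
  inner <- quantile(ref, probs = probs, names = FALSE)
  inner <- unique(inner[-c(1, length(inner))])  # drop min/max, collapse ties
  cut(x, breaks = c(-Inf, inner, Inf))
}

# Recall (%) for the class labelled `positive` (an assumed label).
recall_pct <- function(actual, predicted, positive = "1") {
  hit <- as.character(actual) == positive
  100 * mean(as.character(predicted)[hit] == positive)
}

# fit_fun(train) is a hypothetical hook: it fits a model on `train` and
# returns a function that predicts classes for a new data frame.
run_arrangements <- function(train, test, evalset, grid, fit_fun,
                             target = "loan", positive = "1") {
  rows <- lapply(seq_len(nrow(grid)), function(i) {
    sets <- list(train = train, test = test, eval = evalset)
    for (v in names(grid)) {
      for (s in names(sets)) {
        sets[[s]][[v]] <- bin_equalfreq(sets[[s]][[v]], grid[i, v],
                                        ref = train[[v]])
      }
    }
    predict_fun <- fit_fun(sets$train)  # fit once per arrangement
    rc <- sapply(sets, function(d) {
      recall_pct(d[[target]], predict_fun(d), positive)
    })
    data.frame(
      ID = i,
      BinArrangement = paste0("(", paste(unlist(grid[i, ]), collapse = ","), ")"),
      TrainRecall = rc[["train"]],
      TestRecall = rc[["test"]],
      EvalRecall = rc[["eval"]],
      TestGtTrain = as.integer(rc[["test"]] > rc[["train"]])
    )
  })
  do.call(rbind, rows)  # one row per arrangement
}

# Possible hook for the question's model (needs the C50 package):
# fit_c50 <- function(tr) {
#   m <- C50::C5.0(loan ~ ., data = tr, rules = TRUE)
#   function(nd) predict(m, newdata = nd, type = "class")
# }
# grid <- expand.grid(age = 5:12, inc = 5:12, ccavg = 5:12, mortgage = 5:12)
# result <- run_arrangements(train, test, evalset, grid, fit_c50)
```

Passing the model in as a function keeps the loop independent of any one package, and returning one small data frame per run before a single rbind avoids growing `result` inside the loop.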
