Consider a bank dataset (used to predict loan) that contains the following attributes.
> names(univ2)
[1] "age" "inc" "family" "edu" "mortgage" "ccavg" "cc" "cd" "online" "securities" "infoReq" "loan"
I have converted almost all attributes to factors directly. The rest (age, inc, ccavg and mortgage) are first binned with the discretize function and then converted to factors, so that they can be passed to the decision tree algorithm.
age <- discretize(univ2$age, disc="equalfreq", nbins=10)
age=as.factor(age$X)
Similarly for inc, ccavg and mortgage. Suppose the nbins value passed to discretize ranges from 5 to 12, i.e. 8 possible bin values for each attribute, so the number of possible arrangements might be 8P4 = 1680. Each time, I can pass the train, test and evaluation data to the decision tree and get predictions with a score (recall here) in the following way.
dtC50 <- C5.0(loan ~ ., data = train, rules=TRUE)
a=table(train$loan, predict(dtC50, newdata=train, type="class"))
rcTrain=(a[2,2])/(a[2,1]+a[2,2])*100
Similarly for the test and eval data, to create rcTest and rcEval. Let the results be
Recall in Training 91.26027
Recall in Testing 94.11765
Recall in Evaluation 93.37209
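The recall computation above depends on the positive class sitting in the second row and column of the table. A minimal sketch of a helper that names the positive class explicitly instead; `recall_pct` and the label "1" are hypothetical names chosen for illustration, not part of the original code:

```r
# Recall (in percent) for a chosen positive class.
# `actual` and `predicted` are vectors or factors of class labels.
recall_pct <- function(actual, predicted, positive) {
  is_pos <- actual == positive                      # rows that are truly positive
  100 * sum(predicted[is_pos] == positive) / sum(is_pos)
}

# Toy example: 3 true positives, 2 of them predicted correctly.
actual    <- factor(c(1, 1, 1, 0, 0))
predicted <- factor(c(1, 1, 0, 0, 1))
recall_pct(actual, predicted, positive = "1")
```

When the positive class is the second factor level, this should give the same value as a[2,2]/(a[2,1]+a[2,2])*100.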
The question is: is there any way to use a function (or some other approach) to fit the model on the train data, predict on the train, test and eval data for each of the 8P4 bin arrangements above, and store the output in a data frame with the following 6 attributes?
1 ID : 1:1680
2 Bin Arrangement on (Age,Inc,CCavg,Mortgage) : (5,5,5,5) ... (10,11,12,5)
3 TrainAccuracy : %'s
4 TestAccuracy : %'s
5 EvaluationAccuracy : %'s
6 Is Test>Train : 1 if the condition is satisfied, 0 if it is not
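On the bin arrangements in item 2, a small sketch in base R of how they could be enumerated. If each attribute may take any of the 8 bin values independently, so that repeats such as (5,5,5,5) are allowed, there are 8^4 = 4096 combinations; 8P4 = 1680 counts only the arrangements in which all four values differ. The column names are assumed from the attribute names above:

```r
bin_values <- 5:12  # candidate nbins values, 8 in total

# Every combination of bin values, repeats allowed (first column varies fastest).
grid <- expand.grid(age = bin_values, inc = bin_values,
                    ccavg = bin_values, mortgage = bin_values)
nrow(grid)      # 4096 = 8^4

# Only the arrangements in which no bin value is used twice.
all_differ <- apply(grid, 1, function(r) anyDuplicated(r) == 0)
distinct <- grid[all_differ, ]
nrow(distinct)  # 1680 = 8P4
```

Which of the two grids is the right one depends on whether an arrangement like (5,5,5,5) should be tried at all.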
Please correct me if I have the number of arrangements wrong or have made other mistakes. Is there any method to solve this problem?
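One possible shape for such a function, as a rough sketch rather than a definitive solution. It assumes that `discretize` is `infotheo::discretize` and that `C5.0` comes from the C50 package (the arguments used above match those), and that `idx` is a named list of row indices (train, test, eval); `idx`, `evaluate_grid`, `make_c50_scorer` and the output column name `TestGtTrain` are all hypothetical names introduced here:

```r
# Run `score_fun` on every row of `grid` and collect the results.
# `score_fun` takes a named vector of bin counts and returns a named
# numeric vector with elements train, test and eval.
evaluate_grid <- function(grid, score_fun) {
  rows <- lapply(seq_len(nrow(grid)), function(i) {
    bins <- unlist(grid[i, ])
    s <- score_fun(bins)
    data.frame(ID = i,
               BinArrangement = paste(bins, collapse = ","),
               TrainAccuracy = s[["train"]],
               TestAccuracy = s[["test"]],
               EvaluationAccuracy = s[["eval"]],
               TestGtTrain = as.integer(s[["test"]] > s[["train"]]))
  })
  do.call(rbind, rows)
}

# Build a scorer that re-bins the data, fits C5.0 on the train rows and
# returns recall (in percent) for each split, using the formula from above.
make_c50_scorer <- function(data, idx) {
  function(bins) {
    d <- data
    for (v in names(bins)) {
      # discretize returns a data frame; take its single column
      binned <- infotheo::discretize(data[[v]], disc = "equalfreq",
                                     nbins = bins[[v]])
      d[[v]] <- as.factor(binned[[1]])
    }
    fit <- C50::C5.0(loan ~ ., data = d[idx$train, ], rules = TRUE)
    sapply(idx, function(rows) {
      part <- d[rows, ]
      a <- table(part$loan, predict(fit, newdata = part, type = "class"))
      a[2, 2] / (a[2, 1] + a[2, 2]) * 100
    })
  }
}

# Intended use (not run here; needs univ2, idx and both packages):
# grid <- expand.grid(age = 5:12, inc = 5:12, ccavg = 5:12, mortgage = 5:12)
# results <- evaluate_grid(grid, make_c50_scorer(univ2, idx))
```

Keeping the loop separate from the scorer means the loop can be tried with a cheap stand-in scorer first, since fitting a tree for every arrangement may take a while.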