I am a SAS user currently learning how to build a decision tree with the R package rpart.
I am getting useful findings at each node, but I have three questions:
1. Can I force the tree to start (top-down) with a specific variable, say a categorical variable like gender? (I could do this in FICO Model Builder, but I no longer have access to it.)
2. I have a binary variable (gender: 1 = Male, 0 = Female), but the node splits at 0.5. I tried changing it to a factor, but that didn't work. Also, I have a variable "AGE": should I change its type to something other than "numeric"?
3. Based on the cp values (table below), I set cp = 0.0128 to prune the tree, but only two variables are left. Can I choose to keep specific variables? (I have tried different cp values, but the result does not change.)
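To make question 2 concrete, here is a minimal sketch of the factor conversion. It assumes the kyphosis data that ships with rpart as a stand-in for x, and the `sex` column is invented purely for illustration. As far as I understand, a numeric 0/1 column is treated as continuous, so the only possible cut point is the midpoint 0.5, whereas a factor is split by level.

```r
library(rpart)

# Stand-in data: kyphosis ships with rpart. `sex` is a made-up 0/1 column
# added only for illustration; it is not part of the real data set `x`.
set.seed(1)
d <- kyphosis
d$sex <- rbinom(nrow(d), 1, 0.5)

# Recode the 0/1 indicator as a labelled factor BEFORE fitting the tree.
d$sex <- factor(d$sex, levels = c(0, 1), labels = c("Female", "Male"))

# AGE-like variables (here: Age) stay numeric; only the indicator is recoded.
fit <- rpart(Kyphosis ~ ., data = d, method = "class")

# Check the column types the model was actually trained on.
sapply(d, class)
```

The conversion has to be applied to the data frame before rpart() is called; converting the column afterwards does not change a tree that has already been fitted.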
#tree
library(rpart)
library(party)
library(rpart.plot)
#1
minsplit <- 60
ct <- rpart.control(xval = 10, minsplit = minsplit,
                    minbucket = minsplit / 3, cp = 0.01)
iris_tree <- rpart(Overday_E60dlq ~ .,
                   data = x, method = "class",
                   parms = list(prior = c(0.65, 0.35), split = "information"),
                   control = ct)
#plot split.
plot_tris <- rpart.plot(iris_tree, branch = 1, branch.type = 1, type = 2,
                        extra = 103,
                        shadow.col = "gray", box.col = "green",
                        border.col = "blue", split.col = "red",
                        cex = 0.65, main = "Kyphosis-tree")
plot_tris
#summary
summary(iris_tree)
#===========prune process=========
printcp(iris_tree)
## min-xerror cp:
fitcp <- prune(iris_tree,
               cp = iris_tree$cptable[which.min(iris_tree$cptable[, "xerror"]), "CP"])
#cp table
fit2 <- prune(fitcp, cp = 0.0128)
#plot fit2
rpart.plot(fit2, branch=1 , branch.type= 1, type= 2, extra= 103,
shadow.col="gray", box.col="green",
border.col="blue", split.col="red",
cex=0.65, main="Kyphosis fit2")
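
For question 3, a minimal sketch of the pruning sequence on stand-in data (kyphosis from rpart, not x) may help show why changing cp can appear to have no effect. It rests on two assumptions about how rpart behaves: the tree is never grown past the cp given in rpart.control, and pruning an already pruned tree with a smaller cp cannot bring removed splits back. The names `big`, `pruned` and `again` are hypothetical.

```r
library(rpart)

# Grow a deliberately large tree on stand-in data (cp = 0 keeps every split).
set.seed(1)
big <- rpart(Kyphosis ~ ., data = kyphosis, method = "class",
             control = rpart.control(cp = 0, minsplit = 5, xval = 10))

# Pick the cp with the smallest cross-validated error from the cp table.
cpt <- big$cptable
best_cp <- cpt[which.min(cpt[, "xerror"]), "CP"]
pruned <- prune(big, cp = best_cp)

# Pruning the pruned tree again with a SMALLER cp leaves it unchanged,
# because the splits below best_cp have already been removed.
again <- prune(pruned, cp = best_cp / 2)
nrow(again$frame) == nrow(pruned$frame)

# To compare different cp values, always prune from the full tree instead.
alt <- prune(big, cp = best_cp / 2)
```

In the code above, fit2 is pruned from fitcp rather than from iris_tree, so any cp below the min-xerror cp would give the same tree as fitcp.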