I have a problem using ctree in the R package party. I can't use the partykit package because it can't search for unordered splits in factors with 31 or more levels.
I used this code:
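set.seed(1234)  # for reproducible results
# Randomly assign each row to group 1 (about 70%, training) or group 2 (about 30%, test)
ind <- sample(2, nrow(newnew_compressed_data), replace = TRUE, prob = c(0.7, 0.3))
trainData <- newnew_compressed_data[ind == 1, ]
testData <- newnew_compressed_data[ind == 2, ]
myFormula <- MA ~ .  # MA is the response; all other columns are predictors
abundance_ctree <- party::ctree(myFormula, data = trainData)   # tree fitted on the training rows
abundance_ctree2 <- party::ctree(myFormula, data = testData)   # separate tree fitted on the test rows
print(abundance_ctree)
plot(abundance_ctree)
plot(abundance_ctree, type = "simple")
plot(abundance_ctree2)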
where MA is my y-variable and newnew_compressed_data is my dataset. The dataset has 1032 observations and 7 variables, which are being tested for importance.
This is what the tree currently looks like:
You can see that the split labels list every level in the category, which I'd rather print or put into a table! In addition, I'm not sure what each of the nodes corresponds to; the output said I had 13 nodes.
Does anyone know of a way to reduce the levels and produce a better legend to explain what is represented in each of the nodes? I can't interpret anything from this plot, and I'm struggling to find examples with big datasets.
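
To make "reduce the levels" concrete, here is a minimal base R sketch of one possible approach: lumping rarely occurring factor levels into a single "Other" level before fitting, and keeping a lookup table of which original levels were grouped. The helper `collapse_rare_levels`, the `min_count` threshold, and the toy `site` column are all hypothetical names made up for illustration; none of them come from party or from my real data, and whether lumping is appropriate depends on the variable.

```r
# Hypothetical helper (not part of party or partykit): merge every level that
# occurs fewer than `min_count` times into one level named `other`.
collapse_rare_levels <- function(x, min_count = 30, other = "Other") {
  x <- droplevels(as.factor(x))
  counts <- table(x)
  rare <- names(counts)[counts < min_count]
  if (length(rare) == 0) {
    return(x)
  }
  new_levels <- levels(x)
  new_levels[new_levels %in% rare] <- other
  # Assigning duplicated level names merges those levels into one.
  levels(x) <- new_levels
  x
}

# Toy data (assumption: stands in for one high-cardinality factor column).
set.seed(1)
toy <- data.frame(
  site = factor(sample(c(rep("A", 50), rep("B", 40), "C", "D", "E")))
)
original_site <- toy$site

toy$site <- collapse_rare_levels(toy$site, min_count = 5)
print(table(toy$site))

# Lookup table: which original level ended up in which group.
# This can be printed or exported instead of being read off the plot labels.
lookup <- unique(data.frame(
  original = as.character(original_site),
  grouped = as.character(toy$site),
  stringsAsFactors = FALSE
))
lookup <- lookup[order(lookup$original), ]
print(lookup)
```

If something like this were applied to the real data, it would presumably need to happen on newnew_compressed_data before the train/test split, so that both subsets share the same set of levels.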