
I have a problem using ctree in the R package party. I can't use the package partykit instead, because it can't search for unordered splits in factors with 31 or more levels.

I used this code:

set.seed(1234) # to get a reproducible split
# randomly assign roughly 70% of rows to training (1) and 30% to testing (2)
ind <- sample(2, nrow(newnew_compressed_data), replace = TRUE, prob = c(0.7, 0.3))
trainData <- newnew_compressed_data[ind == 1, ]
testData <- newnew_compressed_data[ind == 2, ]

myFormula <- MA ~ .
abundance_ctree <- party::ctree(myFormula, data = trainData)
# note: this grows a second, separate tree on the test data
abundance_ctree2 <- party::ctree(myFormula, data = testData)
print(abundance_ctree)
plot(abundance_ctree)
plot(abundance_ctree, type = "simple")
plot(abundance_ctree2)

Here MA is my response variable and newnew_compressed_data is my dataset. The dataset has 1,032 observations and 7 variables, which are being tested for importance.
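A quick way to see which predictors run into the 31-level limit mentioned above is to count the levels of every factor column. This is a sketch on a small made-up data frame (`toy`); apart from MA, its columns (SC, Price, Month) and their values are hypothetical stand-ins for newnew_compressed_data, not the real data.

```r
# Made-up stand-in for newnew_compressed_data (hypothetical columns and values)
set.seed(1)
n <- 40
toy <- data.frame(
  MA    = round(rexp(n, rate = 0.1), 1),
  SC    = factor(sprintf("species_%02d", 1:n)),  # one level per row: 40 levels
  Price = factor(sample(c("low", "high", "not recorded"), n, replace = TRUE)),
  Month = sample(1:12, n, replace = TRUE)
)

# Number of levels of each factor column (character columns need factor() first)
level_counts <- vapply(Filter(is.factor, toy), nlevels, integer(1))
print(level_counts)

# Factors at or above the 31-level threshold
too_many <- names(level_counts)[level_counts >= 31]
print(too_many)
```

On the real data the same two lines would be run on `trainData`; any column listed in `too_many` is a candidate for the level reduction discussed below.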

This is what the tree currently looks like:

[Image: plot of the fitted tree]

As you can see, the split labels list every level in the category, which I would rather print separately or put into a table. In addition, I'm not sure what each of the nodes corresponds to; the output said I had 13 nodes.
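One way to keep the long level names out of the plot is to replace them with short codes before fitting and keep the mapping in a separate legend table. The sketch below assumes the factor is SC and uses a few hypothetical species names; the codes `S01`, `S02`, ... are an arbitrary illustrative scheme.

```r
# Hypothetical species labels standing in for the long SC level names
sc <- factor(c("Python reticulatus", "Macaca fascicularis", "Varanus salvator",
               "Python reticulatus", "Manis javanica"))

# Map every level to a short code such as S01, S02, ...
old_levels <- levels(sc)
codes <- sprintf("S%02d", seq_along(old_levels))
legend_table <- data.frame(code = codes, level = old_levels)

# Relabel the factor: identical grouping, only the labels change
sc_coded <- factor(sc, levels = old_levels, labels = codes)

print(legend_table)
print(table(sc_coded))
```

The tree's split labels are taken from the factor levels, so a tree fitted after recoding should show the short codes, and `legend_table` can be printed or saved with `write.csv()`. Building `old_levels` from the full dataset (rather than from `trainData` alone) keeps the codes identical in the training and test sets.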

Does anyone know of a way to reduce the number of levels and produce a better legend explaining what each node represents? I can't interpret anything from this plot, and I'm struggling to find examples with large datasets.
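A simple, purely frequency-based way to reduce the number of levels is to merge rarely observed levels into a single "Other" level. This is a sketch only: `lump_rare()` is a hypothetical helper written here, and the threshold `min_count = 5` is an arbitrary illustrative choice, not something ctree requires.

```r
# Collapse levels seen fewer than min_count times into one "Other" level.
# NAs are left as NA.
lump_rare <- function(f, min_count = 5, other = "Other") {
  f <- droplevels(as.factor(f))
  counts <- table(f)
  rare <- names(counts)[counts < min_count]
  out <- as.character(f)
  out[out %in% rare] <- other
  factor(out)
}

sp <- factor(c(rep("A", 6), rep("B", 5), "C", "D", "D", "E"))
sp_lumped <- lump_rare(sp, min_count = 5)
print(table(sp_lumped))
```

Lumping by frequency discards the identity of the rare levels, so a meaningful grouping (for example by family, as suggested in the comments) usually gives more interpretable splits when such a grouping exists.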

  • I would recommend breaking down the levels of SC, assuming that it would be possible to represent these different species (?) by some underlying quantities. This would likely make the splits more interpretable as well. Similarly, I would check whether it is worth turning "not recorded" into a category vs. handling these by surrogate splits. Finally, the scale in the terminal boxplots suggests that either there is some huge effect somewhere or some sort of log transformation or similar would be useful. – Achim Zeileis Nov 18 '19 at 01:46
  • Hi Achim, thank you so much for your comments. These are maybe 150 species, which occur in a wildlife market. I was thinking of grouping them by family instead and maybe removing species, as that is too fine a scale? Can you handle surrogate splits in ctree, though? "Not recorded" applies to the price for a lot of observations, so I could just remove price. I will look at where a log transformation could be useful. – user241508 Nov 18 '19 at 09:39
  • Hi there, I'm still having issues with this. Does anyone else know whether partykit can now handle unordered splits with more than 31 levels? The plot just looks horrible. – user241508 Feb 19 '20 at 07:51
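The three suggestions from the comments (grouping species by family, treating "not recorded" as missing, and log-transforming the response) can be prepared in base R before calling ctree. This sketch uses a tiny hypothetical species-to-family lookup and made-up observations; the column names `species`, `price` and `family`, and the choice of `log1p()` as the transformation, are illustrative assumptions.

```r
# Hypothetical species-to-family lookup; a real one would cover all ~150 species
lookup <- data.frame(
  species = c("Python reticulatus", "Python bivittatus", "Varanus salvator", "Manis javanica"),
  family  = c("Pythonidae", "Pythonidae", "Varanidae", "Manidae"),
  stringsAsFactors = FALSE
)

# Made-up observations
obs <- data.frame(
  species = c("Varanus salvator", "Python bivittatus", "Manis javanica", "Python reticulatus"),
  price   = c("120", "not recorded", "900", "not recorded"),
  MA      = c(3, 250, 12, 0),
  stringsAsFactors = FALSE
)

# 1. Coarser grouping: look up each species' family (match() keeps row order)
obs$family <- factor(lookup$family[match(obs$species, lookup$species)])

# 2. Treat "not recorded" as missing instead of as its own category
obs$price <- as.numeric(ifelse(obs$price == "not recorded", NA, obs$price))

# 3. log1p() compresses a heavily skewed response and copes with zeros
obs$logMA <- log1p(obs$MA)

print(obs)
```

With missing prices coded as `NA`, party can be asked to use surrogate splits by setting `maxsurrogate` in `party::ctree_control()` to a positive value, for example `controls = party::ctree_control(maxsurrogate = 2)` in the `ctree()` call (the value 2 is arbitrary). The family factor then has far fewer levels than the species factor, which may also bring it under the limit that blocks partykit.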
