I am trying to export a model built with the C50 package in R.

I'm using the partykit package to extract the last trial, but it doesn't return the same fitted values.

I don't understand why the as.party.C5.0 function doesn't produce exactly the same fit as the C5.0 function does. It works for the first trial but not for the other ones.

For example:

library(C50)
poc_db <- iris
fullTree_prun_iris_Winow <- C5.0(Species ~ ., data = poc_db, trials = 10,
                                 control = C5.0Control(CF = 0.90, noGlobalPruning = FALSE, winnow = TRUE))

cat(fullTree_prun_iris_Winow$output)
-----  Trial 9:  -----
Decision tree:
Petal.Width <= 0.6: setosa (10.5) 
Petal.Width > 0.6:
:...Petal.Width <= 1.7: versicolor (116.3/49.4)
    Petal.Width > 1.7: virginica (22.2)

modParty <- C50:::as.party.C5.0(fullTree_prun_iris_Winow,trial=10)
Fitted party:
[1] root
|   [2] Petal.Width <= 0.6: setosa (n = 50, err = 0.0%)
|   [3] Petal.Width > 0.6
|   |   [4] Petal.Width <= 1.7: versicolor (n = 54, err = 9.3%)
|   |   [5] Petal.Width > 1.7: virginica (n = 46, err = 2.2%)

For the 4th node I expected: ... versicolor (116.3/49.4)

Thanks for your help

Romain R

1 Answer


The fourth node has 54 observations, and 49 of these are versicolor. See

table(subset(poc_db, Petal.Width > 0.6 & Petal.Width <= 1.7)$Species)
##     setosa versicolor  virginica 
##          0         49          5 

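Hence, partykit reports n = 54 and err = 9.3%, corresponding to 5/54. The values reported by C5.0 (116.3/49.4) are different because they come from boosting the trees over several trials rather than just using a single tree by itself: in the later trials the cases are reweighted, so the printed counts are weighted case counts (which is why they are fractional), whereas the party object shows the raw counts in the data.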
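As a quick check of the arithmetic above, here is a minimal base-R sketch that recomputes the unweighted n and error rate for each terminal node directly from the printed splits. The helper names (`node`, `pred_by_node`, `wrong`, `err`) are made up for this illustration, and the split thresholds are copied by hand from the printed tree rather than taken from the fitted object.

```r
# iris ships with base R, so no extra packages are needed for this check.
poc_db <- iris

# Assign each observation to the terminal node implied by the printed splits
# (node ids 2, 4 and 5 follow the partykit printout in the question).
node <- with(poc_db, ifelse(Petal.Width <= 0.6, 2L,
                     ifelse(Petal.Width <= 1.7, 4L, 5L)))

# Class predicted in each terminal node, as shown in the printout.
pred_by_node <- c("2" = "setosa", "4" = "versicolor", "5" = "virginica")

# Unweighted node sizes and number of misclassified cases per node.
n     <- table(node)
wrong <- tapply(as.character(poc_db$Species) != pred_by_node[as.character(node)],
                node, sum)
err   <- round(100 * as.vector(wrong) / as.vector(n), 1)

data.frame(node = names(n), n = as.vector(n), err_pct = err)
```

This should reproduce the n = 50, 54, 46 and err = 0.0%, 9.3%, 2.2% shown in the partykit printout, since those are plain counts on the unweighted data.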

Achim Zeileis
  • Suppose I ran a C5.0 algorithm with 10 trials, giving 10 rule sets with different error rates, plus a final boosted error rate that is much lower than the other 10. I can extract the rule set for each of the 10 trials separately. How do I get the rules for the boosted error rate? – Ezio Nov 22 '17 at 15:45
  • Boosting with trees does not yield a single tree as the resulting model. – Achim Zeileis Nov 22 '17 at 19:12
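To illustrate the point in the last comment, the sketch below compares predictions from the boosted model with predictions restricted to the first trial, using the `trials` argument of `predict()` for C5.0 objects. It assumes the C50 package is installed and uses default control settings rather than those in the question; the object names (`fit`, `pred_boost`, `pred_first`) are made up for this example. Note that C5.0 may stop boosting early, so fewer than 10 trials may actually be used.

```r
library(C50)

# Boosted C5.0 fit on iris (default control settings, an assumption here).
fit <- C5.0(Species ~ ., data = iris, trials = 10)

# Predictions combining all boosting trials that were actually fitted.
pred_boost <- predict(fit, newdata = iris)

# Predictions using only the first trial, i.e. a single tree.
pred_first <- predict(fit, newdata = iris, trials = 1)

# Training error of the combined model versus the single first tree.
mean(pred_boost != iris$Species)
mean(pred_first != iris$Species)
```

The boosted prediction comes from combining the trees of all trials, so there is no single tree or rule set that corresponds to the boosted error rate.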