
While trying to export an R classifier to PMML using the pmml package, I noticed that the class distribution for a node in the tree is not exported.

PMML supports this with the ScoreDistribution element: http://www.dmg.org/v1-1/treemodel.html
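To illustrate what that element carries, here is a minimal base-R sketch that turns the training labels reaching a single node into ScoreDistribution elements. The helper score_distribution_xml is hypothetical (it is not part of the pmml package). The attribute names value, recordCount and probability are assumed from newer PMML versions, so check which attributes the PMML version your consuming tool expects actually defines.

```r
# Hypothetical helper (not part of the pmml package): build PMML
# ScoreDistribution elements from the class labels of the training
# records that fall into one tree node.
score_distribution_xml <- function(node_labels, class_levels) {
  # Count every class, so classes absent from the node get recordCount 0
  counts <- table(factor(node_labels, levels = class_levels))
  probs <- as.numeric(counts) / sum(counts)
  sprintf('<ScoreDistribution value="%s" recordCount="%d" probability="%g"/>',
          names(counts), as.integer(counts), probs)
}

# Example: a node reached by three setosa records and one versicolor record
node <- c("setosa", "setosa", "versicolor", "setosa")
cat(score_distribution_xml(node, levels(iris$Species)), sep = "\n")
```

Counting against the full set of class levels matters: a consumer that expects one ScoreDistribution per target category would otherwise miss the zero-count classes.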

Is there any way to include this information in the PMML? I want to read the PMML with another tool that depends on it.

I'm doing something like:

library(randomForest)
library(pmml)

# Fit a random forest classifier on the iris data, then export it to PMML
iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE, proximity = TRUE)
pmml(iris.rf)
halfwarp



Can you provide some more information, such as which function you are trying to use?

For example, if you are using the randomForest package, I believe it doesn't provide information about the score distribution, so neither can the PMML representation. However, if you are using the default values, the parameter 'nodesize' for classification cases equals 1, which means the terminal node will have a ScoreDistribution such as:

<ScoreDistribution value="predictedValue" probability="1.0"/>

<ScoreDistribution value="AnyOtherTargetCategory" probability="0.0"/>
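Assuming, as above, that a terminal node grown with nodesize = 1 is pure, the implied distribution is just an indicator vector over the classes. The sketch below uses a hypothetical helper, pure_node_distribution, in base R only; in randomForest the predicted class of a node could, for example, be read from the prediction column returned by getTree(iris.rf, k = 1, labelVar = TRUE). A node can still be impure, e.g. when records with identical predictor values carry different classes, and then this shortcut would be wrong.

```r
# Hypothetical helper: score distribution of a pure terminal node,
# i.e. probability 1 for the predicted class and 0 for every other class.
# Assumes the node really is pure (see the caveat above).
pure_node_distribution <- function(predicted, class_levels) {
  stopifnot(predicted %in% class_levels)
  setNames(as.numeric(class_levels == predicted), class_levels)
}

pure_node_distribution("versicolor", levels(iris$Species))
```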

If you are using the rpart tree model, the pmml function does output the score distribution information. Perhaps you can give us the exact commands you used?
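For comparison, rpart keeps per-node class counts in its fitted object, which is the information a ScoreDistribution needs. A minimal sketch, assuming rpart's documented layout of frame$yval2 for classification trees (fitted class, then the class counts, then the class probabilities, then the node probability); how exactly the pmml package maps these columns is not shown here.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris)

# Terminal nodes are the rows of fit$frame whose split variable is "<leaf>"
leaves <- fit$frame$var == "<leaf>"
n_class <- nlevels(iris$Species)

# yval2 columns for classification: fitted class, n_class counts,
# n_class probabilities, node probability
yval2 <- fit$frame$yval2[leaves, , drop = FALSE]
counts <- yval2[, 1 + seq_len(n_class), drop = FALSE]
probs <- yval2[, 1 + n_class + seq_len(n_class), drop = FALSE]
colnames(counts) <- levels(iris$Species)
colnames(probs) <- levels(iris$Species)

counts  # per-leaf class counts (cf. recordCount)
probs   # per-leaf class probabilities (cf. probability)
```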

Tridi
  • I am indeed using the randomForest package. I looked at Weka's RandomForest sources, which do provide the score distribution. Why doesn't R's version do the same? I've edited my question with the example code I'm using. – halfwarp Feb 25 '14 at 10:57
  • So, as I said, the reason lies in the R 'randomForest' package, not in 'pmml'. I cannot say why the authors of that package chose not to output this information, but if I had to guess, it is because that information may not be necessary. Usually, ScoreDistribution is used to calculate the probabilities of the prediction; randomForest, I believe, does that by simply counting the number of votes. – Tridi Feb 25 '14 at 18:45
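Vote counting as described in that comment can be sketched in base R. The per-tree predictions below are made up for illustration, and vote_fractions is a hypothetical helper; within the package itself, predict(iris.rf, type = "prob") returns the proportion of trees voting for each class, which plays a similar role at the level of the whole forest rather than of a single node.

```r
# Hypothetical helper: class "probabilities" as the fraction of trees voting
# for each class. tree_preds has one row per case and one column per tree.
vote_fractions <- function(tree_preds, class_levels) {
  probs <- t(apply(tree_preds, 1, function(p) {
    as.numeric(table(factor(p, levels = class_levels))) / length(p)
  }))
  colnames(probs) <- class_levels
  probs
}

# Made-up votes from four trees for two cases
tree_preds <- rbind(
  c("setosa", "setosa", "setosa", "versicolor"),
  c("virginica", "versicolor", "virginica", "virginica")
)
vote_fractions(tree_preds, levels(iris$Species))
```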