Is there a standard (or available) way to export a gbm model in R? PMML would work, but when I try to use the pmml library (perhaps incorrectly), I get an error.

For example, my code looks similar to this:

  library("gbm")
  library("pmml")

  model <- gbm(
      formula,
      data = my.data,
      distribution = "adaboost",
      n.trees = 450,
      n.minobsinnode = 10,
      interaction.depth = 4, shrinkage=0.05, verbose=TRUE)
  export <- pmml(model)
  # and then export to xml

And the error I get is:

Error in UseMethod("pmml") : no applicable method for 'pmml' applied to an object of class "gbm"

I've also tried passing in the dataset. In any case, I could live with another format that I can parse programmatically (I'll be scoring on the JVM), but PMML would be great if there is a way to make it work.

Josh Marcus
  • Both of the implementations I found on GitHub dump the GBM model as plain text and do custom parsing afterwards: https://github.com/infnty/junkyard/blob/master/R/gbm-scorer.cc https://gist.github.com/shanebutler/5456942 – greeness Oct 13 '14 at 23:05
  • You can serialize R data structures using the `RProtoBuf` package. See the answer to your question on Cross Validated: http://stats.stackexchange.com/questions/118616/generating-pmml-export-of-a-gbm-model-in-r – user1808924 Oct 20 '14 at 09:56
  • Update: The above advice was good. I didn't find an out-of-the-box solution, so I implemented a custom text export and then implemented scoring based on that export in Scala. If I can, I'll open source the result and post it here. – Josh Marcus Oct 21 '14 at 18:11
  • @JoshMarcus did you end up open sourcing the results? I'm very interested in exporting multi-class GBMs to PMML. – Moderat Feb 26 '16 at 19:49
  • @Moderat Once I had the text representation, I built a custom gbm scorer in Scala instead of exporting to PMML. Sorry! – Josh Marcus Mar 03 '16 at 17:34
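The comments above describe dumping the trees as text and scoring them with a custom tree walker. Below is a minimal R sketch of that idea, not the code from the linked repositories. It assumes each tree is a data frame laid out like the output of gbm's `pretty.gbm.tree()`, with 0-based node ids, `SplitVar == -1` marking a terminal node, and only continuous splits (categorical splits are not handled). It also assumes, as gbm's own prediction code appears to, that terminal values in `SplitCodePred` already include the shrinkage. The function names `score_tree`/`score_model` and the toy trees are hypothetical.

```r
# Minimal sketch: score a gbm model from a text dump of its trees.
# Assumptions (not taken from the thread): each tree is a data frame laid
# out like gbm::pretty.gbm.tree() output, node ids are 0-based,
# SplitVar == -1 marks a terminal node whose value is in SplitCodePred,
# and every split is on a continuous variable (categorical splits, which
# gbm stores separately in c.splits, are NOT handled).

score_tree <- function(tree, x) {
  node <- 0
  repeat {
    row <- tree[node + 1, ]          # 0-based node id -> 1-based row
    if (row$SplitVar == -1) {
      return(row$SplitCodePred)      # terminal node: its (assumed shrunken) value
    }
    value <- x[[row$SplitVar + 1]]   # SplitVar is a 0-based predictor index
    node <- if (is.na(value)) {
      row$MissingNode                # missing values follow the missing branch
    } else if (value < row$SplitCodePred) {
      row$LeftNode
    } else {
      row$RightNode
    }
  }
}

# Link-scale score: initial value plus the sum of all tree outputs.
score_model <- function(trees, init_f, x) {
  init_f + sum(vapply(trees, score_tree, numeric(1), x = x))
}

# Two hand-made toy trees (hypothetical values, one split each).
trees <- list(
  data.frame(SplitVar = c(0, -1, -1, -1),
             SplitCodePred = c(2.5, 0.1, 0.3, 0.2),
             LeftNode = c(1, -1, -1, -1), RightNode = c(2, -1, -1, -1),
             MissingNode = c(3, -1, -1, -1)),
  data.frame(SplitVar = c(1, -1, -1, -1),
             SplitCodePred = c(10, 0.05, -0.05, 0),
             LeftNode = c(1, -1, -1, -1), RightNode = c(2, -1, -1, -1),
             MissingNode = c(3, -1, -1, -1))
)

score_model(trees, init_f = 1, x = c(1, 20))  # 1 + 0.1 - 0.05 = 1.05
```

For a real fitted model, the dump could presumably be built with `lapply(seq_len(model$n.trees), function(i) pretty.gbm.tree(model, i.tree = i))`, using `model$initF` as `init_f` and ordering `x` like `model$var.names`. Before porting the same walk to Scala, it is worth checking the result against `predict(model, newdata, n.trees = model$n.trees, type = "link")`.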

1 Answer

You can do the job using the r2pmml package. Currently, it supports regression (i.e. distribution = "gaussian") and binary classification (i.e. distribution = "adaboost" or distribution = "bernoulli") model types.

Below is sample code for the Auto MPG dataset:

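library("gbm")
library("r2pmml")

# Auto MPG data; mpg is the numeric response
auto <- read.csv(file = "AutoNA.csv", header = TRUE)

# Regression model (gaussian loss); the formula already names the response
auto.gbm <- gbm(mpg ~ ., data = auto, distribution = "gaussian",
                interaction.depth = 3, shrinkage = 0.1, n.trees = 100)
print(auto.gbm)

# Convert the fitted model to PMML and write it to a file
r2pmml(auto.gbm, "/tmp/gbm.pmml")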
user1808924