I am trying to use the gbm.more function in R. For the purpose of clarity I have used the canonical iris data. When I specify distribution="multinomial" the code below doesn't work, but when I specify distribution="gaussian", the code works. Is there a reason for this or is it just a problem with the function?

library(gbm)
data(iris)
iris.mod=gbm(Species ~ ., distribution="multinomial", data=iris,
            n.trees=200, shrinkage=0.01, verbose=FALSE, n.cores=1)
iris.mod1=gbm.more(iris.mod,100,verbose=FALSE)
  • specifically, the following error is output: Error in gbm.more(iris.mod, 100, verbose = FALSE) : Observations are not in order. gbm() was unable to build an index for the design matrix. Could be a bug in gbm or an unusual data type in data. – user3742790 Jun 15 '14 at 20:36
  • Is my question too specific or have I posted it in the wrong place? – user3742790 Jun 15 '14 at 23:28

1 Answer

I would say there is a bug in gbm. If you look into the gbm.fit function, they do a bunch of transformations for multinomial data before sending it off to the underlying "gbm" C function. These transformations are "undone" before the results are returned and they are not done again in the gbm.more function.

One such transformation is to make sure that the first n rows of the data are associated with one of each of the n factor levels of your y variable. One workaround is to make sure your data is in that format prior to calling gbm in the first place. Here's how we would transform the iris data.

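# index of the first row for each Species level
first.row <- tapply(1:nrow(iris), iris$Species, head, 1)
# move those rows to the top, followed by all remaining rows
miris <- rbind(iris[first.row,], iris[-first.row,])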

and we see that the first three lines have a value for each of the different Species in the data

#head(miris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
51           7.0         3.2          4.7         1.4 versicolor
101          6.3         3.3          6.0         2.5  virginica
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
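If you need the same reordering for other data sets, it can be wrapped in a small function. This is only a sketch: levels_first is a hypothetical helper name (it is not part of gbm), and it assumes the response column is a factor. It uses base R only and ends with a check that the first k rows cover all k levels.

```r
# Hypothetical helper (not part of gbm): move one row per factor level to the top.
# Assumes df[[response]] is a factor.
levels_first <- function(df, response) {
  # Drop unused levels so every remaining level has at least one row.
  y <- droplevels(df[[response]])
  # Index of the first row observed for each level, in level order.
  first.row <- unname(tapply(seq_len(nrow(df)), y, head, 1))
  rbind(df[first.row, , drop = FALSE], df[-first.row, , drop = FALSE])
}

data(iris)
miris <- levels_first(iris, "Species")

# Check: the first k rows contain each of the k levels, in level order.
k <- nlevels(miris$Species)
stopifnot(identical(as.character(miris$Species[seq_len(k)]),
                    levels(miris$Species)))
```

The resulting miris can be passed to gbm() in place of iris, as in the call that follows.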

You can then fit your data with

iris.mod=gbm(Species ~ ., distribution="multinomial", data=miris,
    n.trees=200, shrinkage=0.01, verbose=FALSE, n.cores=1)

and then run

iris.mod1=gbm.more(iris.mod,100,verbose=FALSE)

without error.

I suggest you file a bug report with the package maintainer. This problem seems specific to the "multinomial" distribution. Feel free to include a link to this question.

MrFlick
  • Wonderful that seems to have fixed my issue!! thanks so much! I have emailed the package maintainer about the possible bug. – user3742790 Jun 16 '14 at 14:30
  • Hmm, actually this doesn't seem to have completely fixed the problem. If I run iris.mod=gbm() and check the number of trees, I correctly get iris.mod$n.trees=200. However, after I use gbm.more(), iris.mod1$n.trees=900 as opposed to 300. – user3742790 Jun 16 '14 at 19:43