I would say there is a bug in gbm. If you look into the gbm.fit function, they do a bunch of transformations for multinomial data before sending it off to the underlying "gbm" C function. These transformations are "undone" before the results are returned, and they are not done again in the gbm.more function.
One such transformation is to make sure that the first n values in the data are associated with one of each of the n factor levels of your y variable. One workaround is to make sure your data is in that format prior to calling gbm in the first place. Here's how we would transform the iris data:
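# index of the first row for each level of Species
first.row <- tapply(1:nrow(iris), iris$Species, head, 1)
# move those rows to the top, followed by all remaining rows
miris <- rbind(iris[first.row, ], iris[-first.row, ])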
and we see that the first three lines have a value for each of the different Species in the data
#head(miris)
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
51           7.0         3.2          4.7         1.4 versicolor
101          6.3         3.3          6.0         2.5  virginica
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
You can then fit your data with
iris.mod=gbm(Species ~ ., distribution="multinomial", data=miris,
n.trees=200, shrinkage=0.01, verbose=FALSE, n.cores=1)
and then run
iris.mod1=gbm.more(iris.mod,100,verbose=FALSE)
without error.
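If you need the same reordering for other data sets, the step can be wrapped in a small helper. This is only a sketch: the name levels_first and its arguments are made up here and are not part of gbm, and it assumes the response column is (or can be coerced to) a factor.

```r
# Hypothetical helper, not part of gbm: put one row per level of the
# response column `y` at the top of `data`, followed by all other rows.
levels_first <- function(data, y) {
  f <- droplevels(as.factor(data[[y]]))
  # position of the first row in which each level occurs
  first.row <- match(levels(f), f)
  rbind(data[first.row, , drop = FALSE], data[-first.row, , drop = FALSE])
}

miris <- levels_first(iris, "Species")
head(miris, 3)
```

The result can be passed to gbm via data=miris exactly as above; no rows are dropped, only their order changes.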
I suggest you file a bug report with the package maintainer. This problem seems specific to the "multinomial" distribution. Feel free to include a link to this question.