
In R, I am trying to use the bag function together with the train function. I start by using train with rpart to fit a classification tree model on the simple iris data set. Now I want to create a bag of 10 such trees with the bag function. The documentation says that the aggregate parameter must be a function that chooses a value from the predictions of all bagged models, so I wrote one called agg, which returns the most frequent class label. However, bag gives the following error:

Error in fitter(btSamples[[iter]], x = x, y = y, ctrl = bagControl, v = vars,  : 
  task 1 failed - "attempt to apply non-function"

Here is my complete code:

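# Use bagging to create a bagged classification tree from 10 classification trees created with rpart.
library(caret)  # createDataPartition(), train(), bag(), bagControl()
library(plyr)   # count(), used in agg below
data(iris)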

# Create training and testing data sets:
inTrain = createDataPartition(y=iris$Species, p=0.7, list=F)
train = iris[inTrain,]
test = iris[-inTrain,]

# Create regressor and outcome datasets for bag function:
regressors = train[,-5]
species = train[,5]

# Create aggregate function:
agg = function(x, type) {
  y = count(x)
  y = y[order(y$freq, decreasing=T),]
  as.character(y$x[1])
}

# Create bagged trees with bag function:
treebag = bag(regressors, species, B=10,
              bagControl = bagControl(fit = train(data=train, species ~ ., method="rpart"),
                                      predict = predict,
                                      aggregate = agg
                                      )
              )

This gives the error message stated above. I don't understand why it rejects the agg function.
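A sketch of what `bagControl` appears to expect, based on its documentation: `fit`, `predict` and `aggregate` are all functions that `bag` calls itself, with arguments `(x, y, ...)`, `(object, x)` and `(x, type)` respectively. In the code above, `fit = train(...)` is evaluated immediately, so `bagControl` receives an already-fitted model rather than a function, which would be consistent with an "attempt to apply non-function" error raised inside `fitter`. The helper names `rpartFit`, `rpartPred` and `majorityVote` are hypothetical, and the assumption that `aggregate` receives a list holding one prediction per bagged model is taken from the shape caret's bundled helpers (such as `ldaBag`) seem to use, not from this question. The loop at the end imitates the bootstrap, fit, predict, aggregate cycle with plain `rpart` so the sketch runs without caret.

```r
library(rpart)  # recommended package, ships with standard R installations

# fit: assumed to be called by bag() with bootstrap predictors x and outcome y.
# Hypothetical helper; it must return a model object.
rpartFit <- function(x, y, ...) {
  dat <- data.frame(x, .outcome = y)
  rpart(.outcome ~ ., data = dat, method = "class", ...)
}

# predict: assumed to be called with one fitted sub-model and new predictors.
# Hypothetical helper returning a factor of predicted class labels.
rpartPred <- function(object, x) {
  predict(object, newdata = x, type = "class")
}

# aggregate: assumed to receive a list with one prediction vector per
# sub-model; returns one majority-vote label per sample. `type` is kept
# only to match the documented signature.
majorityVote <- function(x, type = "class") {
  levs <- levels(x[[1]])
  votes <- do.call(cbind, lapply(x, as.character))  # samples x models
  winner <- apply(votes, 1, function(v) names(which.max(table(v))))
  factor(winner, levels = levs)
}

# With caret loaded, these would be passed as functions, not called:
# bagControl(fit = rpartFit, predict = rpartPred, aggregate = majorityVote)

# Plain-rpart imitation of the bagging loop, so the sketch runs without caret.
set.seed(1)
x <- iris[, 1:4]
y <- iris$Species
fits <- lapply(1:10, function(b) {
  idx <- sample(nrow(x), replace = TRUE)  # bootstrap resample
  rpartFit(x[idx, ], y[idx])
})
preds <- lapply(fits, rpartPred, x = x)
bagged <- majorityVote(preds)
table(bagged, y)
```

Note that, unlike agg, this aggregate returns one label per sample rather than a single string for the whole prediction set.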

1 Answer


From ?bag:

When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
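One reading of this note is that it concerns the `predict` component of `bagControl`: each sub-model returns class probabilities, and `aggregate` then turns them into either classes or probabilities depending on `type`. A minimal sketch under that reading, using plain `rpart` sub-models; `rpartProbPred` and `probAggregate` are hypothetical names, and averaging the probability matrices is one simple pooling choice, not necessarily what caret's bundled helpers do.

```r
library(rpart)

# Hypothetical predict component: returns a class-probability matrix
# (rows = samples, columns = classes) instead of class labels.
rpartProbPred <- function(object, x) {
  predict(object, newdata = x, type = "prob")
}

# Hypothetical aggregate component: averages the per-model probability
# matrices, then returns class labels or probabilities depending on `type`.
probAggregate <- function(x, type = "class") {
  avg <- Reduce(`+`, x) / length(x)
  if (type == "class") {
    classes <- colnames(avg)
    factor(classes[apply(avg, 1, which.max)], levels = classes)
  } else {
    as.data.frame(avg)
  }
}

# Imitate the bagging loop with plain rpart.
set.seed(2)
newx <- iris[, 1:4]
fits <- lapply(1:10, function(b) {
  idx <- sample(nrow(iris), replace = TRUE)  # bootstrap resample
  rpart(Species ~ ., data = iris[idx, ], method = "class")
})
probs <- lapply(fits, rpartProbPred, x = newx)
head(probAggregate(probs, type = "prob"))
cls <- probAggregate(probs)
```

Because the probability form is always available, the same aggregate can serve both class predictions and `type = "prob"` predictions.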

So I guess you might want to try:

bagControl = bagControl(fit = train(data=train, species ~ .,
                                    method="rpart", type="prob"),
                        predict = predict,
                        aggregate = agg
                        )
HubertL
I added type="prob" and that solved the error message I got before. However, now this line of code crashes with a new error message: `Error in train.default(x, y, weights = w, ...) : Stopping` – Jerome Smith Jan 30 '16 at 15:50