I have a strange issue: whenever I try increasing the mfinal argument of the boosting function in the adabag package beyond 10, I get an error. Even with mfinal=9 I get warnings.

My training data has a 7-class dependent variable and 100 independent variables, with around 22,000 samples (one class was SMOTEd using DMwR). The dependent variable is the last column of the training dataset.

library(adabag)
gc()
exp_recog_boo <- boosting(V1 ~ ., data = train_dataS, boos = TRUE, mfinal = 9)

Error in 1:nrow(object$splits) : argument of length 0
In addition: Warning messages:
1: In acum + acum1 :
longer object length is not a multiple of shorter object length

Thanks in advance.

Abdul Khader

6 Answers

My mistake was that I hadn't converted the TARGET to a factor first.

Try this:

train$target <- as.factor(train$target)

and check by doing:

str(train$TARGET)
thepule
  • This problem also occurs when the target is a Boolean vector. This solved it on my end too. – Oeufcoque Penteano Oct 18 '17 at 02:48
  • The suggestion to check for factor status of train$target by executing `str(train$TARGET)` is wrongheaded. R is case sensitive so the two vectors would not be the same. – IRTFM Jan 01 '18 at 22:11
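The accepted fix above can be sketched end-to-end with a tiny made-up data frame (base R only; the column names here are illustrative, not from the original question). A character or logical response column is what trips up `boosting()`, and `as.factor()` plus `str()` confirms the conversion:

```r
# Hypothetical toy data: the response comes in as character, not factor
train <- data.frame(
  target = c("a", "b", "a", "b"),
  x1     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

is.factor(train$target)            # FALSE -- boosting() needs a factor here

train$target <- as.factor(train$target)

is.factor(train$target)            # TRUE
str(train$target)                  # Factor w/ 2 levels "a","b": 1 2 1 2
```

Note the check uses the same case as the column name; `str(train$TARGET)` would return NULL because R is case sensitive.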

This worked for me:

modelADA <- boosting(lettr ~ ., data = trainAll, boos = TRUE, mfinal = 10, control = rpart.control(minsplit = 0))

Essentially I just told rpart to allow a minimum split size of zero when generating a tree, which eliminated the error. I haven't tested this extensively, so I can't guarantee it's a valid solution (what does a tree with a zero-length leaf actually mean?), but it does prevent the error from being thrown.

TomR

I think I've hit the problem.

Ignore this: if you configure your control with cp = 0, this won't happen. I think that if the first node of a tree makes no improvement (or at least none better than cp), the tree stays at 0 nodes, so you have an empty tree, and that makes the algorithm fail.

EDIT: The problem is that rpart generates trees with only one leaf (node), and the boosting method uses the statement `k <- varImp(arboles[[m]], surrogates = FALSE, competes = FALSE)`; when arboles[[m]] is a tree with only one node, this gives you the error.
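The one-leaf-tree diagnosis can be reproduced with rpart alone (no adabag needed). This is a minimal sketch with made-up data: a 10-row data frame is smaller than rpart's default minsplit of 20, so the root is never split, `fit$frame` has a single row, and `fit$splits` is NULL, which is exactly what makes `1:nrow(object$splits)` fail inside boosting:

```r
library(rpart)

# Hypothetical toy data: 10 rows, alternating classes
d <- data.frame(y = factor(rep(c("a", "b"), 5)), x = 1:10)

# Default minsplit = 20: a 10-row node is never split -> one-leaf tree
fit0 <- rpart(y ~ x, data = d)
nrow(fit0$frame)        # 1: root only
is.null(fit0$splits)    # TRUE: 1:nrow(fit0$splits) would error

# Lowering minsplit lets rpart attempt the split
fit1 <- rpart(y ~ x, data = d, control = rpart.control(minsplit = 2))
nrow(fit1$frame)        # > 1: the tree now contains at least one split
```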

To solve that, you can modify the boosting method:

Run `fix(boosting)` and add the lines marked with **.

if (boos == TRUE) {
**  k <- 1
**  while (k == 1) {
      boostrap <- sample(1:n, replace = TRUE, prob = pesos)
      fit <- rpart(formula, data = data[boostrap, -1],
          control = control)
**    k <- length(fit$frame$var)
**  }
    flearn <- predict(fit, newdata = data[, -1], type = "class")
    ind <- as.numeric(vardep != flearn)
    err <- sum(pesos * ind)
}

This will prevent the algorithm from accepting one-leaf trees, but you have to set cp = 0 in the control parameter to avoid an endless loop.


Just ran into the same problem; setting the complexity parameter to -1 or the minimum split to 0 both work for me with rpart.control, e.g.

library(adabag)

r1 <- boosting(Y ~ ., data = data, boos = TRUE, 
               mfinal = 10,  control = rpart.control(cp = -1))

r2 <- boosting(Y ~ ., data = data, boos = TRUE, 
               mfinal = 10,  control = rpart.control(minsplit = 0))
C8H10N4O2

I also ran into this same problem recently, and this example R script solved it completely!

The main idea is that you need to set the control for rpart (which adabag uses for creating trees; see rpart.control) appropriately, so that at least one split is attempted in every tree.

I'm not totally sure, but it appears that your "argument of length 0" error may be the result of an empty tree. This can happen because the complexity parameter has a default setting that tells the function not to attempt a split if the decrease in homogeneity / lack of fit is below a certain threshold.
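A minimal sketch of the cp effect, using rpart directly on made-up noise data (the variable names are illustrative): with cp = -1 every candidate split counts as an improvement, so rpart keeps splitting until minsplit or maxdepth stops it and never returns an empty tree:

```r
library(rpart)
set.seed(42)

# Hypothetical pure-noise data: y is unrelated to x
d <- data.frame(y = factor(sample(c("a", "b"), 40, replace = TRUE)),
                x = rnorm(40))

# cp = -1: any split is retained, so the root is always split
fit <- rpart(y ~ x, data = d, control = rpart.control(cp = -1, minsplit = 2))
nrow(fit$frame) > 1     # TRUE: no empty (one-leaf) tree
```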

David
  • It has been more than a year since I posted that link. Thanks for bringing it to my attention. The main point there is that the warning is generated because some trees are empty, so setting cp = -1 (in rpart.control) will force rpart to split until maxdepth, avoiding empty trees. – David Mar 22 '15 at 07:50

Use str() to see the attributes of your data frame. In my case, I just converted my class variable to a factor, and then everything ran.

JTD