First, I fit a model with

cf1 <- cforest(y~., data = DATA, strata = DATA$y,
           ntree = 200L, mtry = 10)

Because the dataset is very imbalanced (y = 1 accounts for 7% of all observations), I added strata to make sure observations with y = 1 are not ignored in bagging. Judging by the confusion matrix, cf1 works normally. However, when I tried to compute conditional variable importance for feature selection with

cf1.imp_cond <- varimp(cf1, conditional = TRUE)

It returns

Error in x[strata == s] <- .resample(x[strata == s]) : 
NAs are not allowed in subscripted assignments

I can't figure out what this error means. Has anyone run into it before?

----update

Here is a modified test data set derived from the original data I am using. Here is the code:

cf2 <- cforest(X5_years_survival~., data = test, strata = X5_years_survival,
           ntree = 200L, mtry = 6)
cf2.imp_cond <- varimp(cf2, conditional = TRUE)

I still get the same error:

Error in x[strata == s] <- .resample(x[strata == s]) : 
NAs are not allowed in subscripted assignments

---update

The error occurs when the kidids_node function is applied.

Bs He
  • Can you post a minimal reproducible example? Otherwise it's hard to debug. – Achim Zeileis May 24 '18 at 10:53
  • @AchimZeileis I am trying to do so. By the way, do you think the way I set `strata` is how it is supposed to be done? – Bs He May 24 '18 at 14:36
  • @AchimZeileis example added. – Bs He May 24 '18 at 15:48
  • Thanks. However, this runs successfully for me after doing `test <- read.csv("test.csv")`. It does not seem to matter whether `X5_years_survival` is turned into a `factor` (as requested by `cforest`) or not. When working on an improved example: try to install the latest version of the package and boil down the example even more (e.g., one covariate and one strata) to reliably and quickly produce an error. – Achim Zeileis May 24 '18 at 17:15
  • @AchimZeileis Do you mean you didn't get the same error? I installed version 1.2.1 of `partykit`. – Bs He May 24 '18 at 17:20
  • Yes. For me everything runs without error. – Achim Zeileis May 24 '18 at 17:32
  • It is as you described. But then `X5_years_survival` is treated as an integer variable, so `cforest` fits a regression model instead of a classifier, right? Does that mean that, for a classification problem, we need to set it up as a regression model first and only then can we use `varimp` to calculate the conditional variable importance? – Bs He May 24 '18 at 20:06
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171725/discussion-between-bs-he-and-achim-zeileis). – Bs He May 24 '18 at 20:13

1 Answer

It turns out that if I keep all covariates as integers instead of converting them with as.factor, varimp runs without error.
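
As for what the message itself means: it comes from base R, which raises it when more than one value is assigned through an index that contains NA. The sketch below reproduces that with made-up vectors (`x` and `strata` here are illustration data, not the real dataset). Whether this is exactly what happens inside varimp is an assumption; it only shows that an NA in a `strata == s` comparison is enough to trigger the same message.

```r
# Base R only. x and strata are hypothetical illustration data.
x <- 1:5
strata <- factor(c("a", "b", NA, "a", "b"))

# Comparing against a level propagates the NA into the logical index.
idx <- strata == "a"

# Assigning several values through an index that contains NA fails.
msg <- tryCatch({
  x[idx] <- rev(x[idx])
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# which() drops the NA positions, so the same assignment then succeeds.
keep <- which(idx)
x[keep] <- rev(x[keep])
print(x)
```

If the real data behaves the same way, checking the variable passed as strata and the factor covariates with `anyNA()` before fitting would be a reasonable first diagnostic.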

Bs He