I have an unbalanced training data set, and I want to put more weight on my minority class ("bad"), which is the class to be predicted, and then pass those weights to the rpart command.
My data frame looks something like this:
> head(train)
case V1 V2 V3 V4
1 bad a LL AUT 1
2 good b LL AUT 3
3 good b LL AUT 2
4 good b LL MAN 1
5 good c RL AUT 2
6 good b LL AUT 3
Now I try to put weight on my "bad" cases:
caseweights <- train$case[train$case == "bad"]
> tree <- rpart(train$case ~ ., train,
+ method = "class",
+ minsplit =1, minbucket=1, maxdepth=3,
+ parms = list(split = "gini"),
+ cp=-1, weight = caseweights)
But it gives me this error:
Error in model.frame.default(formula = train$case ~ ., data = train, : Variablenlängen sind unterschiedlich (gefunden für '(weights)')
It's German and basically says that the variable lengths differ (found for '(weights)').
So I had a look at how long my data sets are:
> nrow(train)
[1] 11525
> nrow(caseweights)
NULL # <---------- Why NULL?
When I look at caseweights, I see a vector with about 420 entries of "bad". Where am I going wrong?
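For reference, here is a minimal sketch of what seems to be happening, using a small made-up data frame in place of my real train (the toy data, the name bad_only, and the weight value 10 are assumptions for illustration only). As far as I can tell, nrow() returns NULL for a plain vector because a vector has no dim attribute, and the subsetting above only keeps the "bad" labels instead of producing one numeric weight per row, which is what the weights argument of rpart appears to expect.

```r
library(rpart)

# Hypothetical toy data standing in for `train` (not the original data)
set.seed(1)
train <- data.frame(
  case = factor(c(rep("bad", 5), rep("good", 45))),
  V1   = factor(sample(c("a", "b", "c"), 50, replace = TRUE)),
  V4   = sample(1:3, 50, replace = TRUE)
)

# The subsetting from the question keeps only the "bad" labels (a factor vector)
bad_only <- train$case[train$case == "bad"]
nrow(bad_only)    # NULL: nrow() reads dim(), which a plain vector does not have
length(bad_only)  # number of "bad" rows, not nrow(train)

# One numeric weight per row; 10 is an arbitrary illustrative value
caseweights <- ifelse(train$case == "bad", 10, 1)
length(caseweights) == nrow(train)

# `case ~ .` instead of `train$case ~ .`, so that `.` should not pick up
# the response column itself as a predictor
tree <- rpart(case ~ ., data = train,
              method  = "class",
              weights = caseweights,
              parms   = list(split = "gini"),
              control = rpart.control(minsplit = 1, minbucket = 1,
                                      maxdepth = 3, cp = -1))
```

With a weights vector built this way, its length matches nrow(train), so the length mismatch reported by model.frame should no longer occur. How large the weight for "bad" should be is a separate modelling choice.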