I have a dataset that I split into a training set (80%) and a test set (20%). First I fit a decision tree on the training set, and then I predict on the test set:
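library(rpart)
tree <- rpart(number ~ ., data = train, method = "class")  # classification tree on the training set
pred <- predict(tree, newdata = test, type = "class")       # predicted classes for the test set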
After running this, I get an error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, : factor orderland has new levels Zypern
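The same kind of error can be reproduced with a small toy dataset. This is a sketch, not my real data: only the column names and the level "Zypern" come from my data, while "Deutschland", "Frankreich", "links", "rechts" and the class labels "a"/"b" are hypothetical placeholders.

```r
library(rpart)

# Hypothetical toy training data: 40 rows, "Zypern" never occurs
train <- data.frame(
  number    = factor(rep(c("a", "b"), times = 20)),
  orderland = factor(rep(c("Deutschland", "Frankreich"), each = 20)),
  lenkung   = factor(rep(c("links", "rechts"), times = 20))
)

# Hypothetical test rows: "Zypern" is a level that is absent from train
test <- data.frame(
  orderland = factor(c("Deutschland", "Zypern")),
  lenkung   = factor(c("links", "rechts"))
)

tree <- rpart(number ~ ., data = train, method = "class")

# predict() builds a model frame for newdata using the factor levels stored
# at fit time and stops on the unseen level; catch the message to inspect it
err <- tryCatch(
  predict(tree, newdata = test, type = "class"),
  error = function(e) conditionMessage(e)
)
print(err)
```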
The error means that the country "Zypern" (Cyprus) occurs in my test set but not in my training set. To work around this, I searched online and tried setting the factor levels of the training set equal to those of the test set:
train$orderland <- factor(train$orderland, levels=levels(test$orderland))
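A quick check with a made-up character vector (the values are hypothetical) shows what this line does when the column is a plain character vector, which is what summary(test) reports for orderland: levels() returns NULL for a character vector, so factor() has no levels to match against and every value becomes NA. That would be consistent with the NA's:54616 for orderland in summary(train).

```r
# Hypothetical character vector standing in for test$orderland
x <- c("Deutschland", "Zypern", "Frankreich")

levels(x)                     # NULL: a plain character vector has no levels

# Same pattern as the line above: levels = levels(x) is levels = NULL here
f <- factor(x, levels = levels(x))
print(f)                      # every element is NA, and f has zero levels
summary(f)
```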
Summaries of the training and test data:
> summary(train)
 number             orderland          lenkung            transmission       IntervalRange
 Length:54616       NA's:54616         Length:54616       Length:54616       1: 7893
 Class :character                      Class :character   Class :character   2:39528
 Mode  :character                      Mode  :character   Mode  :character   3: 7195
> summary(test)
 number             orderland          lenkung            transmission       IntervalRange
 Length:13655       Length:13655       Length:13655       Length:13655       1:1959
 Class :character   Class :character   Class :character   Class :character   2:9904
 Mode  :character   Mode  :character   Mode  :character   Mode  :character   3:1792
But I still get the same error. Any ideas why?