Problems with prediction in decision tree in caret package

Question

I am having problems doing a prediction with decision trees (CART).

I have this code:

library(caret)
training <- read.csv("pml-training.csv", header=TRUE)
set.seed(1972)
inTrain <- createDataPartition(y=training2$classe, p=0.6, list=FALSE)
wk_training <- training2[inTrain,]
wk_testing <- training2[-inTrain,]

The wk_training dataset has 11776 rows and wk_testing has 7846.

set.seed(1972)
model_dt <- train(wk_training$classe ~ ., data = wk_training,  method="rpart")
print(model_dt, digits=3)

Then I run the model against wk_testing:

predictions_dt <- predict(model_dt, newdata=wk_testing)

I expected predictions_dt to have 7846 entries, one per row of wk_testing, but it has only 165.

I don't know what I am doing wrong...

Can anybody help me?

Thanks in advance

Where is the `training2` variable defined in your code? Have you maybe made a typo and used the wrong variable? — giliev, Aug 19 '15 at 22:00
I create `training2` from `training` like this, to eliminate near-zero-variance columns: `nzv <- nearZeroVar(training, saveMetrics=TRUE)`, then `cols <- nzv$nzv == FALSE`, then `training2 <- training[,cols]` — Juan Vidal, Aug 19 '15 at 22:28

score 0 · Answer 1 · answered Aug 21 '15 at 15:17


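If you have missing values, the predict function defaults to na.action = na.omit, so any row of newdata that contains a missing value is dropped and you get fewer predictions than rows. You can test whether this is the issue by passing na.action = na.fail to predict, which raises an error if missing values are present. If this is the case, you might want to impute the missing values. See the preProcess option in train.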
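To see how many rows would survive na.omit, something along these lines may help. It is only a sketch in base R: toy_testing is a made-up data frame standing in for wk_testing (an assumption, not the asker's data), and the column names x1 and x2 are hypothetical.

```r
# Toy stand-in for wk_testing (hypothetical data, not the asker's file)
toy_testing <- data.frame(
  x1 = c(1.2, NA, 3.1, 4.8, NA),
  x2 = c(0.5, 0.7, NA, 1.1, 0.9),
  classe = factor(c("A", "B", "A", "C", "B"))
)

# Total rows versus rows without any missing value
n_total <- nrow(toy_testing)
n_complete <- sum(complete.cases(toy_testing))

# na.omit keeps exactly the complete rows
n_after_omit <- nrow(na.omit(toy_testing))

# Missing values per column, to see which predictors are responsible
na_per_column <- colSums(is.na(toy_testing))

cat("rows:", n_total, "complete:", n_complete, "after na.omit:", n_after_omit, "\n")
print(na_per_column)
```

If missing values are the cause here, sum(complete.cases(wk_testing)) on the real data would be expected to match the 165 predictions, and calling predict(model_dt, newdata = wk_testing, na.action = na.fail) should stop with an error instead of silently dropping rows.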


topepo
