I am having problems doing a prediction with decision trees (CART).

I have this code:

library(caret)
training <- read.csv("pml-training.csv", header=TRUE)
set.seed(1972)
inTrain <- createDataPartition(y=training2$classe, p=0.6, list=FALSE)
wk_training <- training2[inTrain,]
wk_testing <- training2[-inTrain,]

The wk_training dataset has 11776 rows and wk_testing has 7846.

set.seed(1972)
# refer to the outcome by column name; wk_training$classe ~ . would leave classe among the predictors
model_dt <- train(classe ~ ., data = wk_training, method="rpart")
print(model_dt, digits=3)

Run against wk_testing

predictions_dt <- predict(model_dt, newdata=wk_testing)

I expect predictions_dt to have 7846 elements, one for each row of wk_testing, but it has only 165.

I don't know what I am doing wrong...

Can anybody help me?

Thanks in advance

milos.ai
  • Where is `training2` variable defined in your code? Have you maybe made a typo in your code and use wrong variable? – giliev Aug 19 '15 at 22:00
  • I create training2 from training by eliminating near-zero-variance columns, like this: `nzv <- nearZeroVar(training, saveMetrics=TRUE)`; `cols <- nzv$nzv == FALSE`; `training2 <- training[,cols]` – Juan Vidal Aug 19 '15 at 22:28

1 Answer

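If wk_testing has missing values, that would explain the shorter result: the predict function for train objects defaults to na.action = na.omit, so any row containing an NA is dropped before prediction. You can test whether this is the issue by passing na.action = na.fail, which raises an error instead of silently dropping rows. If that is the case, you might want to impute; see the preProcess option in train.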
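As a rough illustration of the difference between na.omit and na.fail, here is a minimal base R sketch. The data frame toy_testing is a made-up stand-in for wk_testing (its columns x1 and x2 are hypothetical, not columns of the real data), so treat this as a way to see the behaviour rather than as a diagnosis of the actual file.

```r
# Toy stand-in for wk_testing (hypothetical data, not the pml-training set)
toy_testing <- data.frame(
  x1 = c(1.2, NA, 3.4, 4.1, NA),
  x2 = c(10, 20, NA, 40, 50),
  classe = factor(c("A", "B", "A", "C", "B"))
)

# Rows with no missing values: the only rows na.omit keeps
n_complete <- sum(complete.cases(toy_testing))

# na.omit silently drops every row that contains an NA
n_after_omit <- nrow(na.omit(toy_testing))

# na.fail stops with an error instead, which makes the problem visible
fails <- inherits(try(na.fail(toy_testing), silent = TRUE), "try-error")

# Per-column NA counts show which predictors are responsible
na_per_column <- colSums(is.na(toy_testing))

print(c(total = nrow(toy_testing), complete = n_complete, kept = n_after_omit))
print(fails)
print(na_per_column)
```

On the real data, the analogous checks would be sum(complete.cases(wk_testing)), which should come out close to the 165 predictions if missing values are the cause, and predict(model_dt, newdata = wk_testing, na.action = na.fail), which should raise an error rather than return a shortened result.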

topepo