I am reviewing my e1071 code for an SVM on the Kaggle Titanic data. Last I knew, this part of it was working, but now I'm getting a rather strange error. When I try to build the data.frame I submit to Kaggle, my prediction seems to be the size of my training set instead of the test set.
Problem
Error in data.frame(PassengerId = test$passengerid, Survived = prediction) : arguments imply differing number of rows: 418, 714
Both should be 418, and I do not understand what is going wrong.
Details
Here is my script:
setwd("Path\\To\\Data")
train <- read.csv("train.csv")
test <- read.csv("test.csv")
library("e1071")
bestModel = svm(Survived ~ Pclass + Sex + Age + Sex * Pclass, data = train, kernel = "linear", cost = 1)
prediction <- predict(bestModel, newData=test, type="response")
prediction[prediction >= 0.5] <- 1
prediction[prediction != 1] <- 0
prediction[is.na(prediction)] <- 0
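As an aside, the three thresholding lines could be collapsed into a single expression. A minimal sketch on a small hypothetical vector (`raw` is an illustrative stand-in, not output from the model above):

```r
# Hypothetical raw predictions, including a missing value
raw <- c(0.2, 0.5, 0.91, NA, 0.49)

# 1 where the value is present and at least 0.5, otherwise 0 (NA becomes 0)
survived <- as.integer(!is.na(raw) & raw >= 0.5)
print(survived)
```

This should give the same 0/1 labels as the three separate assignments, only as integers rather than doubles.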
This is the line that gives me the error:
predictionSubmit <- data.frame(PassengerId = test$passengerid, Survived = prediction)
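The mismatch can be confirmed before building the data frame by comparing the two lengths directly. A small sketch with hypothetical stand-in vectors (`ids` and `pred` are illustrative names, not the real columns):

```r
# Hypothetical stand-ins: four test IDs but six predictions
ids  <- 101:104
pred <- c(0, 1, 1, 0, 1, 0)

# data.frame() needs columns of equal length (or lengths that recycle evenly)
lengths_match <- length(ids) == length(pred)
if (!lengths_match) {
  message(sprintf("ids has %d entries, pred has %d", length(ids), length(pred)))
}
```

With the real objects, the same comparison would be between `nrow(test)` and `length(prediction)`.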
Attempts
I have used names(train) and names(test) to verify that my column names are the same. You can find the data here. I know my prediction code can be condensed into one line, but that isn't the issue here. I would appreciate a second pair of eyes on this. I am thinking about using the kernlab library instead, but was wondering whether there is a syntactic issue I am overlooking here. Thanks for your suggestions and clues.
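For anyone reproducing this, the column-name check can be sketched with `setdiff()`. The two data frames below are tiny hypothetical stand-ins for `train.csv` and `test.csv`, and `predictors` simply lists the variables from the model formula:

```r
# Hypothetical stand-ins for the real training and test sets
train_demo <- data.frame(PassengerId = 1:3, Survived = c(0, 1, 1),
                         Pclass = c(3, 1, 2),
                         Sex = c("male", "female", "female"),
                         Age = c(22, 38, NA))
test_demo <- data.frame(PassengerId = 4:5, Pclass = c(3, 2),
                        Sex = c("male", "female"), Age = c(34, NA))

predictors <- c("Pclass", "Sex", "Age")

# Predictors used in the formula that are absent from the test set
missing_in_test <- setdiff(predictors, names(test_demo))

# Columns present in the training set only (Survived is expected here)
only_in_train <- setdiff(names(train_demo), names(test_demo))
```

An empty `missing_in_test` means every predictor is available in the test set, which matches what I see with the real files.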