I am working with a dataset that has approximately 150,000 rows and 25 columns. The data consist of numerical and factor variables. The factor levels are a mix of text and numbers, and I need to keep all of these variables. The dependent variable is a factor with 20 levels.
I am trying to fit an SVM to this data using the kernlab package in R.
library(kernlab)
n <- nrow(x)
trainInd <- sort(sample(1:nrow(x), n * 0.8))
xtrain <- x[trainInd, ]
xtest <- x[-trainInd, ]
ytrain <- y[trainInd]
ytest <- y[-trainInd]
modelclass <- ksvm(x = as.matrix(xtrain), y = as.matrix(ytrain),
                   scaled = TRUE, type = "C-svc", kernel = "rbfdot",
                   kpar = "automatic", C = 1, cross = 0)
Running this code gives the following error:
Error in if (any(co)) { : missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In FUN(newX[, i], ...) : NAs introduced by coercion
The xtrain data frame looks like:
Length  Gender  Age  Day  Hour  Duration  Period
5       1       80   5    11    20        3
0.2     2       35   2    18    10        5
1.1     2       55   1    15    120       4
The Gender, Day, and Period variables are categorical (factors), while the rest are numerical.
I have gone through similar questions and checked my dataset as well, but I cannot identify any NA values or other mistakes.
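To illustrate the kind of check I mean, here is a minimal sketch on a made-up data frame. The values, the name `toy`, and the text labels for Gender are hypothetical and are not my real data. It suggests that a data frame can contain no NA values while as.matrix() still turns it into a character matrix, which may be where the coercion warning comes from:

```r
# Toy data frame (hypothetical values) mixing numeric and factor columns
toy <- data.frame(
  Length = c(5, 0.2, 1.1),
  Gender = factor(c("M", "F", "F")),  # text-labelled factor, an assumption
  Age    = c(80, 35, 55)
)

# The data frame itself contains no missing values
na_in_df <- anyNA(toy)

# as.matrix() on mixed column types produces a character matrix
m <- as.matrix(toy)
is_char <- is.character(m)

# Forcing the text column to numeric is where NAs would appear
coerced <- suppressWarnings(as.numeric(m[, "Gender"]))
na_after <- anyNA(coerced)
```

On this toy example the NA check on the data frame passes, yet the matrix handed to the model is no longer numeric, so I suspect the factor columns rather than genuinely missing data.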
I assume that I am doing something wrong with the variable types, in particular the factors. I am unsure how to use them, but I can't see anything wrong. Any help on how to solve the error, and possibly on how to model factors together with numerical variables, would be appreciated.