
I am trying to implement leave-one-out cross-validation using Support-Vector Regression in R with the e1071 package. The data and the code I have look more or less like this:

library(e1071) 

#create fake dataset

y=rpois(30,3)-4+(rbinom(30,1,0.5))/2
x1=c(rep('C',16),rep('S',14))
x2=c(runif(16,0,1),runif(14,0,1)/10)
x3=c(runif(16,0,1)/5,runif(14,0,1))
dat=data.frame(y=y,x1=x1,x2=x2,x3=x3)
train=dat[-1,]
test=dat[1,]

# train the model

model=tune(svm, train$y ~ train$x1*train$x2*train$x3,kernel='linear',
ranges = list(epsilon = seq(0.1,0.6,0.1), cost = 2^(0:9)))
model=model$best.model

#predict

predict(model,newdata=test)

The problem is that the predict function returns only the fitted values for the training data and does not predict the test dataset. I have seen a similar question here, predict.svm does not predict new data, but it seems the solution does not apply to my code. Any ideas on this problem?

tfigueiredo

2 Answers


Any time you use a $ inside a formula (~), that's a sign that things are likely to get messed up. Here's how you should rewrite your tune() call:

model=tune(svm, y ~ x1*x2*x3, data=train, 
    kernel='linear', ranges = list(epsilon = seq(0.1,0.6,0.1), cost = 2^(0:9)))

With data=train, the formula refers to plain column names rather than to the train data.frame specifically, so predict() can find those same variables in any new data set that uses the same names.
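
For completeness, the full leave-one-out loop described in the question could then look like the sketch below. It reuses the dat data frame built in the question; loo_pred is just an illustrative name, and the parameter grid is the one from your tune() call.

library(e1071)

# leave-one-out: hold out each row in turn, tune on the rest, predict the held-out row
loo_pred <- numeric(nrow(dat))
for (i in seq_len(nrow(dat))) {
  train <- dat[-i, ]
  test  <- dat[i, , drop = FALSE]
  tuned <- tune(svm, y ~ x1 * x2 * x3, data = train, kernel = 'linear',
                ranges = list(epsilon = seq(0.1, 0.6, 0.1), cost = 2^(0:9)))
  loo_pred[i] <- predict(tuned$best.model, newdata = test)
}

# leave-one-out mean squared error
mean((loo_pred - dat$y)^2)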

MrFlick
  • Actually I was calling the function in this way before, but for some reason I was getting an error message. Now it worked. Thank you very much for the help!! – tfigueiredo Apr 26 '16 at 21:13

A few things here. I don't know whether you want a three-way interaction between x1, x2 and x3 or whether you want them as independent variables; the code below treats them as independent variables. The most important thing, however, is that you were referencing the data inside your model formula (train$x1), which is why you were always predicting your train dataset. Pass the data through the data argument instead:

model=tune(svm, y ~ x1+x2+x3,kernel='linear',data=train,
           ranges = list(epsilon = seq(0.1,0.6,0.1), cost = 2^(0:9)))
model=model$best.model

#predict

predict(model,newdata=test)
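
If you do want the full three-way interaction, only the formula changes; the key point is still passing data=train instead of train$ columns, e.g.:

model=tune(svm, y ~ x1*x2*x3, kernel='linear', data=train,
           ranges = list(epsilon = seq(0.1,0.6,0.1), cost = 2^(0:9)))
predict(model$best.model, newdata=test)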
Jason
  • Yes, I expect a possible interaction between these three variables. I rewrote the tune call in this form and it worked! Thank you! – tfigueiredo Apr 26 '16 at 21:14
  • Any non-linear algorithm will pick up interactions. You shouldn't need to do any multiplication and it will just muddy your equation. – Jason Apr 27 '16 at 11:56