kernlab regression

Question

Has anyone else run into this problem with kernlab regression? It seems to be losing some scaling factors, but perhaps I'm calling it wrong.

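library(kernlab)
# Toy data: a noisy straight line
df <- data.frame(x=seq(0,10,length.out=1000))
df$y <- 3*df$x + runif(1000) - 3
plot(df)
# Support vector regression with a linear kernel
res <- ksvm(y ~ x, data=df, kernel='vanilladot')
# No newdata here; this is the call that draws the badly scaled line
lines(df$x, predict(res), col='blue', lwd=2)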

[Image: svm-results]

With this toy example I can get reasonable results if I explicitly pass newdata=df, but with my real data I've found no such workaround. Any insight?

John Colby · Answer 1 · 2011-12-13T18:30:03.320

Passing a newdata argument is the correct way to do it (otherwise predict() uses the internally scaled data, as you saw). The typical way is something like:

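# Evenly spaced x values covering the range of the training data
newx <- seq(min(df$x), max(df$x), length.out=100)
# The column name in newdata must match the predictor used in the formula
lines(newx, predict(res, newdata=data.frame(x=newx)), col='blue', lwd=2)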

If this still doesn't work on your real data, please elaborate...

For what it's worth, I usually prefer to scale my data manually first and then set scaled=FALSE. That way you don't have to worry about this kind of scaling mismatch, which can crop up at different times.
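
A minimal sketch of that manual-scaling approach, reusing the toy data from the question. The names x_mu, x_sd, y_mu, y_sd, train and fit are made up for illustration, and standardizing to zero mean and unit variance is an assumption; any scaling works as long as the same constants are applied to new inputs and undone on the predictions.

```r
library(kernlab)

set.seed(1)  # only so the toy data is reproducible
df <- data.frame(x = seq(0, 10, length.out = 1000))
df$y <- 3 * df$x + runif(1000) - 3

# Standardize by hand and keep the constants for later (illustrative names)
x_mu <- mean(df$x); x_sd <- sd(df$x)
y_mu <- mean(df$y); y_sd <- sd(df$y)
train <- data.frame(x = (df$x - x_mu) / x_sd,
                    y = (df$y - y_mu) / y_sd)

# scaled = FALSE asks ksvm not to rescale the data internally
fit <- ksvm(y ~ x, data = train, kernel = "vanilladot", scaled = FALSE)

# Apply the same transform to new inputs, then undo it on the predictions
newx <- seq(min(df$x), max(df$x), length.out = 100)
new_scaled <- data.frame(x = (newx - x_mu) / x_sd)
pred <- as.numeric(predict(fit, newdata = new_scaled)) * y_sd + y_mu

plot(df)
lines(newx, pred, col = "blue", lwd = 2)
```

Because the model only ever sees scaled values, every call to predict() goes through the same explicit transform, whether it is for plotting or for a cross-validation fold.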

EDIT: I should also add that when you build the newdata data frame, its variable names should match the ones you used to create the model; they won't necessarily be "x".
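
To illustrate the naming point, here is a small sketch with made-up data in which the predictor is called height rather than x. The data frame dat, its columns, and the numbers in it are all hypothetical; the only thing that matters is that newdata carries a column with the same name as the predictor in the formula.

```r
library(kernlab)

set.seed(1)  # only so the made-up data is reproducible
# Hypothetical data set whose predictor is named "height", not "x"
dat <- data.frame(height = seq(150, 200, length.out = 200))
dat$weight <- 0.9 * dat$height - 90 + rnorm(200)

fit <- ksvm(weight ~ height, data = dat, kernel = "vanilladot")

# newdata uses the same column name as the formula's predictor
new_heights <- data.frame(height = seq(150, 200, length.out = 50))
pred <- as.numeric(predict(fit, newdata = new_heights))

plot(dat)
lines(new_heights$height, pred, col = "blue", lwd = 2)
```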

edited Dec 13 '11 at 18:30

answered Dec 13 '11 at 17:52

John Colby


That seems like a bug, right? Scaling shouldn't depend on whether `newdata` is implicit or explicit. – Ken Williams Dec 13 '11 at 18:33
In my real task, I'm doing cross-validation, so I pass `newdata=` to get the `predict()` result. – Ken Williams Dec 13 '11 at 18:38
Not sure. Definitely possible, but I imagine someone could also argue good reasons for plotting what is actually in the model by default. – John Colby Dec 13 '11 at 19:11
Just confirmed in private email with Alexandros that it's a bug, which they'll fix in the next release. Thanks for the help. – Ken Williams Dec 14 '11 at 14:41
