
Has anyone run into this problem with kernlab regression? It seems to be losing some scaling factors, but perhaps I'm calling it incorrectly.

library(kernlab)
df <- data.frame(x=seq(0,10,length.out=1000))
df$y <- 3*df$x + runif(1000) - 3
plot(df)
res <- ksvm(y ~ x, data=df, kernel='vanilladot')
lines(df$x, predict(res), col='blue', lwd=2)

[Figure: svm-results]

With this toy example I can get reasonable results if I explicitly pass newdata=df, but with my real data I've found no such workaround. Any insight?

Ken Williams

1 Answer


Passing a newdata argument is the correct way to do it; otherwise predict() uses the internally scaled data, as you saw. The typical approach is something like:

newx <- seq(min(df$x), max(df$x), length.out=100)
lines(newx, predict(res, newdata=data.frame(x=newx)), col='blue', lwd=2)

If this still doesn't work on your real data, please elaborate...

For what it's worth, I usually prefer to scale my data manually first and then set scaled=FALSE. That way you don't have to worry about this kind of issue, which can crop up in unexpected places.
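To make the manual-scaling route concrete, here is a minimal sketch on the toy data from the question. The names `x_mean`, `x_sd`, `y_mean`, `y_sd`, and `df_scaled` are my own for illustration, and `set.seed(1)` is only there for reproducibility; treat it as one possible way to do it rather than the canonical one.

```r
library(kernlab)

set.seed(1)  # reproducible noise (not in the original example)
df <- data.frame(x = seq(0, 10, length.out = 1000))
df$y <- 3 * df$x + runif(1000) - 3

# Standardize by hand and keep the centering/scaling constants.
x_mean <- mean(df$x); x_sd <- sd(df$x)
y_mean <- mean(df$y); y_sd <- sd(df$y)
df_scaled <- data.frame(x = (df$x - x_mean) / x_sd,
                        y = (df$y - y_mean) / y_sd)

# Fit on the pre-scaled data with kernlab's internal scaling turned off.
res <- ksvm(y ~ x, data = df_scaled, kernel = "vanilladot", scaled = FALSE)

# Predictions are on the scaled y axis, so map them back ourselves.
pred <- as.numeric(predict(res, newdata = df_scaled)) * y_sd + y_mean

plot(df)
lines(df$x, pred, col = "blue", lwd = 2)
```

Because the constants are stored explicitly, the same transformation can be applied to any new data before calling `predict()`.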

EDIT: I should also add that when you build the newdata data frame, its variable names must match the ones you used to fit the model; they won't necessarily be "x".
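As a rough illustration of the naming point, the sketch below fits the same kind of toy model with a predictor that is not called "x". The column names `dose` and `response` are hypothetical, picked only to show that the prediction grid has to reuse the name from the formula.

```r
library(kernlab)

set.seed(1)  # reproducible noise
# Hypothetical column names, chosen only to show the naming requirement.
dat <- data.frame(dose = seq(0, 10, length.out = 1000))
dat$response <- 3 * dat$dose + runif(1000) - 3

fit <- ksvm(response ~ dose, data = dat, kernel = "vanilladot")

# The prediction grid uses the same predictor name as the formula ("dose").
grid <- data.frame(dose = seq(min(dat$dose), max(dat$dose), length.out = 100))
pred <- predict(fit, newdata = grid)

plot(dat)
lines(grid$dose, pred, col = "blue", lwd = 2)
```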

John Colby
  • That seems like a bug, right? Scaling shouldn't depend on whether `newdata` is implicit or explicit. – Ken Williams Dec 13 '11 at 18:33
  • In my real task, I'm doing cross-validation, so I pass `newdata=` to get the `predict()` result. – Ken Williams Dec 13 '11 at 18:38
  • Not sure. Definitely possible, but I imagine someone could also argue good reasons for plotting what is actually in the model by default. – John Colby Dec 13 '11 at 19:11
  • Just confirmed in private email with Alexandros that it's a bug, which they'll fix in the next release. Thanks for the help. – Ken Williams Dec 14 '11 at 14:41
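For the cross-validation case mentioned in the comments, a sketch along these lines always passes `newdata=` explicitly, so the implicit-`predict()` path is never used. The fold count `k`, the `fold` assignment, and the RMSE summary are assumptions made for illustration, not anything taken from the asker's real code.

```r
library(kernlab)

set.seed(1)  # reproducible noise and fold assignment
df <- data.frame(x = seq(0, 10, length.out = 1000))
df$y <- 3 * df$x + runif(1000) - 3

k <- 5  # number of folds (an arbitrary choice for this sketch)
fold <- sample(rep(seq_len(k), length.out = nrow(df)))

rmse <- numeric(k)
for (i in seq_len(k)) {
  train    <- df[fold != i, ]
  held_out <- df[fold == i, ]
  fit <- ksvm(y ~ x, data = train, kernel = "vanilladot")
  # Explicit newdata, so predictions come back on the original y scale.
  pred <- as.numeric(predict(fit, newdata = held_out))
  rmse[i] <- sqrt(mean((held_out$y - pred)^2))
}
mean(rmse)  # average held-out RMSE across folds
```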