
I am looking at the R function gausspr from the kernlab package for Gaussian process regression. The process is defined by the hyperparameters of the kernel function and by the noise in the data. I see in the documentation that I can specify

var: the initial noise variance, (only for regression) (default : 0.001)

but I do not see how to access the estimated value after the regression has run. For instance, suppose I have some observed points and want to predict y values at the locations given by X:

obs <- data.frame(x = c(-4, -3, -1,  0,  2),
                  y = c(-2,  0,  1,  2, -1))
X <- seq(-5, 5, length.out = 50)

I can do so with kernlab::gausspr as such:

library(kernlab)
gp <- gausspr(obs$x, obs$y, kernel = "rbfdot", scaled = FALSE, var = 0.09)
Ef <- predict(gp, X)

I can get the estimated value of the kernel hyperparameter:

gp@kernelf@kpar

But I don't see how to retrieve the estimated value of the noise parameter, var.

cboettig

1 Answer


I might be overlooking something, but I don't think that the initial noise variance var is "fit" to anything; I don't think it is a parameter (although I agree that using the word "initial" makes you think otherwise).

The noise variance is just added to the diagonal of the correlation matrix of the training points, as described on this page about some other software. Looking through the function definition, it looks like this is exactly what it is doing in kernlab as well:

# The only relevant line where 'var' is used
alpha(ret) <- solve(K + diag(rep(var, length = m))) %*% y
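To make the role of the noise variance concrete, here is a minimal base-R sketch of the same linear system. It is an illustration under stated assumptions, not kernlab's code: the helper `rbf`, the name `noise_var`, and the fixed `sigma = 1` are all hypothetical choices (gausspr picks its own kernel parameter by default), so the numbers are not expected to match `predict(gp, X)`.

```r
# Training data and prediction grid from the question
obs <- data.frame(x = c(-4, -3, -1, 0, 2),
                  y = c(-2,  0,  1, 2, -1))
X <- seq(-5, 5, length.out = 50)

# RBF kernel in the same form as kernlab's rbfdot: exp(-sigma * (a - b)^2)
# (hypothetical helper, written here so the sketch needs no packages)
rbf <- function(a, b, sigma) exp(-sigma * outer(a, b, "-")^2)

sigma     <- 1     # assumed value, not the one gausspr would choose
noise_var <- 0.09  # plays the role of 'var'; it is fixed, never re-estimated

m     <- nrow(obs)
K     <- rbf(obs$x, obs$x, sigma)
alpha <- solve(K + diag(noise_var, m), obs$y)  # noise enters only on the diagonal
Ef    <- rbf(X, obs$x, sigma) %*% alpha        # posterior mean at the locations X
```

Nothing in this calculation updates `noise_var`: it goes in as a constant and comes out unchanged, which is the point being made about `var`.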

If you wanted to see how the error (or any other measure of fit) changes with the noise variance, you could do something like:

# Training error of the fit for a given, fixed noise variance
error.fun <- function(x) error(gausspr(obs$x, obs$y, kernel = "rbfdot", scaled = FALSE, var = x))
noises <- seq(0.1, 1, by = 0.1)
y <- sapply(noises, error.fun)
plot(noises, y, type = "l")
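Another measure of fit that could be scanned over the same grid is the log marginal likelihood of a zero-mean Gaussian process. The sketch below is base R only and is not something gausspr does for you; the helper names `rbf` and `log_marginal` and the fixed `sigma = 1` are assumptions made for illustration.

```r
obs <- data.frame(x = c(-4, -3, -1, 0, 2),
                  y = c(-2,  0,  1, 2, -1))

# RBF kernel in the same form as kernlab's rbfdot (hypothetical helper)
rbf <- function(a, b, sigma) exp(-sigma * outer(a, b, "-")^2)

# Log marginal likelihood for a given noise variance:
#   -1/2 y' Ky^{-1} y - 1/2 log|Ky| - n/2 log(2*pi),  with Ky = K + noise_var * I
log_marginal <- function(noise_var, x, y, sigma = 1) {
  n  <- length(y)
  Ky <- rbf(x, x, sigma) + diag(noise_var, n)
  R  <- chol(Ky)                               # upper triangular, Ky = t(R) %*% R
  a  <- backsolve(R, forwardsolve(t(R), y))    # a = Ky^{-1} y
  -0.5 * sum(y * a) - sum(log(diag(R))) - 0.5 * n * log(2 * pi)
}

noises <- seq(0.1, 1, by = 0.1)
lml    <- sapply(noises, log_marginal, x = obs$x, y = obs$y)
best   <- noises[which.max(lml)]   # grid value with the highest likelihood
```

Here `best` is only the best value on this coarse grid for the assumed `sigma`; passing it back as `var` to gausspr would be a manual step.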

The built-in cross-validation does not "fit" var in any way, from what I can tell. The only relevant line in the cross validation is here:

cret <- gausspr(x[cind, ], y[cind], type = type(ret), 
                scaled = FALSE, kernel = kernel, var = var, 
                tol = tol, cross = 0, fit = FALSE)

And you can see that var is just put in with no changes.

nograpes
  • Yes, I have been wondering about that. In principle one would estimate this, as it does the length scale, either by cross-validation or by likelihood (as described here: http://www.gaussianprocess.org/gpml/chapters/RW5.pdf, see, for instance, figure 5.9) – cboettig Nov 29 '12 at 01:13
  • Yes, you could certainly examine the fit under different values of `var`. This function does not do that for you. – nograpes Nov 29 '12 at 01:18
  • Yeah, I just want(ed) to be sure. I see that this is the only mention of `var` in that file, but the function has a cross-validation method built in, it does estimate the other kernel parameter (length scale), and it calls subroutines not in that file. If it'll do cross-validation and estimate the length scale, can I be sure it's not estimating `var` through a subroutine? – cboettig Nov 29 '12 at 01:54
  • I updated my answer explaining how `var` remains constant over all folds of the cross-validation. – nograpes Nov 30 '12 at 16:24