
I am trying to use a Gaussian Process Regression (GPR) model to predict hourly streamflow discharges in a river. I've got good results applying caret's train() function with the kernlab gaussprRadial method (thanks, Kuhn!).

Since uncertainty quantification is one of the main inherent advantages of GPR, I would like to know if anyone could help me access the results related to the prediction interval of the test dataset.

I'll put an extract of the code I've been working on. Since my real data are huge (and, sincerely, I don't know how to post them here), I'll use data(airquality) as an example. The main goal in this particular example is to predict airquality$Ozone, using lagged variables of airquality$Temp as predictors.

rm(list = ls())
data(airquality)

airquality = na.omit(as.data.frame(airquality)); str(airquality)

library(tidyverse)
library(magrittr)

airquality$Ozone %>% plot(type = 'l')
lines(airquality$Temp, col = 2)
legend("topleft", legend = c("Ozone", "Temperature"),
   col=c(1, 2), lty = 1:1, cex = 0.7, text.font = 4, inset = 0.01, 
   box.lty=0, lwd = 1)

df_lags <- airquality %>%
  mutate(Temp_lag1 = lag(n = 1L, Temp)) %>%
  na.omit()

ESM_train = data.frame(df_lags[1:81, ])            # Training Observed 75% dataset
ESM_test = data.frame(df_lags[82:nrow(df_lags), ]) # Testing Observed 25% dataset

grid_gaussprRadial = expand.grid(sigma = c(0.001, 0.01, 0.05, 0.1, 0.5, 1, 2)) # Sigma values to search over for GPR

# TRAIN MODEL ############################
# Tuning set
library(caret)
set.seed(111)
cvCtrl <- trainControl(
  method ="repeatedcv",
  repeats = 1,
  number = 20,
  allowParallel = TRUE,
  verboseIter = TRUE,
  savePredictions = "final")

# Train (approx. 4 seconds of run time)
set.seed(111)
system.time(Model_train <- caret::train(Ozone ~ Temp + Temp_lag1,
                                        trControl = cvCtrl,
                                        data = ESM_train,
                                        metric = "MAE", # Using MAE since minimum values are my focus
                                        preProcess = c("center", "scale"),
                                        method = "gaussprRadial", # RBF kernel function
                                        tuneGrid = grid_gaussprRadial))

plot(Model_train)
Model_train
ESM_results_train <- Model_train$resample %>% mutate(Model = "") # K-fold Training measures

# Select the TRAIN data of interest and arrange them as a dataframe
Ozone_Obs_Tr = Model_train$pred$obs
Ozone_sim = Model_train$pred$pred
Resid = Ozone_Obs_Tr - Ozone_sim
train_results = data.frame(Ozone_Obs_Tr,
                           Ozone_sim,
                           Resid)

# Plot Obs x Simulated train results
library(ggplot2)
ggplot(data = train_results, aes(x = Ozone_Obs_Tr, y = Ozone_sim)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "black")


# TEST MODEL ############################
# From "ESM_test" dataframe, we predict ESM Ozone time series, adding it in "ESM_forecasted" dataframe
ESM_forecasted = ESM_test %>%                                              
  mutate(Ozone_Pred = predict(Model_train, newdata = ESM_test, variance.model = TRUE))
str(ESM_forecasted)

# Select the TEST data of interest and arrange them as a dataframe
Ozone_Obs = ESM_forecasted$Ozone
Ozone_Pred = ESM_forecasted$Ozone_Pred

# Plot Obs x Predicted TEST results
ggplot(data = ESM_forecasted, aes(x = Ozone, y = Ozone_Pred)) + # use columns of ESM_forecasted directly
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "black")


# Model performance #####
library(hydroGOF)
gof_TR = gof(Ozone_sim, Ozone_Obs_Tr)
gof_TEST = gof(Ozone_Pred, Ozone_Obs)
Performances = data.frame(
                          Train = gof_TR,
                          Test = gof_TEST
                          ); Performances
# Plot the TEST prediction
plot(Ozone_Obs, type = "l", xlab = "", ylab = "", ylim = range(Ozone_Obs, Ozone_Pred))
lines(Ozone_Pred , col = "coral2", lty = 2, lwd = 2)
legend("top", legend = c("Ozone Obs Test", "Ozone Pred Test"),
       col=c(1, "coral2"), lty = 1:2, cex = 0.7, text.font = 4, inset = 0.01, box.lty=0, lwd = 2)

These last lines generate the following plot:

[plot: Observed and Predicted Ozone in the test dataset]

The next, and last, step would be to extract the prediction intervals, which are based on a Gaussian distribution around each predicted point, so I can plot them together with this last plot.

The caret train() approach returned better predictions than, for instance, plain kernlab::gausspr(), or even the tgp::bgp() package. For both of those I could find the prediction interval.

For example, the prediction intervals via tgp::bgp() can be picked up by typing:


    Upper_Bound <- Ozone_Pred$ZZ.q2 # approx. Ozone_Pred + 2 * sigma
    Lower_Bound <- Ozone_Pred$ZZ.q1 # approx. Ozone_Pred - 2 * sigma
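
For reference, a minimal sketch of how such an Ozone_Pred object would come about (untested here, using bgp()'s defaults rather than a tuned model; X and Z are tgp's names for the training inputs and response, XX for the test inputs):

library(tgp)
Ozone_Pred <- bgp(X  = ESM_train[, c("Temp", "Temp_lag1")],  # training inputs
                  Z  = ESM_train$Ozone,                      # training response
                  XX = ESM_test[, c("Temp", "Temp_lag1")])   # test inputs
Ozone_Mean  <- Ozone_Pred$ZZ.mean  # posterior predictive mean at XX
Lower_Bound <- Ozone_Pred$ZZ.q1    # 5% predictive quantile
Upper_Bound <- Ozone_Pred$ZZ.q2    # 95% predictive quantile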

Therefore, via caret train(), I hope the required standard deviations can be found by typing something like

Model_train$...

or maybe, with

Ozone_Pred$...
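
One route I can imagine (untested, and whether train() forwards variance.model = TRUE through its "..." to kernlab::gausspr() is just my assumption) is to refit with the variance model enabled and then ask the underlying kernlab fit for predictive standard deviations, which kernlab exposes via predict(..., type = "sdeviation"):

set.seed(111)
Model_train_var <- caret::train(Ozone ~ Temp + Temp_lag1,
                                trControl = cvCtrl,
                                data = ESM_train,
                                metric = "MAE",
                                preProcess = c("center", "scale"),
                                method = "gaussprRadial",
                                tuneGrid = grid_gaussprRadial,
                                variance.model = TRUE) # assumed to reach kernlab::gausspr
# Apply caret's preprocessing before querying the underlying kernlab fit directly
X_test_pp <- predict(Model_train_var$preProcess, ESM_test[, c("Temp", "Temp_lag1")])
sd_pred <- predict(Model_train_var$finalModel, as.matrix(X_test_pp), type = "sdeviation")
Upper_Bound <- ESM_forecasted$Ozone_Pred + 2 * sd_pred
Lower_Bound <- ESM_forecasted$Ozone_Pred - 2 * sd_pred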

Moreover, at this link: https://stats.stackexchange.com/questions/414079/can-mad-median-absolute-deviation-or-mae-mean-absolute-error-be-used-to-calc, Stephan Kolassa explained that we could estimate prediction intervals through the MAE, or even the RMSE. But I didn't understand whether this applies to my case, since the MAE I got is just the comparison between observed and predicted Ozone data in this example.
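
If that route made sense here, a rough sketch of it (my improvisation, assuming a roughly constant error spread, unlike the GP's input-dependent variance) could reuse the cross-validated predictions stored by savePredictions = "final":

cv_resid <- Model_train$pred$obs - Model_train$pred$pred  # cross-validated residuals
q <- quantile(cv_resid, probs = c(0.025, 0.975))          # empirical 95% limits
Lower_Bound <- ESM_forecasted$Ozone_Pred + q[1]
Upper_Bound <- ESM_forecasted$Ozone_Pred + q[2]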

Please, this solution is very important to me! I think I am near to obtaining my main results, but I don't know what else to try. Thanks a lot, friends!

  • Are you wanting the prediction interval for new observations or the credible region on the predictive distribution of the latent means? – duckmayr Jul 25 '19 at 19:52
  • At first, I would like to obtain the prediction interval for these new observations that I called the "Test dataset". However, I am a newbie in Gaussian Process Regression, so maybe my concept of a prediction interval is wrong with respect to its application in GPR, and it makes sense if I say I want the credible region on the predictive distribution of the latent means, just as you wrote, duckmayr. – Rafael-Pedrollo-de-Paes Jul 25 '19 at 20:13
  • I'm not too knowledgeable about the `caret` framework, but confidence intervals and prediction intervals for GP regression are super easy. I'll try to post something basic later, but in the meantime, you may want to check out [this Cross Validated answer](https://stats.stackexchange.com/a/250191/184400) that succinctly describes how to obtain these intervals – duckmayr Jul 25 '19 at 20:23
  • Thank you very much, duckmayr! The explanation at the link you recommended seems very didactic. I've been studying Rasmussen and Williams' book, but my progress has been very slow, since I feel I'm 'alone in the dark'. As far as I know, caret allows working with a suite of machine learning packages, including kernlab, in order to tune some parameters through tools like cross-validation. Caret helped me obtain better regression results in my research. What has been killing me for now is the uncertainty prediction! I'll look forward to your answer. – Rafael-Pedrollo-de-Paes Jul 26 '19 at 20:03

1 Answer


I don't really know how the caret framework works, but getting a prediction interval for a GP regression with a Gaussian likelihood is easy enough to do manually.

First we just need a function for the squared exponential kernel, also called the radial basis function kernel, which is what you were using. Here sf is the scale factor (unused in the kernlab implementation), and ell is the length scale (kernlab's sigma hyperparameter plays the corresponding role in its parameterization):

## Isotropic squared exponential (RBF) covariance between the rows of x1 and x2
covSEiso <- function(x1, x2 = x1, sf = 1.0, ell = 1.0) {
    x1     <- as.matrix(x1)  # accept data frames as well as matrices
    x2     <- as.matrix(x2)
    sf     <- sf^2
    ell    <- -0.5 * (1 / (ell^2))
    n      <- nrow(x1)
    m      <- nrow(x2)
    result <- matrix(0, nrow = n, ncol = m)
    for ( j in 1:m ) {
        for ( i in 1:n ) {
            result[i, j] <- sf * exp(ell * sum((x1[i, ] - x2[j, ])^2))
        }
    }
    return(result)
}

I'm not sure what your code says about which length scale to use; below I will use a length scale of 25 and a scale factor of 50 (obtained via GPML's hyperparameter optimization routines). Then we use the covSEiso() function above to get the relevant covariances, and the rest is an application of basic Gaussian identities. I would refer you to Chapter 2 of Rasmussen and Williams (2006) (graciously provided for free online).
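
Concretely, writing $K = K(X, X)$ for the train-train covariances, $K_* = K(X_*, X)$ for the test-train covariances, $K_{**} = K(X_*, X_*)$ for the test-test covariances, and $\sigma_n^2$ for the observation noise variance, those identities are

$$\bar{f}_* = K_* \left( K + \sigma_n^2 I \right)^{-1} y$$

$$\operatorname{cov}(f_*) = K_{**} - K_* \left( K + \sigma_n^2 I \right)^{-1} K_*^\top$$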

data(airquality)
library(tidyverse)
library(magrittr)
df_lags <- airquality %>%
    mutate(Temp_lag1 = lag(n = 1L, Temp)) %>%
    na.omit()
ESM_train <- data.frame(df_lags[1:81, ])             # Training Data 75% dataset
ESM_test  <- data.frame(df_lags[82:nrow(df_lags), ]) # Testing  Data 25% dataset
## For convenience I'll define separately the training and test inputs
X <- ESM_train[ , c("Temp", "Temp_lag1")]
Xstar <- ESM_test[ , c("Temp", "Temp_lag1")]
## Get the kernel manually
K <- covSEiso(X, ell = 25, sf = 50)
## We also need covariance between the test cases
Kstar <- covSEiso(Xstar, X, ell = 25, sf = 50)
Ktest <- covSEiso(Xstar, ell = 25, sf = 50)
## Now the 95% credible region for the posterior; sn is the observation noise
## standard deviation (here 25)
sn <- 25
Kinv <- solve(K + diag(sn^2, nrow(K)))
predictive_mean <- Kstar %*% Kinv %*% ESM_train$Ozone
predictive_var  <- Ktest - (Kstar %*% Kinv %*% t(Kstar))
## Then for the prediction interval we only need to add the observation noise
## variance before taking the square root
z <- sqrt(diag(predictive_var) + sn^2)
interval_high <- predictive_mean + 2 * z
interval_low  <- predictive_mean - 2 * z
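
As a quick sanity check (assuming the hyperparameters are reasonable), the empirical coverage of the interval on the test set should come out near 95%; the interval rows line up with ESM_test:

## Fraction of test observations falling inside the ~95% prediction interval
mean(ESM_test$Ozone >= interval_low & ESM_test$Ozone <= interval_high)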

Then we can check out the prediction intervals:

[plot: test-set predictions with 95% prediction interval against Temp]

[plot: test-set predictions with 95% prediction interval against Temp_lag1]

This is all pretty easy to do via my gpmlr package (available on GitHub), which can call GPML from R if you have Octave installed:

data(airquality)
library(tidyverse)
library(magrittr)
library(gpmlr)
df_lags <- airquality %>%
    mutate(Temp_lag1 = lag(n = 1L, Temp)) %>%
    na.omit()
ESM_train <- data.frame(df_lags[1:81, ])             # Training Data 75% dataset
ESM_test  <- data.frame(df_lags[82:nrow(df_lags), ]) # Testing  Data 25% dataset
X <-  as.matrix(ESM_train[ , c("Temp", "Temp_lag1")])
y <- ESM_train$Ozone
Xs <- as.matrix(ESM_test[ , c("Temp", "Temp_lag1")])
ys <- ESM_test$Ozone
hyp0 <- list(mean = numeric(), cov = c(0, 0), lik = 0)
hyp  <- set_hyperparameters(hyp0, "infExact", "meanZero", "covSEiso","likGauss",
                            X, y)
gp_res <- gp(hyp, "infExact", "meanZero", "covSEiso", "likGauss", X, y, Xs, ys)
predictive_mean <- gp_res$YMU
interval_high   <- gp_res$YMU + 2 * sqrt(gp_res$YS2)
interval_low    <- gp_res$YMU - 2 * sqrt(gp_res$YS2)
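
Here YMU and YS2 are GPML's predictive mean and predictive variance for the test targets; since YS2 already includes the likelihood (observation) noise, the mean plus or minus 2 * sqrt(YS2) gives an approximate 95% prediction interval rather than just a credible region for the latent mean.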

Then just plot the predictions, as above:

plot(NULL, xlab = "", ylab = "", xaxt = "n", yaxt = "n",
     xlim = range(ESM_test$Temp), ylim = range(c(interval_high, interval_low)))
axis(1, tick = FALSE, line = -0.75)
axis(2, tick = FALSE, line = -0.75)
mtext("Temp", 1, 1.5)
mtext("Ozone", 2, 1.5)
idx <- order(ESM_test$Temp)
polygon(c(ESM_test$Temp[idx], rev(ESM_test$Temp[idx])),
        c(interval_high[idx], rev(interval_low[idx])),
        border = NA, col = "#80808080")
lines(ESM_test$Temp[idx], predictive_mean[idx])
points(ESM_test$Temp, ESM_test$Ozone, pch = 19)
plot(NULL, xlab = "", ylab = "", xaxt = "n", yaxt = "n",
     xlim = range(ESM_test$Temp_lag1), ylim = range(c(interval_high, interval_low)))
axis(1, tick = FALSE, line = -0.75)
axis(2, tick = FALSE, line = -0.75)
mtext("Temp_lag1", 1, 1.5)
mtext("Ozone", 2, 1.5)
idx <- order(ESM_test$Temp_lag1)
polygon(c(ESM_test$Temp_lag1[idx], rev(ESM_test$Temp_lag1[idx])),
        c(interval_high[idx], rev(interval_low[idx])),
        border = NA, col = "#80808080")
lines(ESM_test$Temp_lag1[idx], predictive_mean[idx])
points(ESM_test$Temp_lag1, ESM_test$Ozone, pch = 19)
  • Thank you for sharing your knowledge, duckmayr! Could you explain a bit more about the tuning of the scale factor "sf" and length scale "ell" parameters? From what I understood, "ell" is equivalent to "l" in Rasmussen and Williams' (2006) book, as well as to the "sigma" parameter in the kernlab package. When we run the code above, it sets sigma = 0.5 for the lowest MAEs, if we run "Model_train" in the code. However, you wrote that you used some GPML optimization routine to find the best hyperparameters for this exercise, and it was ell = 25 and sf = 50. Am I confusing myself about the scales? – Rafael-Pedrollo-de-Paes Aug 27 '19 at 20:33
  • @RafaelPedrollodePaes You're right about the translation from my code to Rasmussen and Williams (2006). I'm not sure how `caret` does things. The hyperparameter tuning optimization routine I called is from Rasmussen and Williams' software, GPML; "GPML" stands for Gaussian Processes for Machine Learning, and it is a Matlab toolkit. I've built an R interface to it that works if you have Octave (an open-source, code-compatible Matlab clone). So I don't really know where `caret` gets the 0.5 number from, or whether it is indeed one of the hyperparameters I'm talking about, but otherwise you have it right – duckmayr Aug 27 '19 at 20:55