I am trying to fit a smoothed surface of z against x and y using the formula z ~ s(x, y) with the gam function in the mgcv package. My goal is to predict the response z from new values of x and y.
In my real situation, z should be a positive number; a negative z would be meaningless. However, the predicted values of z are sometimes negative. It seems that in some regions there are not enough points in the training data to estimate z accurately.
My question is: is there a way to specify a lower bound on z when fitting the smooth in gam, so that I won't get negative values of z from predict later?
Below is a minimal example that reproduces this issue.
library(mgcv)
x <- seq(0.1, 1, by = 0.01)
y <- seq(0.1, 1, by = 0.01)
dtt <- expand.grid(x = x, y = y)
set.seed(123)
dtt$xp <- dtt$x + rnorm(nrow(dtt)) / 100
dtt$yp <- dtt$y + rnorm(nrow(dtt)) / 100
dtt$z <- 1 / (dtt$xp^2 + dtt$yp^2)
m <- sample.int(nrow(dtt), 3000)
dtt.train <- dtt[m, ]
dtt.test <- dtt[!(1:nrow(dtt) %in% m), ]
fit <- gam(z ~ s(x, y), data = dtt.train)
p <- predict(fit, newdata = dtt.test)
plot(dtt.test$z, p, xlab = 'Real', ylab = 'Predicted', pch = 19, col = 1 + (p < 0))
abline(h = 0, v = 0)
As you can see, for the red points the real values are positive but the predicted values are negative.
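For context, one direction that might avoid negative predictions is to model z through a log link, so that predictions on the response scale are positive by construction. The sketch below is only an assumption about what could work here, not a confirmed fix: it reuses the simulated data from the example above, and the choice of family = Gamma(link = "log") and the names fit_log and p_log are illustrative. Whether a Gamma error model suits the real data is untested.

```r
library(mgcv)

# Rebuild the simulated data from the example above so this block runs on its own.
x <- seq(0.1, 1, by = 0.01)
y <- seq(0.1, 1, by = 0.01)
dtt <- expand.grid(x = x, y = y)
set.seed(123)
dtt$xp <- dtt$x + rnorm(nrow(dtt)) / 100
dtt$yp <- dtt$y + rnorm(nrow(dtt)) / 100
dtt$z <- 1 / (dtt$xp^2 + dtt$yp^2)
m <- sample.int(nrow(dtt), 3000)
dtt.train <- dtt[m, ]
dtt.test <- dtt[!(1:nrow(dtt) %in% m), ]

# Assumption: a Gamma family with a log link is acceptable for a strictly positive z.
# The smooth is then estimated on the log scale.
fit_log <- gam(z ~ s(x, y), family = Gamma(link = "log"), data = dtt.train)

# type = "response" back-transforms with exp(), so every prediction is positive.
p_log <- predict(fit_log, newdata = dtt.test, type = "response")

# Count of negative predictions on the held-out points.
sum(p_log < 0)
```

This does not impose an arbitrary lower bound; it only guarantees positivity, and it changes the assumed error distribution, so the fit should be checked against the real data before relying on it.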