
I am trying to fit a smooth surface of z against x and y using the formula z ~ s(x, y) with the gam function in the mgcv package. My goal is to predict the response z for new values of x and y.

In my real situation, z should be positive; a negative z would be meaningless. However, the predicted z values are sometimes negative. It seems that in some regions there are not enough points in the training data to estimate z accurately.

My question is: is there a way to specify a lower bound on z when fitting the smooth in gam, so that I won't get negative values of z from predict later?

Below is a minimal example that reproduces this issue.

library(mgcv)

x <- seq(0.1, 1, by = 0.01)
y <- seq(0.1, 1, by = 0.01)
dtt <- expand.grid(x = x, y = y)

set.seed(123)
dtt$xp <- dtt$x + rnorm(nrow(dtt)) / 100
dtt$yp <- dtt$y + rnorm(nrow(dtt)) / 100

dtt$z <- 1 / (dtt$xp^2 + dtt$yp^2)

m <- sample.int(nrow(dtt), 3000)

dtt.train <- dtt[m, ]
dtt.test <- dtt[!(1:nrow(dtt) %in% m), ]

fit <- gam(z ~ s(x, y), data = dtt.train)

p <- predict(fit, newdata = dtt.test)

plot(dtt.test$z, p, xlab = 'Real', ylab = 'Predicted', pch = 19, col = 1 + (p < 0))
abline(h = 0, v = 0)

As you can see, for the red points the real values are positive but the predicted values are negative.


mt1022
  • I think a family and link function is one option, such as `gam(z ~ s(x, y), data = dtt.train, family = gaussian(link = "log"))` – cuttlefish44 Mar 30 '18 at 18:06
  • @cuttlefish44, thanks for the hint. I tried the `log` link, but the predicted `z` still has many negative values. It seems that the predicted values are on the scale of the link function, so I have to apply the mean function (the inverse link) myself to obtain correct `z` values? – mt1022 Mar 31 '18 at 05:45
  • I figured it out. The default return value of `predict.gam` is the linear predictor. When the link function is identity, the linear predictor is on the same scale as `z`. When the link is `log`, I have to specify `type = "response"` in `predict.gam` to get predicted `z` values. – mt1022 Mar 31 '18 at 06:00
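
The approach worked out in the comments can be sketched roughly as follows. This is only an illustration on the simulated data from the question, assuming a Gaussian family with a log link is acceptable for the real data; the names `fit.log`, `p.link`, and `p.resp` are introduced here for the example and are not part of the original code.

```r
library(mgcv)

# Same simulated data as in the question
x <- seq(0.1, 1, by = 0.01)
y <- seq(0.1, 1, by = 0.01)
dtt <- expand.grid(x = x, y = y)

set.seed(123)
dtt$xp <- dtt$x + rnorm(nrow(dtt)) / 100
dtt$yp <- dtt$y + rnorm(nrow(dtt)) / 100
dtt$z <- 1 / (dtt$xp^2 + dtt$yp^2)

m <- sample.int(nrow(dtt), 3000)
dtt.train <- dtt[m, ]
dtt.test <- dtt[-m, ]

# Log link: the mean is modelled as exp(s(x, y)), so fitted means are positive
fit.log <- gam(z ~ s(x, y), data = dtt.train,
               family = gaussian(link = "log"))

# Default (type = "link") returns the linear predictor, i.e. log of the mean,
# which can legitimately be negative
p.link <- predict(fit.log, newdata = dtt.test)

# type = "response" applies the inverse link and returns values on the z scale
p.resp <- predict(fit.log, newdata = dtt.test, type = "response")

all(p.resp > 0)
all.equal(as.numeric(exp(p.link)), as.numeric(p.resp))
```

Note that the log link keeps the predicted mean positive by construction rather than imposing an explicit lower bound, and the Gaussian family still assumes constant variance on the original scale; whether that is appropriate depends on the real data.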
