4

My current problem is to calculate the variance explained by the different variables of a general additive model (GAM) with R.

I followed the explanation given by Wood here : https://stat.ethz.ch/pipermail/r-help/2007-October/142743.html

But I would like to do it with three variables. I tried this :

library(mgcv)

set.seed(0)
n<-400
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1) 
x3 <- runif(n, 0, 1) 

f1 <- function(x) exp(2 * x) - 3.75887
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f3 <- function(x) 0.008*x^2 - 1.8*x + 874
f <- f1(x1) + f2(x2) + f3(x3)
e <- rnorm(n, 0, 2)
y <- f + e

b <- gam(y ~ s(x1, k = 3)+s(x2, k = 3)+ s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))

b0 <- gam(y~1)

(deviance(b1)-deviance(b))/deviance(b0)
(deviance(b2)-deviance(b))/deviance(b0)
(deviance(b3)-deviance(b))/deviance(b0)

But I don't understand results. For example, the model with only x1 and x2 has a deviance smaller than deviance with the three explanatory variable.

Does the method I used to extract variance explained by variable with three variables is correct?

Does it mean that there is a confounding effect in the global model? Or is there another explanation?

Thanks a lot.

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248

1 Answers1

6

You did something wrong here:

b <- gam(y ~ s(x1, k = 3) + s(x2, k = 3) + s(x3, k = 3))
b3 <- gam(y ~ s(x1) + s(x2), sp = c(b$sp[1], b$sp[2]))
b2 <- gam(y ~ s(x1) + s(x3), sp = c(b$sp[1], b$sp[3]))
b1 <- gam(y ~ s(x2) + s(x3), sp = c(b$sp[2], b$sp[3]))

Why did you set k = 3 in the first line, while not setting k = 3 for the rest? Without specifying k, s() will take default value k = 10. Now you get a problem: b1, b2, b3 are not nested in b.

In Simon Wood's original example, he left k unspecified, so that k=10 is taken for all s(). In fact, you can vary k values, but you must gurantee that you always have the same k for the same covariate (to ensure nesting). For example, you can do:

b <- gam(y ~ s(x1, k = 4) + s(x2, k = 6) + s(x3, k = 3))
b3 <- gam(y ~ s(x1, k = 4) + s(x2, k = 6), sp = c(b$sp[1], b$sp[2]))  ## droping s(x3) from b
b2 <- gam(y ~ s(x1, k = 4) + s(x3, k = 3), sp = c(b$sp[1], b$sp[3]))  ## droping s(x2) from b
b1 <- gam(y ~ s(x2, k = 6) + s(x3, k = 3), sp = c(b$sp[2], b$sp[3]))  ## droping s(x1) from b

Then let's do:

(deviance(b1)-deviance(b))/deviance(b0)
# [1] 0.2073421
(deviance(b2)-deviance(b))/deviance(b0)
# [1] 0.4323154
(deviance(b3)-deviance(b))/deviance(b0)
# [1] 0.02094997

The positive values imply that dropping any model term will inflate the deviance, which is sensible as our true model have all three terms.

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
  • Thanks for your answer, – A. Receveur Jul 22 '16 at 02:22
  • Thanks for your answer, I agree with you. it still has something that I don't understand. Why I need to specify k (wich is the maximum degree of smoothing) if there is not smoothing optimization? For my point of view, specifying "sp = ..." implies that there is no need for the optimization and so no need to set k. – A. Receveur Jul 22 '16 at 02:29