The issue you saw is not about GAM. You have used `gam` to fit a parametric model, in which case `gam` behaves almost the same as `lm`. To answer your questions, it suffices to focus on the linear regression case: whatever happens to a linear model also happens to GLMs and GAMs. Here is how we can reproduce the issue with `lm`:
yy <- c(5,2,8,9)
xx <- 1:4
wgts <- c(3,2,4,1)
yy2 <- rep(yy,wgts)
xx2 <- rep(xx,wgts)
fit1 <- lm(yy2 ~ xx2)
fit2 <- lm(yy ~ xx, weights = wgts)
fit3 <- lm(yy ~ xx, weights = wgts/mean(wgts))
summary1 <- summary(fit1)
summary2 <- summary(fit2)
summary3 <- summary(fit3)
pred1 <- predict(fit1, list(xx2 = xx), interval = "confidence", se.fit = TRUE)
pred2 <- predict(fit2, list(xx = xx), interval = "confidence", se.fit = TRUE)
pred3 <- predict(fit3, list(xx = xx), interval = "confidence", se.fit = TRUE)
All models have the same regression coefficients, but other results may differ. You asked:
- For the weighted regressions `fit2` and `fit3`, why is almost everything the same except the residual standard error?
- Why is weighted regression (`fit2` or `fit3`) not equivalent to ordinary regression with ties (`fit1`)?
Your first question is about the scaling invariance of weighted least squares to the weights. Here is a brief summary I made:
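The invariance can be sketched with standard weighted least squares algebra (usual notation: `W` is the diagonal weight matrix, `c > 0` an arbitrary rescaling factor):

```latex
\hat\beta = (X^\top W X)^{-1} X^\top W y
          = (X^\top (cW) X)^{-1} X^\top (cW) y
\qquad \text{($c$ cancels: coefficients are invariant)}

\hat\sigma^2 = \frac{(y - X\hat\beta)^\top W (y - X\hat\beta)}{n - p}
\;\longrightarrow\; c\,\hat\sigma^2
\qquad \text{(rescaling $W \to cW$ rescales $\hat\sigma^2$)}

\widehat{\operatorname{Var}}(\hat\beta)
  = \hat\sigma^2 (X^\top W X)^{-1}
\;\longrightarrow\; c\,\hat\sigma^2 \cdot \tfrac{1}{c}\,(X^\top W X)^{-1}
  = \widehat{\operatorname{Var}}(\hat\beta)
\qquad \text{(invariant)}
```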

If we rescale `W` by an arbitrary positive value, only the residual standard error and the unscaled covariance change. Such a change does not imply a different, non-equivalent model; in fact, nothing related to prediction is affected. In weighted regression, don't look at `sigma2` alone: it is only meaningful relative to the weights. What is really of interest is the variance of each observation, `sigma2 / w_i`. If you divide your weights by 2, you will find that `sigma2` is halved as well, so the ratio `sigma2 / w_i` stays the same.
summary2$coef
summary3$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.128713 3.128697 0.6803832 0.5664609
#xx 1.683168 1.246503 1.3503125 0.3094222
pred2
pred3
#$fit
# fit lwr upr
#1 3.811881 -5.0008685 12.62463
#2 5.495050 -0.1299942 11.12009
#3 7.178218 0.6095820 13.74685
#4 8.861386 -1.7302209 19.45299
#
#$se.fit
# 1 2 3 4
#2.048213 1.307343 1.526648 2.461646
#
#$df
#[1] 2
#
#$residual.scale ## for `pred2`
#[1] 3.961448
#
#$residual.scale ## for `pred3`
#[1] 2.50544
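This invariance is easy to check numerically (a quick sketch using the same toy data as above):

```r
## Rescaling the weights: coefficients, standard errors, and prediction
## standard errors are unchanged; only the residual scale moves.
yy <- c(5, 2, 8, 9); xx <- 1:4; wgts <- c(3, 2, 4, 1)
f1 <- lm(yy ~ xx, weights = wgts)
f2 <- lm(yy ~ xx, weights = wgts / 2)

all.equal(coef(summary(f1)), coef(summary(f2)))  # TRUE (incl. std. errors)
all.equal(predict(f1, se.fit = TRUE)$se.fit,
          predict(f2, se.fit = TRUE)$se.fit)     # TRUE

## `sigma2` follows the weights: halving the weights halves `sigma2`,
## so the implied variance of observation i, sigma2 / w_i, is unchanged.
sigma(f2)^2 / sigma(f1)^2                        # 0.5
```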
Your second question is about the meaning of weights. Weights are used to model a heteroscedastic response, overcoming the leverage effect in ordinary least squares regression. Weights are proportional to the reciprocal of the variance: you give bigger weights to data with smaller expected errors. Weights can be non-integer, so they have no natural interpretation in terms of repeated data. Thus, what is written in the `mgcv` package documentation is not rigorously correct.
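For instance (a hypothetical sketch, not from the question): if each response has its own known noise variance, the natural weights are `1 / variance`, and they are generally not integers:

```r
## Hypothetical heteroscedastic data: observation i has variance v[i].
set.seed(1)
x <- 1:20
v <- seq(0.5, 5, length.out = 20)     # known per-observation variances
y <- 1 + 2 * x + rnorm(20, sd = sqrt(v))
fit_w <- lm(y ~ x, weights = 1 / v)   # weights are non-integer
coef(fit_w)
```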
The real difference between `fit1` and `fit2` is the degrees of freedom. Check the `$df` entries in the output below for `(n - p)`: `n` is the number of data points, while `p` is the number of non-`NA` coefficients, so `n - p` is the residual degrees of freedom. Both models have `p = 2` (intercept and slope), but `fit1` has `n = 10` while `fit2` has `n = 4`. This has a dramatic effect on inference, as the standard errors for coefficients and predictions (hence confidence intervals) now differ. These two models are far from equivalent.
summary1$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.128713 1.5643486 1.360766 0.21068210
#xx2 1.683168 0.6232514 2.700625 0.02704784
summary2$coef
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 2.128713 3.128697 0.6803832 0.5664609
#xx 1.683168 1.246503 1.3503125 0.3094222
pred1
#$fit
# fit lwr upr
#1 3.811881 1.450287 6.173475
#2 5.495050 3.987680 7.002419
#3 7.178218 5.417990 8.938446
#4 8.861386 6.023103 11.699669
#
#$se.fit
# 1 2 3 4
#1.0241066 0.6536716 0.7633240 1.2308229
#
#$df # note, this is `10 - 2 = 8`
#[1] 8
#
#$residual.scale
#[1] 1.980724
pred2
#$fit
# fit lwr upr
#1 3.811881 -5.0008685 12.62463
#2 5.495050 -0.1299942 11.12009
#3 7.178218 0.6095820 13.74685
#4 8.861386 -1.7302209 19.45299
#
#$se.fit
# 1 2 3 4
#2.048213 1.307343 1.526648 2.461646
#
#$df # note, this is `4 - 2 = 2`
#[1] 2
#
#$residual.scale ## for `pred2`
#[1] 3.961448
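The degrees-of-freedom accounting above can be verified directly (a quick sketch re-creating the two fits):

```r
## Re-create the fits: `fit1` expands the ties, `fit2` uses weights.
yy <- c(5, 2, 8, 9); xx <- 1:4; wgts <- c(3, 2, 4, 1)
fit1 <- lm(rep(yy, wgts) ~ rep(xx, wgts))  # n = 10 observations
fit2 <- lm(yy ~ xx, weights = wgts)        # n = 4 observations

df.residual(fit1)  # 10 - 2 = 8
df.residual(fit2)  #  4 - 2 = 2

## Both fits have the same (weighted) residual sum of squares, so the
## residual standard errors differ exactly by sqrt(8 / 2) = 2:
sigma(fit2) / sigma(fit1)  # 2
```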