3

I am using gam (from mgcv package in R) to model presence/absence data in 3355 cells of 1x1km (151 presences and 3204 absences). Even though I include a smooth with the spatial locations in the model to address the spatial dependence in my data, the results from a variogram show spatial autocorrelation in the residuals of my gam (range=6000 meters). Since I am modelling a binary response, using a gamm with a correlation structure is not advisable because it "performs poorly with binary data", neither gamm4 because (although is supposed to be appropriate for binary data) it has "no facility for nlme style correlation structures".

The alternative I have found is to fit my model using the function magic from the same mgcv package. Because I found no examples of how to use magic for spatially correlated data I have adapted the ?magic example for temporally correlated data. The results of the output (see below) change the coefficients of the model but does not remove the spatial autocorrelation and the smooth plots show the same effect.

> summary(data)
Object of class SpatialPolygonsDataFrame
Coordinates:
     min    max
x 670000 780000
y 140000 234000
Is projected: TRUE 
proj4string :
[+proj=tmerc +lat_0=0 +lon_0=19 +k=0.9993 +x_0=500000 +y_0=-5300000 +datum=WGS84 +units=m
+no_defs +ellps=WGS84 +towgs84=0,0,0]
Data attributes:
     f_edge          lat              long        dam     
 Min.   : 0.0   Min.   :670500   Min.   :140500   0:3204  
 1st Qu.: 0.0   1st Qu.:722500   1st Qu.:165500   1: 151  
 Median : 8.0   Median :743500   Median :181500           
 Mean   :11.5   Mean   :736992   Mean   :180949           
 3rd Qu.:21.0   3rd Qu.:756500   3rd Qu.:194500           
 Max.   :50.0   Max.   :779500   Max.   :233500    
> gam1x1 <- gam(dam ~ s(lat, long) + s(f_edge),
+                   family = binomial, data=data,
+                   method = "ML", select = TRUE, na.action = "na.fail")
> # generate standadized residuals
> res <- residuals(gam1x1, type = "person")
> # calculate the sample variogram and fit the variogram model
> var <- variogram(res ~ long + lat, data=data)    
> fit <- fit.variogram(var, vgm(c("Sph", "Exp"))); plot(var, fit)  

variogram of the standardized residuals

> # extract the nugget and range from the variogram model to create the correlation structure 
> # to include in the magic formula
> fit
  model     psill    range
1   Nug 0.5324180    0.000
2   Sph 0.2935823 6002.253
> cs1Exp <- corSpher(c(6002.253, 0.5324180), form = ~ long + lat, nugget=TRUE)
> cs1Exp <- Initialize(cs1Exp, dataPOD1x1s)
> # follow instructions from mgcv 'magic' function
> V <- corMatrix(cs1Exp)
> Cv <- chol(V) 
> ## gam ignoring correlation
> b <- gam(dam ~ s(lat, long) + 
+            s(f_edge),
+          family = binomial,
+          data=data,
+          method = "ML",
+          select = TRUE,
+          na.action = "na.fail")
> ## Fit smooth, taking account of *known* correlation...
> w <- solve(t(Cv)) # V^{-1} = w'w
> ## Use `gam' to set up model for fitting...
> G <- gam(dam10_15 ~ s(lat, long) +
+            s(f_edge),
+          family = binomial,
+          data=dataPOD1x1s,
+          method = "ML",
+          select = TRUE,
+          na.action = "na.fail",
+          fit=FALSE)
> ## fit using magic, with weight *matrix*
> mgfit <- magic(G$y,G$X,G$sp,G$S,G$off,rank=G$rank,C=G$C,w=w)
> mg.stuff <- magic.post.proc(G$X,mgfit,w)
> b$edf <- mg.stuff$edf;b$Vp <- mg.stuff$Vb
> b$coefficients <- mgfit$b 
> summary(gam1x1)

Family: binomial 
Link function: logit 

Formula:
dam ~ s(lat, long) + s(f_edge)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -3.6634     0.1338  -27.37   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
               edf Ref.df Chi.sq  p-value    
s(lat,long) 15.208     29  83.35 2.70e-14 ***
s(f_edge)    3.176      9  54.05 2.85e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.0595   Deviance explained = 15.6%
-ML = 551.92  Scale est. = 1         n = 3355
> summary(b)

Family: binomial 
Link function: logit 

Formula:
dam ~ s(lat, long) + s(f_edge)

Parametric coefficients:
     Estimate Std. Error z value Pr(>|z|)    
[1,]  0.06537    0.01131   5.778 7.56e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
              edf Ref.df Chi.sq p-value
s(lat,long) 3.716     29  0.046       1
s(f_edge)   6.753      9  0.295       1

R-sq.(adj) =  0.0618   Deviance explained = 15.6%
-ML = 551.92  Scale est. = 1         n = 3355
plot(b, select=2)
plot(gam1x1, select =2)

smooth effects

> resMag <- residuals.gam(b, type = "person")
> varMag <- variogram(resMag ~ long + lat, data=dataPOD1x1s)
> fitMag <- fit.variogram(varMag, vgm(c("Sph", "Exp"))); plot(varMag, fitMag)
> fitMag
  model     psill    range
1   Nug 0.5324180    0.000
2   Sph 0.2935823 6002.253

variogram of the standardized residuals from the gam model fitted with the magic function

Is something wrong in my script? Is there any other alternative to remove the residuals' spatial autocorrelation from my binomial gam?

Thanks a lot!

  • 1
    If I'd gotten this far I'd go right to the experts at [r-sig-geo](https://stat.ethz.ch/mailman/listinfo/r-sig-geo). All the top spatial developers are there, so a look at the archives also worthwhile. – Chris Apr 09 '20 at 06:41

0 Answers0