0

I have a model that converged, but when I try to check for concurvity (used "multicollinearity" as it's the only working tag) I repeatedly get this error message. It occurs no matter which terms or how many I kick out. My data is 1,500 rows x 23 columns and I can't reproduce the error with a smaller subset of data. I've seen this problem here, but it's solution doesn't seem to work for generalized additive mixed models. I have also tried setting full argument to TRUE and FALSE.

library(mgcv)

mod <- bam(num ~ 
           
           # Parametric terms
           CYR.std * Season +
             
           # Habitat covariates
           sed_depth * ave_hw +
           total_ave_ma +
           s(ave_tt) +
           
           # Structural components
           s(Site, bs = "re") +
           s(Site, fCYR, bs = "re") +
           s(Site, CYR.std, bs = "re") +
           s(Season, fCYR, bs = "re") +
           
           offset(log(area_sampled)), 

         data = toad, 
         method = 'fREML',
         discrete = TRUE,
         family = poisson,
         control = list(trace = TRUE))

Setup complete. Calling fit
Deviance = 226.280118812246 Iterations - 1
Deviance = 502.577417444367 Iterations - 2
Deviance = 576.624536857 Iterations - 3
Deviance = 685.39118817115 Iterations - 4
Deviance = 714.899394181739 Iterations - 5
Deviance = 735.575311982761 Iterations - 6
Deviance = 734.653307725508 Iterations - 7
Deviance = 735.874682895636 Iterations - 8
Deviance = 736.673618482559 Iterations - 9
Deviance = 736.861294501416 Iterations - 10
Deviance = 736.981664035411 Iterations - 11
Deviance = 737.00995853621 Iterations - 12
Deviance = 737.028133131768 Iterations - 13
Deviance = 737.032400712599 Iterations - 14
Deviance = 737.035137664603 Iterations - 15
Deviance = 737.035779699487 Iterations - 16
Deviance = 737.03619105456 Iterations - 17
Deviance = 737.036287496242 Iterations - 18
Deviance = 737.036349254884 Iterations - 19
Fit complete. Finishing gam object.
          user.self sys.self elapsed
initial        0.02     0.00    0.01
gam.setup      0.03     0.00    0.03
pre-fit        0.00     0.00    0.00
fit            7.03     0.34    7.39
finalise       0.00     0.00    0.00

concurvity(mod)

Error in forwardsolve(t(Rt), t(R[1:r, , drop = FALSE])) : singular matrix in 'backsolve'. First zero in diagonal [14]

UPDATE:

Removing the random effect "s(fSeason, fCYR, bs = "re") +..." did the trick, but I don't understand why. I'm also not sure what to do if it's an important part of faithfully representing the survey design (random intercept that captures similarity of observations within a season within a year). Is this a "Cross Validated question" now?

Estimated variance components:

> gratia::variance_comp(mod)
# # A tibble: 5 × 5
component        variance std_dev lower_ci upper_ci
<chr>               <dbl>   <dbl>    <dbl>    <dbl>
1 s(ave_tt)        0.000266  0.0163  0.00471   0.0564
2 s(fSite)         0.0817    0.286   0.0658    1.24  
3 s(fCYR,fSite)    0.839     0.916   0.786     1.07  
4 s(CYR.std,fSite) 0.00433   0.0658  0.0427    0.101 
5 s(fSeason,fCYR)  0.667     0.816   0.565     1.18 

example data:

> x <- toad[sample(nrow(toad), 1), ]
> dput(x)
structure(list(CYR = 2010L, Season = structure(2L, levels = c("DRY", 
"WET"), class = "factor"), Site = structure(34L, levels = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47"), class = "factor"), area_sampled = 3L, Latitude = 25.48503, 
    Longitude = -80.33982, num = 0L, den = 0, occur = 0L, temp = 32.3, 
    DO = 2.7, sal = 28.78, group = "polyhaline", water_depth = 85L, 
    sed_depth = 54, ave_sav = 36, ave_tt = 0, ave_hw = 22.5, 
    ave_sf = 0, ave_rm = 0, total_ave_ma = 0.1, fCYR = structure(4L, levels = c("2007", 
    "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015", 
    "2016", "2017", "2018", "2019", "2020", "2021", "2022"), class = "factor"), 
    CYR.std = 3L), row.names = 130183L, class = "data.frame")
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Nate
  • 411
  • 2
  • 10
  • 1
    It sounds like your model is too complex for the data; singular matrices are a sign that you're asking too much from the data – Gavin Simpson Aug 03 '23 at 10:21
  • @GavinSimpson Thank you. I found out that kicking out the "s(Season, fCYR, bs = "re") +" random effect term made the error go away. However, I'm not sure how to handle a situation where one needs to remove a term that seems necessary (captures correlation between obs within the same year and season). – Nate Aug 03 '23 at 12:49
  • Just because a term seems necessary doesn't mean that you have the data to estimate it its effect precisely given the sample of data you have to hand. As you're just computing concurvity I doubt it matters too much, but if you have other problems with model summaries etc then you might want to consider simplifying the model. For example, did you look to see what the estimated variance component of that ranef smooth you removed was? Try `gratia::variance_comp(mod)` to see if it was (numerically) zero variance (or if any terms had wide confidence intervals. – Gavin Simpson Aug 03 '23 at 14:38
  • Oh, I didn't, but I just updated my post to include it. Thank you for that, I wasn't aware of this step. The variance of that random effect was above zero (2nd highest with seemingly large confidence intervals). – Nate Aug 03 '23 at 15:01
  • The s(fSite) random intercept has a really wide C.I. – Nate Aug 03 '23 at 15:10

0 Answers0