"saturated likelihood may be inaccurate" warning and negative deviance when running betar family in GAM

Question

My code for fitting a generalized additive model with the betar family is as follows.

library(mgcv)
b1 <- gam(
  ssim_exp ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + comparison_type,
  data = df,
  family = betar(link = "logit", eps = .Machine$double.eps * 1000)
)

Output

saturated likelihood may be inaccurate
Family: Beta regression(0.434) 
Link function: logit 

Formula:
ssim_exp_scale ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + 
    comparison_type

Parametric coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -0.5572     0.1607  -3.468 0.000524 ***
comparison_typefunctions   2.0598     0.1988  10.362  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                  edf Ref.df Chi.sq  p-value    
s(stage):comparison_typecomplete    3      3  19.07 0.000265 ***
s(stage):comparison_typefunctions   3      3   0.88 0.830160    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.00757   Deviance explained = -16.4%
-REML = -1035.1  Scale est. = 1         n = 171
saturated likelihood may be inaccurate
saturated likelihood may be inaccurate

I tried decreasing eps, but I still get the same warning, "saturated likelihood may be inaccurate", and a negative deviance explained. Any idea why this happens, and how can I fix it?

For context: my data do contain some 0s and 1s. The dependent variable is a percentage from 0 to 100%, rescaled to the interval [0, 1]. It is a similarity measure like Jaccard similarity - https://www.learndatasci.com/glossary/jaccard-similarity/ .
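
A quick way to quantify "some 0s and 1s" is to count the boundary values and look at what they become once they are truncated to [eps, 1 - eps], which is how the eps argument of betar() is documented to behave. This is only a sketch: the vector y below is invented toy data standing in for the real response, and the eps value is the one from the call above.

```r
# Toy stand-in for the response; these values are invented for illustration.
y <- c(0, 0, 0.12, 0.5, 0.87, 1, 1, 1)

# Count exact zeros and ones, and the share of observations on the boundary.
n_zero <- sum(y == 0)
n_one <- sum(y == 1)
prop_boundary <- mean(y == 0 | y == 1)

# Same eps as in the gam() call above.
eps <- .Machine$double.eps * 1000

# Truncate to [eps, 1 - eps]: every exact 0 becomes eps and every exact 1
# becomes 1 - eps, so all boundary observations pile up at two extreme values.
y_clamped <- pmin(pmax(y, eps), 1 - eps)

print(c(zeros = n_zero, ones = n_one, prop_boundary = prop_boundary))
print(range(y_clamped))
```

With the real data, replacing y by the response column gives the proportion of observations that sit exactly on the boundary; a large proportion suggests that a plain beta likelihood may be a poor fit regardless of eps.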

This is the distribution of the dependent variable of my data

It looks like you could have quite a few 0s and 1s; is that so? If it is, it's unlikely that this family is going to be useful. Instead {brms} could be used as it allows for zero/one inflated beta — Gavin Simpson, Feb 25 '23 at 12:20
Thank you for your comment, @GavinSimpson. Yes, there are indeed quite a few 0s and 1s. — nerd, Feb 25 '23 at 15:43
In that case, the warning is not surprising; the likelihood with a parameter per observation is unreliable if you have a lot of data values piled up at `.Machine$double.eps*100` and `1 - .Machine$double.eps*100`. You'll need to find a package that can fit zero/one inflated beta models; brms is one. — Gavin Simpson, Feb 27 '23 at 08:32
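
The suggestion in the comments can be sketched roughly as follows, using the zero_one_inflated_beta() family from brms. Everything here is an assumption for illustration: the data frame is simulated so the snippet is self-contained, fit_zoib is a hypothetical helper name, the sampler settings are arbitrary, and the smooth drops fx = TRUE from the original call, so it is penalized rather than fixed-df. Fitting requires brms and a working Stan toolchain, so the fit itself is left commented out.

```r
# Simulated stand-in for the asker's data; all values are made up.
set.seed(1)
n <- 171
u <- runif(n)
ssim_exp <- rbeta(n, 2, 2)
ssim_exp[u < 0.15] <- 0   # roughly 15% exact zeros
ssim_exp[u > 0.85] <- 1   # roughly 15% exact ones
df <- data.frame(
  ssim_exp = ssim_exp,
  stage = runif(n, 1, 10),
  comparison_type = factor(sample(c("complete", "functions"), n, replace = TRUE))
)

# Hypothetical helper: fit a zero/one-inflated beta GAM with brms.
# The family models exact 0s and 1s with separate inflation parameters
# instead of truncating them to eps and 1 - eps.
fit_zoib <- function(data) {
  if (!requireNamespace("brms", quietly = TRUE)) {
    stop("Package 'brms' is required for this sketch.")
  }
  brms::brm(
    brms::bf(ssim_exp ~ s(stage, k = 4, by = comparison_type) + comparison_type),
    data = data,
    family = brms::zero_one_inflated_beta(),
    chains = 4,   # arbitrary sampler settings
    cores = 4,
    seed = 1
  )
}

# Not run here (compiles a Stan model and samples):
# fit <- fit_zoib(df)
# summary(fit)
```

In this family the inflation parameters are constant by default; whether they should also depend on stage or comparison_type is a modelling decision that the thread does not address.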