My code for fitting the generalized additive model with the betar family is as follows.

library(mgcv)
b1 <- gam(ssim_exp ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + comparison_type, data = df, family = betar(link = "logit", eps=.Machine$double.eps*1000))

Output

saturated likelihood may be inaccurate
Family: Beta regression(0.434) 
Link function: logit 

Formula:
ssim_exp_scale ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + 
    comparison_type

Parametric coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)               -0.5572     0.1607  -3.468 0.000524 ***
comparison_typefunctions   2.0598     0.1988  10.362  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                  edf Ref.df Chi.sq  p-value    
s(stage):comparison_typecomplete    3      3  19.07 0.000265 ***
s(stage):comparison_typefunctions   3      3   0.88 0.830160    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  -0.00757   Deviance explained = -16.4%
-REML = -1035.1  Scale est. = 1         n = 171
saturated likelihood may be inaccurate
saturated likelihood may be inaccurate

I tried decreasing eps, but I still get the same warning, "saturated likelihood may be inaccurate", and a negative deviance explained. Any idea why, and how can I fix this?

For context: I do have some 0s and 1s in the data. My dependent variable is a percentage from 0 to 100%, rescaled to the range 0 to 1. It is a similarity measure like Jaccard similarity - https://www.learndatasci.com/glossary/jaccard-similarity/ .
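
A quick way to see how much of the response sits exactly on the boundaries is to count the 0s and 1s directly. The snippet below is a minimal sketch in base R: `y` is a made-up vector standing in for `df$ssim_exp`, and the clamping step only mimics what `betar()` is understood to do with values outside [eps, 1 - eps]; it is not mgcv's own code.

```r
# Hypothetical stand-in for df$ssim_exp: proportions on [0, 1] with exact 0s and 1s
y <- c(0, 0, 0, 0.12, 0.35, 0.5, 0.81, 1, 1, 1)

# Count observations lying exactly on each boundary
n_zero <- sum(y == 0)
n_one <- sum(y == 1)

# Share of the data that sits on either boundary
prop_boundary <- mean(y == 0 | y == 1)

# Same eps as in the betar() call above
eps <- .Machine$double.eps * 1000

# Illustration (assumption): boundary values are pushed to eps and 1 - eps
# before fitting, so they pile up at two extreme points
y_clamped <- pmin(pmax(y, eps), 1 - eps)

print(c(zeros = n_zero, ones = n_one, prop_boundary = prop_boundary))
```

If `prop_boundary` is more than a small fraction, many observations end up at the two clamped values, whatever eps is chosen.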

This is the distribution of the dependent variable of my data

[Image: distribution of the dependent variable]

nerd
  • It looks like you could have quite a few 0s and 1s; is that so? If it is, it's unlikely that this family is going to be useful. Instead {brms} could be used as it allows for zero/one inflated beta – Gavin Simpson Feb 25 '23 at 12:20
  • Thank you for your comment @GavinSimpson , yes, there are indeed quite a few 0s and 1s. – nerd Feb 25 '23 at 15:43
  • In that case, the warning is not surprising; the likelihood with a parameter per observation is unreliable if you have a lot of data values piled up at `.Machine$double.eps*100` and `1 - .Machine$double.eps*100`. You'll need to find a package that can fit zero/one inflated beta models; brms is one. – Gavin Simpson Feb 27 '23 at 08:32
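
The brms suggestion in the comments above could look roughly like the sketch below. It is only a starting point, not a tested solution for this data. It assumes the brms package and a working Stan toolchain are installed and that `comparison_type` is a factor. `fit_zoib` is a hypothetical helper name, and `fx = TRUE` from the original `gam()` call is left out because it is not clear it has the same effect in brms.

```r
# Sketch only: fit_zoib is a hypothetical helper; df, ssim_exp, stage and
# comparison_type are the names used in the question.
# brms is referenced with :: so the function can be defined without attaching it;
# calling it requires brms and Stan to be installed.
fit_zoib <- function(df) {
  brms::brm(
    # Same mean structure as the gam() call: one smooth of stage per
    # comparison_type level, plus the parametric comparison_type effect
    brms::bf(ssim_exp ~ s(stage, k = 4, by = comparison_type) + comparison_type),
    data = df,
    # Zero/one inflated beta: exact 0s and 1s are modelled explicitly
    # instead of being clamped to eps and 1 - eps
    family = brms::zero_one_inflated_beta()
  )
}
```

Calling `fit <- fit_zoib(df)` followed by `summary(fit)` would then run the sampler. With this family the response is used on its original 0 to 1 scale, exact 0s and 1s included, so no eps adjustment is needed.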