0

my dataset can be found here: https://raw.githubusercontent.com/yuliaUU/test/main/test.csv

library(gamlss)
library(tidyverse)
data_final<- read_csv("https://raw.githubusercontent.com/yuliaUU/test/main/test.csv")

# Normal model with log transformation 
model_1 <-  gamlss(log(Abundance) ~ salinity*avrg_dep, data = data_final, family = NO())
# log normal model 
model_2 <- gamlss(Abundance ~  salinity*avrg_dep, data = data_final,  family = LOGNO())
#  Model with inverse gaussian distribution
model_3 <- gamlss(Abundance ~ salinity*avrg_dep, data = data_final,  family = IG())
# Gamma model
model_4 <- gamlss(Abundance ~ salinity*avrg_dep,  data = data_final, family = GA())

I want to use GAIC to compare between the models, but GAIC value for 1st model is far off from the rest

I read that:

To ensure that the GAIC of the linear model with the transformed response was comparable, the transformed log-likelihood multiplied by the Jacobian was used, and the GAIC was re-calculated manually.

I tried to do it the following way:

Jacobian <- 1/abs(data_final$Abundance)
# Calculate fitted values (on the log scale)
fitted_values_log <- predict(model_1)

# Calculate residuals manually (on the log scale)
residuals_transformed <- log(data_final$Abundance) - fitted_values_log

# Calculate standard deviation of the residuals
sd_residuals_transformed <- sd(residuals_transformed)

# Transformed log-likelihood calculation
log_likelihood_transformed <- sum(dnorm(log(data_final$Abundance), mean=fitted_values_log, sd=sd_residuals_transformed, log=TRUE) * Jacobian)

# Calculate degrees of freedom: number of parameters in the model
df <- length(coef(model_1))

# Manually calculate GAIC
GAIC_transformed <- -2 * log_likelihood_transformed + 2 * df
GAIC_transformed

but the value produced is sooo off, so I think I made a mistake somewhere

user20650
  • 24,654
  • 5
  • 56
  • 91
yuliaUU
  • 1,581
  • 2
  • 12
  • 33

2 Answers2

0

The easiest answer is to just fit the lognormal explicitly in gamlss, i.e. family=LOGNO

A more general answer, that applies to distributions other than the normal distribution on the real line, e.g. TF, is to create the corresponding logTF distribution:

gen.Family ("TF", type="log")

and then in the gamlss fit use

family=logTF

Robert
  • 36
  • 2
  • i already have LOGNO (model 2), i want to compare model1 and model 2 GAIC- in theory both should be identical ( at least I think so), but my GAIC are not – yuliaUU Jun 12 '23 at 23:02
0
# Model 1: Log-transformed response with lm()
model_1 <- lm(log(Abundance) ~ salinity * avrg_dep, data = data_final)

# Calculate log-likelihood of the model
logL <- logLik(model_1)

# Adjust the log-likelihood using the Jacobian for a log transformation
adjusted_logL <- logL + sum(log(1/data_final$Abundance))

# Count the number of parameters in the model (including intercept)
k <- length(coef(model_1))

# Get sample size
n <- length(model_1$residuals)

# Compute GAIC with adjusted log-likelihood
GAIC_adjusted <- -2*adjusted_logL + 2*k + 2*k*(k+1)/(n-k-1)

print(GAIC_adjusted)
yuliaUU
  • 1,581
  • 2
  • 12
  • 33