I'm working with ecological data: I used cameras to sample animal detections (converted to biomass) and fitted several candidate models, choosing the best one by looking at diagnostic plots, AIC and parameter effect sizes. The model is a gamma GLM with a log link (chosen because biomass is a continuous, positive response). The chosen model has two predictor variables, distance to water ("dist_water") and distance to forest patch ("dist_F3"). This is the model summary:

    glm(formula = RAI_biomass ~ Dist_water.std + Dist_F3.std, family = Gamma(link = "log"), 
        data = biomass_RAI)

    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -2.3835  -1.0611  -0.3937   0.4355   1.5923  

    Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
    (Intercept)      5.3577     0.2049  26.143 2.33e-16 ***
    Dist_water.std  -0.7531     0.2168  -3.474  0.00254 ** 
    Dist_F3.std      0.5831     0.2168   2.689  0.01452 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for Gamma family taken to be 0.9239696)

        Null deviance: 41.231  on 21  degrees of freedom
    Residual deviance: 24.232  on 19  degrees of freedom
    AIC: 287.98

    Number of Fisher Scoring iterations: 7

The covariates were standardised before fitting the model. What I need to do now is back-transform the model into natural units so that I can predict biomass at unsampled locations (in this case, farms). I made a table of each farm with its distance to water and distance to forest patch. I thought the way to do this would be to use `exp(coef(biomass_glm8))`, but when I did this the Dist_water.std coefficient changed direction and became positive.

    exp(coef(biomass_glm8))
    ##   (Intercept) Dist_water.std    Dist_F3.std 
    ##   212.2369519      0.4709015      1.7915026

To me this seems problematic: in the original GLM, an increasing distance to water meant a decrease in biomass (this makes sense), but now we are seeing the opposite? The calculated biomass response also had a very narrow range, 210.97-218.9331 (for comparison, biomass in the original data ranged from 3.04 to 2227.99).

I then tried exponentiating the entire linear predictor, rather than exponentiating each coefficient individually:

    farms$biomass_est2 <- exp(5.3577 + (-0.7531 * farms$Farm_dist_water_std) + (0.5831 * farms$Farm_dist_F3_std))

This gave me a new biomass response that makes a bit more sense, i.e. more variation given the variation in the two covariates (2.93-1088.84).
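
A minimal sketch of why these two calculations can differ, using the rounded coefficients from the summary and made-up standardised covariate values (`x_water` and `x_f3` are hypothetical, not the farm data). With a log link, exponentiated coefficients act as multipliers, so exponentiating the whole linear predictor is the same as multiplying `exp(intercept)` by each exponentiated coefficient raised to the power of its covariate value. Plugging the exponentiated coefficients into an additive formula is a different calculation that stays close to `exp(intercept)`; whether that is what produced the narrow range above is only a guess.

```r
# Coefficients copied (rounded) from the model summary
b0 <- 5.3577; b_water <- -0.7531; b_f3 <- 0.5831

# Hypothetical standardised covariate values for three sites
x_water <- c(-1.0, 0.0,  1.5)
x_f3    <- c( 0.5, 0.0, -1.0)

# Exponentiate the whole linear predictor (inverse of the log link)
pred_link <- exp(b0 + b_water * x_water + b_f3 * x_f3)

# Equivalent form: exponentiated coefficients combine by multiplication
pred_mult <- exp(b0) * exp(b_water)^x_water * exp(b_f3)^x_f3

# Not equivalent: exponentiated coefficients used as additive slopes
pred_add <- exp(b0) + exp(b_water) * x_water + exp(b_f3) * x_f3

all.equal(pred_link, pred_mult)  # TRUE
range(pred_link)                 # wide range
range(pred_add)                  # stays within a few units of exp(b0)
```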

I then tried converting the coefficient estimates by computing e^B - 1, which again gave different results, although they were closest to the ones obtained through `exp(coef(biomass_glm8))`:

    exp(-0.7531) - 1  # dist_water = -0.5290955
    exp(0.5831) - 1   # dist_F3    =  0.7915837
    exp(5.3577) - 1   # intercept  = 211.2362
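
For reference, a small sketch of how these quantities are usually read under a log link: `exp(B)` is the multiplicative change in expected biomass per one-unit (here one standard deviation) increase in the covariate, and `exp(B) - 1` is the same change expressed as a proportion. Subtracting 1 from the exponentiated intercept has no such interpretation; `exp(intercept)` is simply the expected biomass when both standardised covariates are 0, i.e. at their means. The vector `b` below just holds the rounded estimates from the summary.

```r
# Rounded slope estimates from the model summary
b <- c(Dist_water.std = -0.7531, Dist_F3.std = 0.5831)

# Multiplicative effect of a 1 SD increase in each covariate
ratio <- exp(b)

# The same effect as a percentage change in expected biomass
pct_change <- 100 * (exp(b) - 1)

round(ratio, 3)
round(pct_change, 1)

# Expected biomass at the mean of both covariates
exp(5.3577)
```

Read this way, a ratio below 1 (a negative percentage change) for distance to water still means that biomass decreases as distance to water increases.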

My question is, why are these estimates different, and what is the best way to take this gamma GLM with a log link and convert it into a format that can be used to calculate predicted values? Any help would be greatly appreciated!

  • Why not just use the predict function? – Dason Mar 06 '19 at 01:09
  • Does this take into account the difference in scales between the glm with the log link and the new data (in the natural scale) I am trying to get predicted values for? – Aisha Uduman Mar 06 '19 at 01:50
  • @AishaUduman - I don't think so, no. You would want to scale the new data using the mean and standard deviation of the original data, and then use that in the `predict()` function. Also, I think it does make sense that the coefficient that is negative on the log scale is positive when you exponentiate it - you can interpret the exponentiated coefficient as a multiplier: https://stats.stackexchange.com/questions/96972/how-to-interpret-parameters-in-glm-with-family-gamma – Marius Mar 06 '19 at 02:18
  • I made a new data frame with dist_water and dist_F3 columns for the farms (which I need predicted biomass estimates for), standardised by the mean and sd of the original data: `farms_predict <- data.frame("Farm" = farms$Farm, "Farm_dist_water_std" = farms$Farm_dist_water_std, "Farm_dist_F3_std" = farms$Farm_dist_F3_std); farms_predict$biomass <- predict.glm(biomass_glm8, newdata = farms_predict, type = "response")`. But I got: `Error in eval(predvars, data, env) : object 'Dist_water.std' not found`. Any idea why this is happening? – Aisha Uduman Mar 06 '19 at 19:21
  • Ignore -- got it to work! Thanks :) – Aisha Uduman Mar 06 '19 at 22:32
  • If there is a solution, it would be good to post it, even if you answer your own post. – kdarras Nov 07 '19 at 10:08
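
The thread does not record what fixed the error, so the following is only a sketch of one likely route, using simulated data rather than the original survey. `predict()` looks up the model's predictor names in `newdata`, so the columns need to be called `Dist_water.std` and `Dist_F3.std` (not `Farm_dist_water_std` etc.), and the farm distances need to be standardised with the mean and sd of the original sites. All data, the farm names and the helper names (`sites`, `mu_w`, `sd_w`, `mu_f`, `sd_f`) are hypothetical.

```r
set.seed(1)

# Simulated stand-in for the survey sites, distances in natural units
sites <- data.frame(dist_water = runif(22, 50, 3000),
                    dist_F3    = runif(22, 10, 1500))

# Keep the means and sds of the ORIGINAL data for later re-use
mu_w <- mean(sites$dist_water); sd_w <- sd(sites$dist_water)
mu_f <- mean(sites$dist_F3);    sd_f <- sd(sites$dist_F3)
sites$Dist_water.std <- (sites$dist_water - mu_w) / sd_w
sites$Dist_F3.std    <- (sites$dist_F3 - mu_f) / sd_f

# Simulated gamma response with a log-linear mean (made-up parameters)
mu <- exp(5.36 - 0.75 * sites$Dist_water.std + 0.58 * sites$Dist_F3.std)
sites$RAI_biomass <- rgamma(22, shape = 1.1, rate = 1.1 / mu)

fit <- glm(RAI_biomass ~ Dist_water.std + Dist_F3.std,
           family = Gamma(link = "log"), data = sites)

# Hypothetical farms, distances in the same natural units
farms <- data.frame(Farm       = c("A", "B", "C"),
                    dist_water = c(100, 800, 2500),
                    dist_F3    = c(1200, 400, 50))

# Standardise with the original means/sds and use the model's column names
farms_predict <- data.frame(
  Dist_water.std = (farms$dist_water - mu_w) / sd_w,
  Dist_F3.std    = (farms$dist_F3 - mu_f) / sd_f
)

# type = "response" applies the inverse link, giving biomass units
farms$biomass_est <- predict(fit, newdata = farms_predict, type = "response")

# Manual check: exponentiate the linear predictor
manual <- as.numeric(exp(cbind(1, as.matrix(farms_predict)) %*% coef(fit)))
all.equal(as.numeric(farms$biomass_est), manual)
```

With `type = "response"` there is no need to exponentiate anything by hand; the manual line is only there to show that it matches exponentiating the full linear predictor.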
