I am quite new to the R world and have been battling through some issues with my Generalized Linear Model (GLM) output data in an attempt to run a General Linear Hypothesis Test (GLHT) for different levels of interactions.
My main question is: Is there a way to write a GLM formula to leave out a single interaction in order to run GLHT stats on the model?
My data structure is as follows (and is located in a data.frame as factors prior to running the GLM).
invSimpson - dependant variable - decimal numbers
collection_method - independent variable with 4 levels: "BT, CSL, KSL, SW"
site_code - independent variable with 4 levels: "M0, M4, M7, M9"
I am modifying a tried and true SOP which is where I pulled the below commands from. The script worked well for a similar data structure, but that data structure wasn't missing a single interaction as my data is.
The main issue is arising from the fact that CSL was never collected at site M7 so the CSL:M7 interaction is absent. There are no NA's in the input data for the GLM, but once the GLM is run, I end up with NA's in the interaction between CSL:M7.
> mod <- glm(invSimpson ~ collection_method * site_code+0,
+ data = bac_invs, family = Gamma(link = 'log'))
> summary(mod)
Call:
glm(formula = invSimpson ~ collection_method * site_code + 0,
family = Gamma(link = "log"), data = bac_invs)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.9393 -0.5605 -0.1658 0.2258 2.3786
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
collection_methodSW 3.53695 0.11880 29.772 < 2e-16 ***
collection_methodBT 4.02590 0.15068 26.719 < 2e-16 ***
collection_methodKSL 5.56846 0.20860 26.694 < 2e-16 ***
collection_methodCSL 3.78808 0.13193 28.712 < 2e-16 ***
site_codeM4 -0.22633 0.16293 -1.389 0.16556
site_codeM7 0.11395 0.16917 0.674 0.50097
site_codeM9 0.08772 0.15882 0.552 0.58103
collection_methodBT:site_codeM4 -0.04652 0.25818 -0.180 0.85711
collection_methodKSL:site_codeM4 -0.04733 0.29227 -0.162 0.87144
collection_methodCSL:site_codeM4 -0.56047 0.36192 -1.549 0.12226
collection_methodBT:site_codeM7 -0.62172 0.26583 -2.339 0.01983 *
collection_methodKSL:site_codeM7 -0.38977 0.29923 -1.303 0.19346
collection_methodCSL:site_codeM7 NA NA NA NA
collection_methodBT:site_codeM9 -0.79353 0.25678 -3.090 0.00214 **
collection_methodKSL:site_codeM9 -0.27627 0.30595 -0.903 0.36707
collection_methodCSL:site_codeM9 0.39826 0.26766 1.488 0.13754
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 0.5221921)
Null deviance: 62031.44 on 422 degrees of freedom
Residual deviance: 196.78 on 407 degrees of freedom
AIC: 4134.1
Number of Fisher Scoring iterations: 6
The next step that I would like to take (based on the SOP that I am following) is to run this command:
> mod_glht <- glht(mod, linfct = mcp(collection_method = c('SW - BT = 0',
+ 'SW - KSL = 0',
+ 'SW - CSL = 0',
+ 'BT - KSL = 0',
+ 'BT - CSL = 0',
+ 'KSL - CSL = 0'),
+ site_code = c('M0 - M4 = 0',
+ 'M0 - M7 = 0',
+ 'M0 - M9 = 0',
+ 'M4 - M7 = 0',
+ 'M4 - M9 = 0',
+ 'M7 - M9 = 0'),
+ interaction_average = TRUE))
Error in modelparm.default(model, ...) :
dimensions of coefficients and covariance matrix don't match
In addition: Warning messages:
1: In mcp2matrix(model, linfct = linfct) :
covariate interactions found -- default contrast might be inappropriate
2: In mcp2matrix(model, linfct = linfct) :
covariate interactions found -- default contrast might be inappropriate
I have come to learn that the above error and warning messages arise from the missing (NA) data in the CSL:M7 interaction line from the GLM.
There are a number of suggestions on how to deal with this on stack overflow, none of which have worked for me.
1) I have tried NA.omit after producing the GLM, which does not do anything to the model. It seems that this is only effective for input data (in which no NA's exist) and not useful in editing a GLM output.
2) I have also tried editing the GLM model matrix as outlined here: https://stats.stackexchange.com/questions/141792/how-to-compare-nested-factor-levels-to-mean-not-to-first-level-and-how-to-test
When I do so, it is successful in pulling out the CSL:M7 interaction, but then, I am still unable to run the GLHT on the edited model matrix and get this error:
Error in mcp2matrix2(model, linfct = linfct, interaction_average = ia, :
Variable(s) ‘Mcollection_method’‘Msite_code’ have been specified in ‘linfct’ but cannot be found in ‘model’
3) I also tried the update function but it turned out you could only “update” to remove the factors collection_method:site_code, and not categories within the factors – such as collection_methodCSL:site_codeM7.
Any/all suggestions would be so helpful. I am very new to this, so if I am doing something wrong or if there is a clear/obvious fix, please let me know.
If any ideas of how to address the NA in the GLM issue come to mind, please feel free to share.
Thanks in advance!