1

I am quite new to the R world and have been battling through some issues with my Generalized Linear Model (GLM) output data in an attempt to run a General Linear Hypothesis Test (GLHT) for different levels of interactions.

My main question is: Is there a way to write a GLM formula to leave out a single interaction in order to run GLHT stats on the model?

My data structure is as follows (and is located in a data.frame as factors prior to running the GLM).

invSimpson - dependant variable - decimal numbers
collection_method - independent variable with 4 levels: "BT, CSL, KSL, SW"
site_code - independent variable with 4 levels: "M0, M4, M7, M9"

I am modifying a tried and true SOP which is where I pulled the below commands from. The script worked well for a similar data structure, but that data structure wasn't missing a single interaction as my data is.

The main issue is arising from the fact that CSL was never collected at site M7 so the CSL:M7 interaction is absent. There are no NA's in the input data for the GLM, but once the GLM is run, I end up with NA's in the interaction between CSL:M7.

> mod <- glm(invSimpson ~ collection_method * site_code+0,
+            data = bac_invs, family = Gamma(link = 'log'))
> summary(mod)

Call:
glm(formula = invSimpson ~ collection_method * site_code + 0, 
    family = Gamma(link = "log"), data = bac_invs)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9393  -0.5605  -0.1658   0.2258   2.3786  

Coefficients: (1 not defined because of singularities)
                                 Estimate Std. Error t value Pr(>|t|)    
collection_methodSW               3.53695    0.11880  29.772  < 2e-16 ***
collection_methodBT               4.02590    0.15068  26.719  < 2e-16 ***
collection_methodKSL              5.56846    0.20860  26.694  < 2e-16 ***
collection_methodCSL              3.78808    0.13193  28.712  < 2e-16 ***
site_codeM4                      -0.22633    0.16293  -1.389  0.16556    
site_codeM7                       0.11395    0.16917   0.674  0.50097    
site_codeM9                       0.08772    0.15882   0.552  0.58103    
collection_methodBT:site_codeM4  -0.04652    0.25818  -0.180  0.85711    
collection_methodKSL:site_codeM4 -0.04733    0.29227  -0.162  0.87144    
collection_methodCSL:site_codeM4 -0.56047    0.36192  -1.549  0.12226    
collection_methodBT:site_codeM7  -0.62172    0.26583  -2.339  0.01983 *  
collection_methodKSL:site_codeM7 -0.38977    0.29923  -1.303  0.19346    
collection_methodCSL:site_codeM7       NA         NA      NA       NA    
collection_methodBT:site_codeM9  -0.79353    0.25678  -3.090  0.00214 ** 
collection_methodKSL:site_codeM9 -0.27627    0.30595  -0.903  0.36707    
collection_methodCSL:site_codeM9  0.39826    0.26766   1.488  0.13754    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 0.5221921)

    Null deviance: 62031.44  on 422  degrees of freedom
Residual deviance:   196.78  on 407  degrees of freedom
AIC: 4134.1

Number of Fisher Scoring iterations: 6

The next step that I would like to take (based on the SOP that I am following) is to run this command:

> mod_glht <- glht(mod, linfct = mcp(collection_method = c('SW - BT = 0',
+                                                      'SW - KSL = 0',
+                                                      'SW - CSL = 0',
+                                                      'BT - KSL = 0',
+                                                      'BT - CSL = 0',
+                                                      'KSL - CSL = 0'),
+                                    site_code  = c('M0 - M4 = 0',
+                                                   'M0 - M7 = 0',
+                                                   'M0 - M9 = 0',
+                                                   'M4 - M7 = 0',
+                                                   'M4 - M9 = 0',
+                                                   'M7 - M9 = 0'),
+                                    interaction_average = TRUE))

Error in modelparm.default(model, ...) : 
  dimensions of coefficients and covariance matrix don't match
In addition: Warning messages:
1: In mcp2matrix(model, linfct = linfct) :
  covariate interactions found -- default contrast might be inappropriate
2: In mcp2matrix(model, linfct = linfct) :
  covariate interactions found -- default contrast might be inappropriate

I have come to learn that the above error and warning messages arise from the missing (NA) data in the CSL:M7 interaction line from the GLM.

There are a number of suggestions on how to deal with this on stack overflow, none of which have worked for me.

1) I have tried NA.omit after producing the GLM, which does not do anything to the model. It seems that this is only effective for input data (in which no NA's exist) and not useful in editing a GLM output.

2) I have also tried editing the GLM model matrix as outlined here: https://stats.stackexchange.com/questions/141792/how-to-compare-nested-factor-levels-to-mean-not-to-first-level-and-how-to-test

When I do so, it is successful in pulling out the CSL:M7 interaction, but then, I am still unable to run the GLHT on the edited model matrix and get this error:

Error in mcp2matrix2(model, linfct = linfct, interaction_average = ia,  : 
Variable(s) ‘Mcollection_method’‘Msite_code’ have been specified in ‘linfct’ but cannot be found in ‘model’

3) I also tried the update function but it turned out you could only “update” to remove the factors collection_method:site_code, and not categories within the factors – such as collection_methodCSL:site_codeM7.

Any/all suggestions would be so helpful. I am very new to this, so if I am doing something wrong or if there is a clear/obvious fix, please let me know.

If any ideas of how to address the NA in the GLM issue come to mind, please feel free to share.

Thanks in advance!

Alix1193
  • 11
  • 2
  • this question looks a bit polluted. too much context, too many questions in one, no clear problem. (3 errors, one successful output, but that doesn't match your criteria somehow, and many, many acronyms ...). The chance, that there is someone out there, who is not your coworker, who has/knows your circumstances, is rather slim. Can you try and state the question with a clear focus and one concrete problem? – scrimau Jun 10 '20 at 18:21
  • Thanks for letting me know. I initially thought it would be important to include what I have tried so that the same ideas didn't get suggested or if I were doing them wrong, someone could let me know, but now I see it makes it a bit confusing. I tried to remove the unnecessary information and re-posted with my main question highlighted at the top. – Alix1193 Jun 10 '20 at 20:13
  • the root of your problem stems from "dimensions of coefficients and covariance matrix don't match", so i would try and make them match, can you remove the "NA" coefficient and corresponding entries in the covariance matrix, so that the dimensions match again? or you replace the missing coefficient by 0? or you adjust the formula to not include "collection_methodCSL:site_codeM7"? – scrimau Jun 11 '20 at 10:16

0 Answers0