When I conducted data analysis, I encountered an serious problem.
I want to research how a Treatment intensity would influence response variable y. The field investigation was conduced at different SITES on two years in long time ago. According to my knowledge, it’s not so good to group_by SITES
and YEAR
to acquire mean or sum value of y, especially excluding sum due to several observations were lost at different SITES
.
Then I tried to build a negative binomial model with glmmTMB
and check with DHARMa
package: I got not so good residuals pattern.
library(glmmTMB)
model <- glmmTMB::glmmTMB(y~intensity+YEAR+(1|SITES),data = f,family = 'nbinom2')
res = simulateResiduals(model)
res %>% plot()
And I tried to check whether it violates assumption of spatial auto-correlation with DHARMa
too. It shows significant violation to the assumption( p-value = 0.0083). So, I tried to resolve the problem.
res2 <- recalculateResiduals(res, group = f$SITES)
testSpatialAutocorrelation(res2,
x = f %>% group_by(SITES) %>% summarise(lat=mean(lat)) %>% pull(lat),
y = f %>% group_by(SITES) %>% summarise(long=mean(long)) %>% pull(long)
)
DHARMa Moran's I test for distance-based autocorrelation
data: res2
observed = 0.1135, expected = -0.0714, sd = 0.0701, p-value = 0.0083
alternative hypothesis: Distance-based autocorrelation
Because a model creating through the glmmTMB
allowed to add coordinates to variance structure. So, I tried to add coordinates to above model with 3 methods as below:
mat()
gau()
exp()
temp$pos <- numFactor(f$long,f$lat) # coordinate
# mat
FormA1 = formula(y ~ intensity + YEAR + mat(pos + 0 | SITES))
FormA2 = formula(y ~ intensity + YEAR + (1|SITES) + mat(pos + 0 | SITES))
# gau
FormB1 = formula(y ~ intensity + YEAR + gau(pos + 0 | SITES))
FormA2 = formula(y ~ intensity + YEAR + (1|SITES) + gau(pos + 0 | SITES))
# exp
FormA1 = formula(y ~ intensity + YEAR + exp(pos + 0 | SITES))
FormB2 = formula(y ~ intensity + YEAR + (1|SITES) + exp(pos + 0 | SITES))
model <- glmmTMB::glmmTMB(FormA1, #FormA,FormB,FormC I tried one by one.
data = f,
family = 'nbinom2')
But the results were there’re only exp()
variance structure could be running. The variance structures of gau()
and mat()
would tell the warnings:
Warning in fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
Warning in fitTMB(TMBStruc) :
Model convergence problem; false convergence (8). See vignette('troubleshooting')
Even though the exp()
could be running, when I tried to check spatial correlation, there was also significant violation to the assumption(p-value = 0.0087).
So, I gave up the method that I add coordinates to a variance structure of glmmTMB
and turned to glmmPQL
. Because , it could give correlation variance structure
for MASS::glmmPQL
and similar to gls()
and lme()
in nlme
package.
The correlation structure are:
corSpher
corSpatial
corGau
corExp
corRatio
corLin
However, when I tried to run the code as below:
model <- MASS::glmmPQL(y ~ intensity, random = ~ 1|SITES,family = negative.binomial(theta = 0.9),
correlation = corExp(form = ~ long+lat|SITES), #corExp,corGau, corSpher, corLin ,corRatio, all I tried
data = f)
But they always show error: Error in getCovariate.corSpatial(object, data = data):cannot have zero distances in "corSpatial"
.
Considering my field investigation conditions, someone must consider to group my data and to obtain sum or mean of y, but the problems are:
- group
YEAR
andSITES
to sum of y must yield bias error due to different observations within groups. - It might be suitable when group by to get mean of y, but I don’t want to narrow the number of observations..
I have looked for a lot of posts in stackoverflow
and other websites about how to resolve the problem: cannot have zero distances in "corSpatial"
. But I didn’t get good resolution including someone said: The barrier is repeated coordinates in the data. Sure, he said right, but I had put lon + lat|SITES
at correlation structure, it also showed cannot have zero distances in "corSpatial"
. So, I realized I must look for help from Internet.
So, the question I want to say is : According to my data, how to create a good mixed linear model not to violation to residuals pattern and spatial auto-correlation.
Of course , you could use other models but better not to use Bayesians because it’s really strange for me.
If you really want to help me , I wish you might post your code so that I can run it again rather than type a little sentences in here. I’ve upload part of my original data so that support convenience to you . I really really want to get helps from all of friends on internet.
Thanks in advance at 3 o’clock on mid-night on my time zone.
Here is my data:
f =
structure(list(YEAR = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2000",
"2001"), class = "factor"), SITES = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L,
10L, 10L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 11L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 11L, 14L, 14L, 14L, 14L, 14L, 14L,
14L, 14L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
15L, 15L, 15L, 15L, 15L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L), .Label = c("A1", "A10", "A11", "A12", "A17",
"A18", "A20", "A3", "A8", "A9", "B1", "B14", "B16", "B3", "B6"
), class = "factor"), long = c(80.41081194, 80.41081194, 80.41081194,
80.41081194, 80.41081194, 80.41081194, 80.41081194, 80.41081194,
80.41081194, 80.48462682, 80.48462682, 80.48462682, 80.48462682,
80.48462682, 80.48462682, 80.48462682, 80.48462682, 80.48462682,
80.57384333, 80.57384333, 80.57384333, 80.57384333, 80.57384333,
80.57384333, 80.57384333, 80.57384333, 80.57384333, 80.48125,
80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 80.48125, 80.48125,
80.48125, 80.56281944, 80.56281944, 80.56281944, 80.56281944,
80.56281944, 80.56281944, 80.56281944, 80.56281944, 80.56281944,
80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 80.46875, 80.46875,
80.46875, 80.46875, 80.41125278, 80.41125278, 80.41125278, 80.41125278,
80.41125278, 80.41125278, 80.41125278, 80.41125278, 80.41125278,
80.30014935, 80.30014935, 80.30014935, 80.30014935, 80.30014935,
80.30014935, 80.30014935, 80.30014935, 80.30014935, 80.26167842,
80.26167842, 80.26167842, 80.26167842, 80.26167842, 80.26167842,
80.26167842, 80.26167842, 80.26167842, 80.20558138, 80.20558138,
80.20558138, 80.20558138, 80.20558138, 80.20558138, 80.20558138,
80.20558138, 80.20558138, 80.349776, 80.349776, 80.349776, 80.349776,
80.349776, 80.349776, 80.349776, 80.349776, 80.349776, 80.349776,
80.349776, 80.349776, 80.373821, 80.373821, 80.373821, 80.373821,
80.373821, 80.373821, 80.373821, 80.373821, 80.373821, 80.373821,
80.373821, 80.373821, 80.434011, 80.434011, 80.434011, 80.434011,
80.434011, 80.434011, 80.434011, 80.434011, 80.434011, 80.434011,
80.434011, 80.434011, 80.520748, 80.520748, 80.520748, 80.520748,
80.520748, 80.520748, 80.520748, 80.520748, 80.520748, 80.520748,
80.520748, 80.520748, 80.507795, 80.507795, 80.507795, 80.507795,
80.507795, 80.507795, 80.507795, 80.507795, 80.507795, 80.507795,
80.507795, 80.507795), lat = c(41.17633167, 41.17633167, 41.17633167,
41.17633167, 41.17633167, 41.17633167, 41.17633167, 41.17633167,
41.17633167, 41.12161667, 41.12161667, 41.12161667, 41.12161667,
41.12161667, 41.12161667, 41.12161667, 41.12161667, 41.12161667,
40.97858194, 40.97858194, 40.97858194, 40.97858194, 40.97858194,
40.97858194, 40.97858194, 40.97858194, 40.97858194, 41.03570917,
41.03570917, 41.03570917, 41.03570917, 41.03570917, 41.03570917,
41.03570917, 41.03570917, 41.03570917, 40.80911056, 40.80911056,
40.80911056, 40.80911056, 40.80911056, 40.80911056, 40.80911056,
40.80911056, 40.80911056, 40.80239444, 40.80239444, 40.80239444,
40.80239444, 40.80239444, 40.80239444, 40.80239444, 40.80239444,
40.80239444, 40.82870833, 40.82870833, 40.82870833, 40.82870833,
40.82870833, 40.82870833, 40.82870833, 40.82870833, 40.82870833,
41.02665248, 41.02665248, 41.02665248, 41.02665248, 41.02665248,
41.02665248, 41.02665248, 41.02665248, 41.02665248, 40.99086151,
40.99086151, 40.99086151, 40.99086151, 40.99086151, 40.99086151,
40.99086151, 40.99086151, 40.99086151, 41.00486149, 41.00486149,
41.00486149, 41.00486149, 41.00486149, 41.00486149, 41.00486149,
41.00486149, 41.00486149, 41.021529, 41.021529, 41.021529, 41.021529,
41.021529, 41.021529, 41.021529, 41.021529, 41.021529, 41.021529,
41.021529, 41.021529, 40.970129, 40.970129, 40.970129, 40.970129,
40.970129, 40.970129, 40.970129, 40.970129, 40.970129, 40.970129,
40.970129, 40.970129, 40.85327, 40.85327, 40.85327, 40.85327,
40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 40.85327, 40.85327,
40.85327, 41.045005, 41.045005, 41.045005, 41.045005, 41.045005,
41.045005, 41.045005, 41.045005, 41.045005, 41.045005, 41.045005,
41.045005, 41.083288, 41.083288, 41.083288, 41.083288, 41.083288,
41.083288, 41.083288, 41.083288, 41.083288, 41.083288, 41.083288,
41.083288), y = c(36L, 39L, 45L, 167L, 150L, 113L, 74L, 80L,
72L, 197L, 249L, 212L, 195L, 173L, 105L, 43L, 44L, 27L, 80L,
69L, 52L, 306L, 308L, 188L, 123L, 106L, 121L, 27L, 32L, 26L,
197L, 232L, 245L, 114L, 110L, 170L, 12L, 46L, 57L, 254L, 332L,
207L, 33L, 95L, 89L, 40L, 81L, 77L, 119L, 122L, 194L, 81L, 54L,
71L, 34L, 29L, 33L, 136L, 129L, 189L, 42L, 38L, 69L, 53L, 39L,
55L, 62L, 33L, 49L, 26L, 15L, 31L, 130L, 144L, 79L, 76L, 41L,
86L, 21L, 25L, 16L, 104L, 115L, 109L, 69L, 98L, 84L, 26L, 19L,
30L, 10L, 12L, 4L, 13L, 9L, 4L, 27L, 0L, 4L, 6L, 2L, 6L, 3L,
1L, 0L, 1L, 1L, 1L, 6L, 3L, 5L, 0L, 5L, 2L, 0L, 1L, 2L, 2L, 0L,
2L, 1L, 4L, 1L, 10L, 6L, 9L, 2L, 2L, 2L, 0L, 5L, 0L, 17L, 7L,
20L, 33L, 34L, 45L, 1L, 2L, 0L, 2L, 2L, 2L, 5L, 6L, 3L, 12L,
22L, 12L), intensity = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("low",
"high"), class = "factor")), class = "data.frame", row.names = c(NA,
-150L))