1

i am having a weird problem running a hierarchical (mixed) model with nlme package. When adding heterogeneus variance along a factor, the residuals become flat in one of them.

Here is the code:

Creating data frame

x <- data.frame("time"    = factor(c(1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3)), 
                "rep"     =        c(1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3,1,1,1,1,2,2,2,2,3,3,3,3),
                "varB"    = factor(c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2)), 
                "site"    = factor(c(1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2)),
                "varA"    = factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
                "subsite" = factor(c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2)),
                "response"= c(0.657, 0.634, 0.723, 0.410, 0.649, 0.820, 0.382, 0.394, 0.586, 0.572, 0.603,
                              0.441, 0.563, 0.514, 1.120, 1.011, 0.449, 0.822, 0.209, 0.559, 0.440, 0.374, 
                              0.511, 0.640, 0.599, 0.245, 0.582, 0.186, 0.407, 0.271, 0.351, 0.202, 0.154, 
                              0.328, 0.220, 0.233))

Running the hierarchical model (homoscedastic)

library(nlme)
m1 <- lme(response ~ (varA+time+varB)^2, 
            random = ~1|rep/site/subsite, 
            data = x, 
            na.action = na.omit, 
            method = "REML")

Checking residuals of homoscedastic model

plot(m1, resid(., type = "p") ~ fitted(.) | time, 
     abline = c(-2,0,2), main = "Residuals by time", id=0.05)

residuals by time

As it can be seen in the figure, there is a heteroscedastic structure along time in the residuals so I decided to model it (greater dispersion in time 2, than in the two others)

Creating the heteroscedastic model

m2 <- update(m1, weights = varIdent(form = ~1|time), control = lmeControl(niterEM = 1000))

Comparing models

anova(m1,m2)

   Model df      AIC      BIC   logLik   Test  L.Ratio p-value
m1     1 14 25.06879 42.68214 1.465604                        
m2     2 16 20.96663 41.09617 5.516685 1 vs 2 8.102163  0.0174

Model 2 (heteroscedastic) fits better than 1

The problem becomes when I check the residuals of the new model:

plot(m2, resid(., type = "p") ~ fitted(.) | time, 
     abline = c(-2,0,2), main = "Residuals by time", id=0.05)

enter image description here

As it can be seen in the figure, the residuals of time 1 became flat... What could be the problem??

I am not sure if it helps, but perhaps checking the variance structure of model 2 could be a clue to find out the problem:

summary(m2$modelStruct$varStruct)

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time 
 Parameter estimates:
       1        2        3 
   1.000 4557.519 3274.876 
Facu
  • 101
  • 6
  • I don't see an issue with that first figure. My general rule of thumb is that if you can remove a single point and it removes the perceived issue then the issue might not have been too severe in the first place. – Dason Jan 10 '20 at 00:59
  • Thanks for your response @Dason. This is only an example. I am running models with several variables and sometimes the residuals of the homoscedastic model are worse than this example. I have only 3 replications, so removing data is the last option for me. I would like help with the issue I asked for... – Facu Jan 10 '20 at 01:08
  • I wasn't suggesting literally removing the values from the data. Just that if you removed one (like in your mind) and it changes completely how you see the data then maybe there isn't as big of an issue as you think. In you example there is one value that if I ignore it doesn't appear to be any issue with heteroskedacity. I'm not saying go ahead and remove that value from the data for the analysis. Just that if you ignore it then the problem isn't prevalent so maybe the problem isn't actually a problem... – Dason Jan 10 '20 at 01:22
  • Thanks @Dason. What about the problem I asked for? Could you help me with that? I am not sure if it is a bug of the package or I need to set another argument of lmeControl... – Facu Jan 10 '20 at 01:30
  • This is something that has cropped up periodically for me over the years with `varIdent()`. I've never been able to predict it will happen nor bypass it other than reducing the number of groups in the variance structure. I've also played around with `lmeControl()` options, including changing the optimizer, but have never found anything that helps. – aosmith Jul 31 '20 at 14:25

0 Answers0