1

I came across strange results when using lme with repeated measures and varIdent. Any help with this will be very appreciated!

I am testing whether the 13C signal of leaves along a time series differs between 2 species (A and B). I am basically interested on the overall differences between species, not in specific time points.

Here is my dataset:

Block  Species  time    X13C
1   B   2   0.775040865
2   B   2   0.343913792
3   B   2   0.381053614
1   A   2   0.427101597
2   A   2   0.097743662
3   A   2   0.748345826
1   B   24  0.416700446
2   B   24  0.230773558
3   B   24  0.681386484
1   A   24  0.334026511
2   A   24  0.866426406
3   A   24  0.606346215
1   B   48  0.263085491
2   B   48  0.083323709
3   B   48  0.534697801
1   A   48  0.30594443
2   A   48  0.024555489
3   A   48  0.790670392
1   B   96  0.158090804
2   B   96  0.254880689
3   B   96  0.082666799
1   A   96  0.139189281
2   A   96  0.300340119
3   A   96  0.233149535
1   B   192 0.055421148
2   B   192 0.082582155
3   B   192 0.136636735
1   A   192 0.03641637
2   A   192 0.06082544
3   A   192 0.126029308

I am applying the following model:

bulk<-lme(X13Catex ~ Species*time, random = ~1|Block/Species, method='REML', na.action=na.exclude, data=VacL, corAR1())

As there is heterogeneity in the residuals for time, I applied varIdent, which improved the model fits (AIC). The normalised residuals plots also looked good.

bulk.var<-lme(X13Catex ~ Species*time, random = ~1|Block/Species, method='REML', na.action=na.exclude, data=VacL, corAR1(), weights=varIdent(form=~1|time))

The thing is that with this code I get a significant p-value for Species, but looking at my data it does not seem that species differ at all... I think it is very strange to get such a low p-value, as error bars overlap at each time point and also at some time points A is greater than B and at some others the other way round.

> anova(bulk.var)
                         numDF denDF  F-value p-value
(Intercept)                  1    15 13.25772  0.0024
SpeciesCode                  1     2 67.08281  0.0146
SamplingTime                 4    15  4.42320  0.0147
SpeciesCode:SamplingTime     4    15  1.27659  0.3227

It happened again when I analysed other similar variables...

I wonder if a problem could be the low replication for each species at each sampling time (n = 3). Could it be that applying varIdent and a "relatively complex" model with such a low amount of replicates explains the significant p-value found? Any suggestions on how to deal with this?

Thank you!!

Alba
  • 11
  • 1
  • 3

1 Answers1

1

OK let me try.

First of all, your correlation structure does not seem correct to me. You need to use the time covariate there.

fit0 <- lme(X13C ~ Species*time, random = ~1|Block/Species, method='REML', 
            na.action=na.exclude, data=VacL,
            corAR1(0.9, form = ~ time | Block/Species))
summary(fit0)

Then the variance of the nested random effect seems to be rather small. Let's try to remove this parameter.

fit1 <- lme(X13C ~ Species*time, random = ~1|Block, method='REML', 
            na.action=na.exclude, data=VacL,
            corAR1(0.9, form = ~ time | Block/Species))
summary(fit1)

anova(fit0, fit1)
#     Model df      AIC     BIC    logLik   Test     L.Ratio p-value
#fit0     1  8 35.47003 45.5348 -9.735014                           
#fit1     2  7 33.47003 42.2767 -9.735014 1 vs 2 8.37192e-09  0.9999

plot(fit1)

plot 1

The plot indeed shows strong heterogeneity. At this point I would seriously consider a GLMM. Remember, delta 13 C is the (transformed) fraction 13C/12C. A normality assumption seems a bit dubious for this (although I occasionally use it for delta values myself). However, it appears to me like we can model the variance in dependence of the fitted values.

fit2 <- lme(X13C ~ Species*time, random = ~1|Block, method='REML', 
            na.action=na.exclude, data=VacL,
            corAR1(0.9, form = ~ time | Block/Species),
            weights = varPower())

plot(fit2, resid(., type = "normalized") ~ fitted(.))

plot2

anova(fit1, fit2)
#     Model df      AIC      BIC    logLik   Test  L.Ratio p-value
#fit1     1  7 33.47003 42.27670 -9.735014                        
#fit2     2  8 11.34319 21.40796  2.328405 1 vs 2 24.12684  <.0001

Not too bad. Let's check the p-values.

coef(summary(fit2))
#                      Value    Std.Error DF    t-value      p-value
#(Intercept)    0.3906798322 0.0640391495 24  6.1006405 2.661703e-06
#SpeciesB      -0.0303078937 0.0777180616 24 -0.3899723 6.999965e-01
#time          -0.0016541839 0.0003059863 24 -5.4060727 1.491893e-05
#SpeciesB:time  0.0002578782 0.0004048368 24  0.6369930 5.301592e-01

Neither the intercepts nor the slopes are significantly different. Now, let's try the ANOVA.

anova(fit2)
             numDF denDF  F-value p-value
(Intercept)      1    24   9.0061  0.0062
Species          1    24 525.7457  <.0001
time             1    24  56.5135  <.0001
Species:time     1    24   0.4058  0.5302

For comparison without variance structure:

anova(fit1)
             numDF denDF   F-value p-value
(Intercept)      1    24 29.536428  <.0001
Species          1    24  0.319802  0.5770
time             1    24 17.602173  0.0003
Species:time     1    24  0.041482  0.8403

So, you get the same problem with a model that uses four parameters less. And now I don't know why the species effect is significant in the sequential ANOVA if a variance structure is included although the corresponding model parameter is not significant. You could study Pinheiro & Bates 2000 and try to find out yourself or ask on Cross Validated.

Roland
  • 127,288
  • 10
  • 191
  • 288
  • Yes, it seems the problem is still there... I did not try with varPower before, I wonder whether it's more suitable than varIdent here... Thanks anyway! – Alba Sep 26 '16 at 08:48