2

I have a question regarding the degrees of freedom that are used by the lsmeans package in the case of a linear mixed model built with the nlme package.

Here is an example to illustrate my question based on the Oats dataset. I am not trying to discuss if this model is relevant given the dataset, I am just trying to reproduce an issue I had with another dataset ;-).

Oats.lme <- lme(yield ~ Variety, random = ~1 | Block, data = Oats)
anova(Oats.lme)

With the anova, I obtain the expected 64 degrees of freedom.

numDF denDF  F-value p-value
(Intercept)     1    64 245.1409  <.0001
Variety         2    64   1.6654  0.1972

Then I use the lsmeans function:

lsmeans(Oats.lme, list(poly ~ Variety))

and I obtain

$`lsmeans of Variety`
Variety       lsmean       SE df lower.CL upper.CL
Golden Rain 104.5000 7.680866  5 84.75571 124.2443
Marvellous  109.7917 7.680866  5 90.04737 129.5360
Victory      97.6250 7.680866  5 77.88071 117.3693

Confidence level used: 0.95

$`polynomial contrasts of contrast`
contrast   estimate       SE df t.ratio p.value
linear     -6.87500  6.68529 64  -1.028  0.3076
quadratic -17.45833 11.57926 64  -1.508  0.1365

For the contrasts, I obtain the same 64 df, but for the lsmeans themselves, I have only 5 df. I also use SAS, and for the same kind of models, I have the same number of df for both lsmeans and contrasts (which would be 64 with the current example).

I have seen that it might be possible to change degrees of freedom when using the lme4 package, but my code is embedded in an internally-developed tool that is based on nlme, so I am basically stuck with nlme.

Would anyone now why this happens and if this is possible to change it? Or am I missing something?

Update - Initial error message

I initially noticed these reduced degrees of freedom for the lsmeans in one specific case, where my random run effect only had 2 levels, and when I was interested in a Dunnett’s adjustment. Since I am more interested in contrasts than in lsmeans, now that I understood where it comes from, I can still work with it, but I put it there just in case someone has the same error and wants to know why.

I reproduced it below with the Oats data example. The error I obtain happens in the lsmeans:::.qdunnx function and is due to the df for the lsmeans being at 1.

Oats.lme <- lme(yield ~ Variety, random = ~1 | Block, data = subset(Oats,Block %in% c("I","II")))
lsm <- lsmeans(Oats.lme, trt.vs.ctrl ~ Variety)
summary(lsm,adjust = "dunnettx", infer = c(T, T), level = 0.95)

And here is the result

$lsmeans
Variety      lsmean       SE df  lower.CL upper.CL
Golden Rain 123.250 15.88642  1 -78.60608 325.1061
Marvellous  125.500 15.88642  1 -76.35608 327.3561
Victory     115.125 15.88642  1 -86.73108 316.9811
Confidence level used: 0.95

$contrasts
contrast                 estimate      SE df t.ratio p.value
Marvellous - Golden Rain    2.250 12.8697 20   0.175  0.9695
Victory - Golden Rain      -8.125 12.8697 20  -0.631  0.7482
P value adjustment: dunnettx method for 2 tests

Error in if (abs(diff(r[1:2])) < 5e-04) return(r[1]) : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In qtukey(p, (1 + sqrt(1 + 8 * k))/2, df) : production de NaN
Aurélie
  • 75
  • 2
  • 7
  • Not sure, but, oddly, if you take the intercept out of the `lme` model you get 64 df for the group means. – aosmith Mar 15 '17 at 17:44
  • You're right, I don't understand why. – Aurélie Mar 16 '17 at 07:50
  • By the way, polynomial contrasts are really inappropriate here, because `Variety` does not have equally-spaced quantitative levels. Specifying `pairwise ~ Variety` makes more sense, and the df will still be 64, – Russ Lenth Mar 16 '17 at 15:42
  • Yes, I just adapted the example in the help of the lsmeans function without thinking too much about what would be the right contrast. – Aurélie Mar 16 '17 at 16:10
  • 1
    Ah - R doesn't like the Tukey distribution when there is only 1 df. I suggest using "mvt" adjustment. – Russ Lenth Mar 17 '17 at 13:05

1 Answers1

2

The model says the response variable is subject to two random variations: those due to blocks, and those due to varieties. The means for each variety include both those sources of variation; but comparisons of those means exclude the block variations because the varieties are compared on the same block.

You have only six blocks so there are 5 degrees of freedom for estimating the variations of blocks, and that explains the degrees of freedom for the variety means. There are more degrees of freedom for the comparisons, because you do not have to account for the block variations.

Another matter to consider here is that the support for the nlme package uses a containment method for obtaining degrees of freedom. That essentially involves looking at the worst case scenario for degrees of freedom for each effect. If you instead use the lme4 package and the lmer function to fit the model, lsmeans will use a Satterthwaite or Kendall-Roger method to obtain degrees of freedom, and those results might be somewhat larger. However, the degrees of freedom for the means will still be considerably less than those for the comparisons.

Addendum: SAS results

Here is some SAS code with the same data and model:

proc mixed data = Oats;
  class Variety Block;
  model yield = Variety / ddfm = satterth;
  random Block;
  lsmeans Variety / tdiff;

... and the lsmeans results:

                           Least Squares Means

                                   Standard
Effect     Variety     Estimate       Error      DF    t Value    Pr > |t|
Variety    Golden_R      104.50      7.6809    8.87      13.61      <.0001
Variety    Marvello      109.79      7.6809    8.87      14.29      <.0001
Variety    Victory      97.6250      7.6809    8.87      12.71      <.0001

                      Differences of Least Squares Means
                                           Standard
Effect    Variety    _Variety   Estimate      Error     DF   t Value   Pr > |t|

Variety   Golden_R   Marvello    -5.2917     6.6853     64     -0.79     0.4316
Variety   Golden_R   Victory      6.8750     6.6853     64      1.03     0.3076
Variety   Marvello   Victory     12.1667     6.6853     64      1.82     0.0734

Note that SAS shows 64 df for the comparisons but only 8.87 df for the means themselves, when the Satterthwaite method is used for degrees of freedom.

If one omits the ddfm option in the model statement, it defaults to the containment method for df, and it lists 64 df in both tables. However, I believe that SAS is incorrect in its implementation of containment; see my earlier post on this topic in CrossValidated: https://stats.stackexchange.com/questions/140156/degrees-of-freedom-using-containment-method

Community
  • 1
  • 1
Russ Lenth
  • 5,922
  • 2
  • 13
  • 21
  • Thank you for your answer. I am however still a little puzzled. In my example, it seems to me that I have one fixed effect (Variety), and only one random effect (Block), which for me is crossed with Variety. Your explanation about why the block variations would be taken into account for the lsmeans and not for the comparisons make sense, but then why is it that removing the intercept results in the same degrees of freedom for the lsmeans and the comparisons as aosmith found out? – Aurélie Mar 16 '17 at 07:51
  • In addition, in SAS, with the same model I have the same degrees of freedom for both lsmeans and contrasts (df=64). And if I specify that Block should be nested inside of Variety, then I have 5 degrees of freedom for anova, lsmeans and contrasts of lsmeans. Is SAS doing something wrong here ? – Aurélie Mar 16 '17 at 07:56
  • I realized that maybe I am not clear about my issue because of the field example that I chose. In my real dataset, I have 3 treatment groups, that were tested in vivo in 3 different runs of experiments. I want to take the variability related to run into account in my model and so I am using a model with one fixed effect (Treatment), and one random effect (Run) which is then crossed with Treatment. – Aurélie Mar 16 '17 at 08:06
  • I think the example you give is fine for illustrating the concern. Having blocks nested in variety is an entirely different model (and a wrong one) so of course the results are different. Funny you should mention SAS, as I posted kind of the opposite of your question on CrossValdated -- see http://stats.stackexchange.com/questions/140156/degrees-of-freedom-using-containment-method. And try adding `ddfm = satterth` to your SAS code and see what happens. – Russ Lenth Mar 16 '17 at 13:37
  • PS I don't know right away what's happening in the no-intercept model. I'll think about it. – Russ Lenth Mar 16 '17 at 13:39
  • I added some SAS information to the answer. And I figured out what happens with the no-intercept situation. Looking at the code for `lsmeans:::lsm.basis.lme`, one sees that there is code to compensate for an error in the intercept part of the `FixDF$X` member of an `lme` object. It turns out that that workaround does not produce correct results when there is no intercept, and in fact, it isn't possible to get all the correct df in that case. – Russ Lenth Mar 16 '17 at 15:38
  • Thanks a lot. I will look more closely at all these new information tomorrow :-). – Aurélie Mar 16 '17 at 16:11
  • Thanks again for your help and thank you for the investigations in SAS. Yes, your previous post is exactly the opposite question of mine, this is funny. I am not strong enough in theory to comment, but what you explained make sense. I also updated my post with the initial issue that prompted this question. Independently of the exactitude of the df, my code was crashing. – Aurélie Mar 17 '17 at 11:09