1

I am doing a linear regression with data that needs transformation, for it, I am using a Box-Cox power transformation, followed by back-transformation to write a report using the original scale. I've been trying to do this with the emmeans packages, and I followed the steps described in the emmeans package vignette, however, I find that the summary results for the estimated means are not at all similar to the untransformed data. In fact, the output is not transformed at all.

Here is a reproducible example using the examples from the emmeans package:

require(emmeans)

# Fit a model using an oddball transformation:
bctran <- make.tran("boxcox", 0.368)
warp.bc <- with(bctran, 
                lm(linkfun(breaks) ~ wool * tension, data = warpbreaks))
# Obtain back-transformed LS means:    
emmeans(warp.bc, ~ tension | wool, type = "response")

# Fit a model without transformation:
warp <- lm(breaks ~ wool * tension, data = warpbreaks)

# Obtain LS means:
emmeans(warp, ~ tension | wool)

which returns:

> emmeans(warp.bc, ~ tension | wool, type = "response")
wool = A:
 tension emmean    SE df lower.CL upper.CL
 L         8.07 0.419 48     7.23     8.92
 M         5.91 0.419 48     5.07     6.75
 H         5.94 0.419 48     5.10     6.79

wool = B:
 tension emmean    SE df lower.CL upper.CL
 L         6.45 0.419 48     5.61     7.29
 M         6.53 0.419 48     5.69     7.37
 H         5.22 0.419 48     4.38     6.07

Confidence level used: 0.95 
> emmeans(warp, ~ tension | wool)
wool = A:
 tension emmean   SE df lower.CL upper.CL
 L         44.6 3.65 48     37.2     51.9
 M         24.0 3.65 48     16.7     31.3
 H         24.6 3.65 48     17.2     31.9

wool = B:
 tension emmean   SE df lower.CL upper.CL
 L         28.2 3.65 48     20.9     35.6
 M         28.8 3.65 48     21.4     36.1
 H         18.8 3.65 48     11.4     26.1

Confidence level used: 0.95 

when in fact the estimated mean for tension:L should be 42.37, as calculated using the formula:

> origin + (1 + param * pmax(eta))^(1/param)


> 0 + (1 + 0.368 * pmax(8.07))^(1/0.368)
[1] 42.37179

Is there something I am missing or not understanding properly?

camille
  • 16,432
  • 18
  • 38
  • 60
andapo
  • 21
  • 4

1 Answers1

0

Hmmmm. I reproduced this problem. I'm not sure what's wrong, but so far I can tell that bctran itself is in order:

> emm = as.data.frame(emmeans(warp.bc, ~tension|wool))
> emm
  tension wool   emmean        SE df lower.CL upper.CL
1       L    A 8.074761 0.4192815 48 7.231739 8.917783
2       M    A 5.911710 0.4192815 48 5.068688 6.754732
3       H    A 5.942335 0.4192815 48 5.099313 6.785357
4       L    B 6.449869 0.4192815 48 5.606847 7.292891
5       M    B 6.531085 0.4192815 48 5.688063 7.374107
6       H    B 5.224939 0.4192815 48 4.381917 6.067961

> bctran$linkinv(emm$emmean)
[1] 42.42263 23.10060 23.32407 27.22827 27.88877 18.43951

So these back-transformed EMMs are in-order. I'll trace the code and see why the results aren't back-transformed.

Update

I found a logic error from a revision a few months ago whereby if a transformation is character (e.g., "log") it works fine, but if it is a list (e.g., your bctran) it is ignored. I fixed that error in the next version to push to the github site (version >= 1.3.3.0999902), and the fix will be in the next CRAN update (version > 1.3.3).

> emmeans(warp.bc, ~ tension | wool)
wool = A:
 tension emmean    SE df lower.CL upper.CL
 L         8.07 0.419 48     7.23     8.92
 M         5.91 0.419 48     5.07     6.75
 H         5.94 0.419 48     5.10     6.79

wool = B:
 tension emmean    SE df lower.CL upper.CL
 L         6.45 0.419 48     5.61     7.29
 M         6.53 0.419 48     5.69     7.37
 H         5.22 0.419 48     4.38     6.07

Results are given on the Box-Cox (lambda = 0.368) (not the response) scale. 
Confidence level used: 0.95 

> emmeans(warp.bc, ~ tension | wool, type = "response")
wool = A:
 tension response   SE df lower.CL upper.CL
 L           42.4 4.48 48     34.0     52.0
 M           23.1 3.05 48     17.5     29.8
 H           23.3 3.07 48     17.7     30.0

wool = B:
 tension response   SE df lower.CL upper.CL
 L           27.2 3.38 48     20.9     34.6
 M           27.9 3.44 48     21.5     35.3
 H           18.4 2.65 48     13.6     24.3

Confidence level used: 0.95 
Intervals are back-transformed from the Box-Cox (lambda = 0.368) scale 

Notice that even without back-transforming, there is an annotation of that fact. The fact that no annotation at all is present in your results was a tip-off.

Russ Lenth
  • 5,922
  • 2
  • 13
  • 21
  • Thanks for the quick reply! I didn't notice the annotation, should read more carefully next time, but I am glad you fixed this issue. Thanks again! – andapo Mar 10 '19 at 01:22
  • I didn’t expect you to notice the annotation. I just meant that’s what tipped *me* off. Sorry I gave the wrong impression. I’m glad you noticed the bug and posted this. – Russ Lenth Mar 10 '19 at 02:55
  • It is alright @rvi, I didn't take offence, I was reflecting on the fact that it is a clear clue of something going wrong. In fact, knowing to look for that annotation shows that the emmeans() function is not returning inverse transformed estimates and I playing around I am getting an error if I run the following code: warp.glm <- glm(sqrt(breaks) ~ wool*tension, family = Gamma, data = warpbreaks) emmeans(warp.glm, ~ tension | wool, type = "response") Error in link$mu.eta(object@bhat[estble]) : attempt to apply non-function – andapo Mar 10 '19 at 06:26
  • Yikes! I had a `!=` in the code where it should have been `==`. Now your glm example works. Shows what happens with a hasty update... – Russ Lenth Mar 10 '19 at 17:20