Re-transform a linear model. Case study with R

Question

Let's say I have a response variable which is not normally distributed and an explanatory variable. Let's create these two variables first (coded in R):

set.seed(12)
resp = (rnorm(120)+20)^3.79
expl = rep(c(1,2,3,4),30)

I run a linear model and I realize that the residuals are not normally distributed. (I know running a Shapiro might not be enough to justify that the residuals are not normally distributed but it is not the point of my question)

m1=lm(resp~expl)
shapiro.test(residuals(m1))
0.01794

Therefore I want to transform my explanatory variable (looking for a transformation with a Box-Cox for example).

m2=lm(resp^(1/3.79)~expl)
shapiro.test(residuals(m2))
0.4945

Ok, now my residuals are normally distributed it is fine! I now want to make a graphical representation of my data and my model. But I do not want to plot my explanatory variable in the transformed form because I would lose lots of its intuitive meaning. Therefore I do:

plot(x=expl,y=resp)

What if I now want to add the model? I could do this

abline(m2) # m2 is the model with transformed variable

but of course the line does not fit the data represented. I could do this:

abline(m1) # m1 is the model with the original variable.

but it is not the model I ran for the statistics! How can I re-transform the line predicted by m2 so that it fits the data?

score 2 · Accepted Answer · answered Aug 29 '13 at 09:24

2

plotexpl <- seq(1,4,length.out=10)
predresp <- predict(m2,newdata=list(expl=plotexpl))

lines(plotexpl, predresp^(3.79))

I won't discuss the statistical issues here (e.g. a non-significant test does not mean that H0 is true and your model is not better than the mean).

answered Aug 29 '13 at 09:24

Roland

127,288
10
191
288

Thank you! I don't understand at 100% how it works though. Does it work as well if `expl` is non-linear.? example: `expl = c(c(1,2,32,40),30)`. I guess it should. Does it change anything to the predicted line if I use 3 points or 100 points (in other words, if `length.out` equals 3 or 100)? – Remi.b Aug 29 '13 at 09:45
This code uses `m2` to get predicted values for new `expl` values and then transforms them back to the original scale. The higher the number of predicted data points, the smoother the line looks (as `lines` interpolates between points). Whether the distribution of `expl` is uniform or not (I guess that's what you mean with "non-linear") doesn't matter for plotting. – Roland Aug 29 '13 at 11:55

score 0 · Answer 2 · answered Jun 19 '17 at 17:10

Since you've mentioned that the transformation might base on Box-Cox formula, I would like to point out a issue you might want to consider.

According to the Box-cox transformation formula in the paper Box,George E. P.; Cox,D.R.(1964). "An analysis of transformations", your transformation implementation (in case it is a Box-Cox one) might need to be slightly edited.The transformed y should be (y^(lambda)-1)/lambda instead of y^(lambda). (Actually, y^(lambda) is called Tukey transformation, which is another distinct transformation formula.)
So, the code should be:

lambda=3.79
m2=lm(resp^((lambda-1)/lambda)~expl)
shapiro.test(residuals(m2))

More information

Correct implementation of Box-Cox transformation formula by boxcox() in R:
https://www.r-bloggers.com/on-box-cox-transform-in-regression-models/
A great comparison between Box-Cox transformation and Tukey transformation. http://onlinestatbook.com/2/transformations/box-cox.html
One could also find the Box-Cox transformation formula on Wikipedia: en.wikipedia.org/wiki/Power_transform#Box.E2.80.93Cox_transformation

Please correct me if I misunderstood your implementation.

Re-transform a linear model. Case study with R

2 Answers2

More information