How to interpret transformed independent and dependent variables in summary(lm)?

Question

Call:                   
lm(formula = GROWTH ~ log(X1) + log(X2) + log(X3) + log(X4) +                   
    log(X5) + log(1 +X6) + log(1 + X7) +                    
    log(X8) + log(X9) + log(X10) + log(X11) +                   
    log(X12) + log(X13) + X14 + X14:X9 +                    
    X14:X10, data = Data)

Residuals:                  
     Min       1Q   Median       3Q      Max 
-3.04237 -0.31965  0.05351  0.36639  2.52087 

Coefficients:                   
                    Estimate    Std. Error  t value Pr(>|t|)    
(Intercept)         2.837487    9.543146    0.297   0.766217    
log(X1)             0.377957    0.008647    43.71   < 2e-16 ***
log(X2)             0.363631    0.008906    40.829  < 2e-16 ***
log(X3)             0.337246    0.024202    13.934  < 2e-16 ***
log(X4)            -0.19371     0.029786   -6.503   8.11E-11    ***
log(X5)             0.01227     0.00437     2.808   0.004995    **
log(1 + X6)         0.006533    0.036977    0.177   0.859759    
log(1 + X7)         0.426738    0.191617    2.227   0.02596 *
log(X8)            -0.020741    0.009424    -2.201  0.027759    *
log(X9)             11.303514   2.745818    -4.117  3.87E-05    ***
log(X10)           -7.466939    0.814056    -9.173  < 2e-16 ***
log(X11)           -0.004444    0.00885    -0.502   0.615567    
log(X13)            0.067205    0.010626    6.325   2.61E-10    ***
log(X12)            1.711401    0.580518    2.948   0.003203    **
X14 [LEVEL 1]       18.422627   9.391444    -1.962  0.049823    *
X14 [LEVEL 2]       20.160172   9.386903    -2.148  0.031755    *
X14 [LEVEL 3]       12.78601    15.33008    0.834   0.404268    
X14 [LEVEL 4]       19.937816   9.679742    -2.06   0.03944 *
X14 [LEVEL 5]       13.83603    10.916449   -1.267  0.205015    
X14 [LEVEL 6]       23.939136   9.47908     -2.525  0.011565    *
X14 [LEVEL 7]       20.220041   11.217758  -1.803   0.071487    .
X14 [LEVEL 8]:X9    6.652888    4.17066     1.595   0.110697    
X14 [LEVEL 1]:X9    7.560706    1.981892    3.815   0.000137    ***
X14 [LEVEL 2]:X9    8.124572    1.857204    4.375   1.22E-05    ***
X14 [LEVEL 3]:X9    0.765371    5.173577    0.148   0.882393    
X14 [LEVEL 4]:X9    8.415016    2.337441    3.6     0.000319    ***
X14 [LEVEL 5]:X9    8.760546    3.293728    2.66    0.007828    **
X14 [LEVEL 6]:X9    10.727086   1.950529    5.5     3.87E-08    ***
X14 [LEVEL 7]:X9    8.913338    3.62592     2.458   0.013974    *
X14 [LEVEL 8]:X10   -9.409351   6.665734    -1.412  0.158089    
X14 [LEVEL 1]:X10   5.600412    0.628323    8.913   < 2e-16 ***
X14 [LEVEL 2]:X10   6.308849    0.669047    9.43    < 2e-16 ***
X14 [LEVEL 3]:X10   12.890973   5.191096    -2.483  0.013029    *
X14 [LEVEL 4]:X10   6.008453    0.835861    7.188   6.88E-13    ***
X14 [LEVEL 5]:X10   -0.174229   2.401866    -0.073  0.942174    
X14 [LEVEL 6]:X10   6.335575    0.774041    8.185   2.95E-16    ***
X14 [LEVEL 7]:X10   5.391272    2.226843    2.421   0.015488    *
---                 
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1                  

Residual standard error: 0.563 on 14573 degrees of freedom                  
(31913 observations deleted due to missingness)                 
Multiple R-squared:  0.5652,    Adjusted R-squared:  0.5642
F-statistic: 526.3 on 36 and 14573 DF,  p-value: < 2.2e-16

Above is a linear GROWTH model. I have substituted independent variable 'labels' for privacy purposes. In this example all numeric variables have been log-transformed, and a Box-Cox transformation has been applied to the dependent growth variable. For the independent variables this was done to normalize the inputs; the Box-Cox transformation was applied to the dependent variable to correct increasing variance in the output. While I am certainly new to R, I believe this is a better fit than the model with no transformations. However, please let me know if I'm off base here. Now, my question: how do I interpret these values? Is there a way to 'un-transform' the outputs so that the coefficient estimates and standard errors are meaningful to me? They mean little in their current state.

I think this is off-topic for stackoverflow and better suited for stats.stackexchange. A good link would be https://stats.stackexchange.com/questions/5135/interpretation-of-rs-lm-output — Linus, Feb 27 '18 at 04:49
Thank you @Linus. I hope reposting there doesn't land me in hot water... still new to these forums. — UnsoughtNine, Feb 27 '18 at 13:05

Michael Cantrall · Answer 1 · 2018-03-01T20:22:06.903

First, the way to un-transform a log is to apply its inverse. There are several kinds of logarithm; one of the most common is the natural log, which is what R's log() computes by default and what it looks like you used. In that case you take the natural exponential of your variable (x):

exp(x)

A simple check is to take the log of a number and then exponentiate the result:

> log(58)
[1] 4.060443
> exp(4.060443)
[1] 58

This is what your table would look like with the exponentiated columns added:

variables estimates estimates_inverse std_error std_error_inverse
<chr>         <dbl>             <dbl>     <dbl>             <dbl>
1 Intercept    2.84              17.1     9.54             13949   
2 log(x1)      0.378              1.46    0.00865              1.01
3 log(x2)      0.364              1.44    0.00891              1.01
4 log(x3)      0.337              1.40    0.0242               1.02
5 log(x4)     -0.194              0.824   0.0298               1.03
6 log(x5)      0.0123             1.01    0.00437              1.00
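
A table like the one above can be built directly in R rather than in a spreadsheet. The following is only a minimal sketch: it fits a stand-in model on the built-in mtcars data (the asker's Data is not available), and the column names simply mirror the table above.

```r
# Stand-in model on built-in data; replace with your own fitted lm object.
fit <- lm(log(mpg) ~ log(wt) + log(hp), data = mtcars)

# coef(summary(fit)) is a numeric matrix with one row per model term.
co <- coef(summary(fit))

out <- data.frame(
  variables         = rownames(co),
  estimates         = co[, "Estimate"],
  estimates_inverse = exp(co[, "Estimate"]),
  std_error         = co[, "Std. Error"],
  std_error_inverse = exp(co[, "Std. Error"])
)
rownames(out) <- NULL  # drop the term names that were carried over as row names

print(out, digits = 3)
```

If the goal is to carry uncertainty over to the exponentiated scale, exp(confint(fit)) exponentiates the confidence interval endpoints, which is usually easier to interpret than an exponentiated standard error.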

Also, if you're normalizing, you should check each variable before taking the log. Make sure it actually needs normalization, and try different normalization methods. My favorite is z-score normalization:

(variable - mean(variable))/(sd(variable))

There is also a built-in function, scale(), that does the same job:

scale()
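
As a quick check that the manual z-score formula and scale() agree, here is a small sketch on a built-in column (mtcars$wt is used purely for illustration):

```r
x <- mtcars$wt  # illustrative data only

# Manual z-score: subtract the mean, divide by the standard deviation.
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix with attributes; as.numeric() drops them.
z_scale <- as.numeric(scale(x))

all.equal(z_manual, z_scale)  # the two approaches give the same values
mean(z_manual)                # approximately 0
sd(z_manual)                  # 1
```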

Try these different methods and watch how the R^2 and p-values change.
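
One way to run that comparison is sketched below, again on the built-in mtcars data as a stand-in. Keep in mind that z-scoring is a linear rescaling of a predictor, so on its own it leaves R^2 unchanged; a log transform changes the shape of the relationship, so R^2 can move.

```r
# Same response, three versions of one predictor (illustrative data only).
fit_raw    <- lm(mpg ~ wt,        data = mtcars)
fit_scaled <- lm(mpg ~ scale(wt), data = mtcars)
fit_log    <- lm(mpg ~ log(wt),   data = mtcars)

r2 <- c(
  raw    = summary(fit_raw)$r.squared,
  scaled = summary(fit_scaled)$r.squared,
  log    = summary(fit_log)$r.squared
)
print(round(r2, 4))

# The raw and z-scored fits have identical R^2; only the log fit differs.
all.equal(r2[["raw"]], r2[["scaled"]])
```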

Hope this helped!

Thank you. So how might I go about applying an exp() transformation to the regression outputs? Can I simply do so in another program, such as a spreadsheet program like Excel? In addition, any thoughts on how to un-transform a Box-Cox transformation? I had checked each variable before taking the log. A log transformation helped the most with regard to normalization. — UnsoughtNine, Feb 27 '18 at 13:07
Sure. For the Box-Cox, can you send me the code you used and any packages you had installed? You can transform them all in R. — Michael Cantrall, Feb 27 '18 at 16:23
@t.searls - did you want to transform each variable or just a predicted output value? — Michael Cantrall, Mar 01 '18 at 16:49
Log-transforming the data normalized the inputs. The Box-Cox transformation of the dependent growth variable resolved heteroscedasticity in the residuals. — UnsoughtNine, Mar 06 '18 at 19:52
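
On the question of un-transforming a Box-Cox transformation raised in the comments: the standard one-parameter Box-Cox transform is (y^lambda - 1) / lambda for lambda != 0 and log(y) for lambda = 0, so its inverse is (lambda * z + 1)^(1 / lambda), or exp(z) when lambda = 0. The sketch below assumes that form; the helper names box_cox and inv_box_cox and the value lambda = 0.5 are hypothetical, since the thread does not say which package or lambda was used.

```r
# Forward Box-Cox transform for positive y (standard one-parameter form).
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Inverse transform: maps values on the Box-Cox scale back to the original scale.
inv_box_cox <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}

lambda <- 0.5            # hypothetical; use the lambda from your own transformation
y <- c(0.5, 1, 2, 10)    # made-up positive values for the round-trip check
z <- box_cox(y, lambda)

all.equal(inv_box_cox(z, lambda), y)  # round trip recovers the original values
```

With a fitted model, the same helper could be applied to the output of predict() to bring predictions back to the original growth scale, provided the lambda matches the one used to transform the response.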
