7

I want to test if the slope in a simple linear regression is equal to a given constant other than zero.

> x <- c(1,2,3,4)
> y <- c(2,5,8,13)
> fit <- lm(y ~ x)
> summary(fit)

Call:
lm(formula = y ~ x)

Residuals:
   1    2    3    4 
 0.4 -0.2 -0.8  0.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -2.0000     0.9487  -2.108  0.16955   
x             3.6000     0.3464  10.392  0.00913 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7746 on 2 degrees of freedom
Multiple R-squared:  0.9818,    Adjusted R-squared:  0.9727 
F-statistic:   108 on 1 and 2 DF,  p-value: 0.009133
> confint(fit)
                2.5 %   97.5 %
(Intercept) -6.081855 2.081855
x            2.109517 5.090483

In this example, I want to test if the slope is equal to 5. I know I won't reject it since 5 is in the 95% CI. But is there a function which can give me the p-value directly?

JACKY88

4 Answers

9

You just have to construct the t-statistic for the null hypothesis slope=5:

# Get the model summary; coefficient estimates and standard errors are in sfit$coefficients
sfit <- summary(fit)
# t statistic for H0: slope = 5
tstats <- (5 - sfit$coefficients[2, 1]) / sfit$coefficients[2, 2]
# Two-tailed p-value from the t distribution with the residual degrees of freedom
pval <- 2 * pt(abs(tstats), df = df.residual(fit), lower.tail = FALSE)
print(pval)
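
The same idea can be wrapped in a small convenience function so any coefficient can be tested against any null value. This is just a sketch, not part of base R; the name `test_coef` is purely illustrative:

# Hypothetical helper: two-sided t-test of H0: coefficient == null_value
test_coef <- function(fit, coef_name, null_value) {
  est <- coef(summary(fit))[coef_name, "Estimate"]
  se  <- coef(summary(fit))[coef_name, "Std. Error"]
  tstat <- (est - null_value) / se
  2 * pt(abs(tstat), df = df.residual(fit), lower.tail = FALSE)
}
test_coef(fit, "x", 5)  # should reproduce pval above (about 0.056)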
Acoustesh
  • This is a Wald test. It's a good test strategy but not quite as statistically valid as an LRT, which is what using an offset will deliver when used with `summary.lm`. – IRTFM Oct 14 '15 at 00:39
  • @BondedDust, I think if you try it you'll see that this is in fact identical to what `summary.lm` produces. For linear models, LRT and Z-tests are identical; t-tests are better because they take uncertainty in the scale parameter (variance) into account. – Ben Bolker Oct 14 '15 at 02:22
  • Ah, so true. I stand corrected. The distinction only becomes relevant in the glm framework, where the linear approximation is the "score statistic". – IRTFM Oct 14 '15 at 03:52
8

One approach to testing whether a fitted slope differs significantly from a particular value is to construct an "offset" in which that value multiplies the x-variable. You can think of this as re-setting the zero, at least the zero for the slope; the intercept is still free to be estimated.

fit2 <- lm(y ~ x + offset(5 * x))
#----------------
 summary(fit2)
#--------
Call:
lm(formula = y ~ x + offset(5 * x))

Residuals:
   1    2    3    4 
 0.4 -0.2 -0.8  0.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -2.0000     0.9487  -2.108   0.1695  
x            -1.4000     0.3464  -4.041   0.0561 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7746 on 2 degrees of freedom
Multiple R-squared:  0.9818,    Adjusted R-squared:  0.9727 
F-statistic:   108 on 1 and 2 DF,  p-value: 0.009133

Now compare with the results from your original fit object. The coefficients for x differ by exactly 5. The model fit statistics are the same, but, as you suspected, the p-value for the x-variable is much higher, i.e. less significant.

> summary(fit)

Call:
lm(formula = y ~ x)

Residuals:
   1    2    3    4 
 0.4 -0.2 -0.8  0.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  -2.0000     0.9487  -2.108  0.16955   
x             3.6000     0.3464  10.392  0.00913 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7746 on 2 degrees of freedom
Multiple R-squared:  0.9818,    Adjusted R-squared:  0.9727 
F-statistic:   108 on 1 and 2 DF,  p-value: 0.009133
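
If you just want the number, the p-value for the slope-equals-5 test can be read straight off the offset fit's coefficient table; a minimal sketch using the `fit2` object above:

coef(summary(fit2))["x", "Pr(>|t|)"]  # about 0.0561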
IRTFM
  • But you are subtracting something that is not independent. Wouldn't that be a problem for the least-squares assumptions or for collinearity? How is it different from lm(I(y - 5*x) ~ x)? – skan Mar 10 '17 at 16:51
  • I don't understand your comment about "independence". I do think the offset version should be equivalent to your reconstitution of the model. They both assume that there is a "shift" of the target away from the mean of `y`. – IRTFM Mar 11 '17 at 03:38
  • I mean the original regression model assumes a normal distribution of "y" and homoscedasticity, but if you subtract two quantities maybe you can't assume that anymore. On the other hand, here: https://stats.stackexchange.com/questions/111559/test-model-coefficient-regression-slope-against-some-value Matifou says it is better to use the Wald test. – skan May 19 '17 at 15:50
  • Is the offset method still right when used with logistic models with random effects (glmer)? – skan May 19 '17 at 15:52
2

My impression is that the linearHypothesis function from the car package provides a standard way to do this.

For example

library(car)

x <- 1:4
y <- c(2, 5, 8, 13)
model <- lm(y ~ x)

linearHypothesis(model, "x = 1")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> x = 1
#> 
#> Model 1: restricted model
#> Model 2: y ~ x
#> 
#>   Res.Df  RSS Df Sum of Sq      F  Pr(>F)  
#> 1      3 35.0                              
#> 2      2  1.2  1      33.8 56.333 0.01729 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the hypothesis test shows that the restricted model (i.e. the one in which the coefficient of x equals 1) explains significantly less variance than the full model, as evaluated by an F statistic.
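
For the original question, testing whether the slope equals 5, the call is the same (assuming the simple `model <- lm(y ~ x)` fit from the block above):

linearHypothesis(model, "x = 5")
# The F statistic here is the square of the t statistic from the offset/t-test
# approaches, so the p-value (about 0.056) should match them.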

This is more useful than using offset in a formula, as you can test multiple restrictions at once:

model <- lm(y ~ x + I(x^2))
linearHypothesis(model, c("I(x^2) = 1", "x = 1"))
#> Linear hypothesis test
#> 
#> Hypothesis:
#> I(x^2) = 1
#> x = 1
#> 
#> Model 1: restricted model
#> Model 2: y ~ x + I(x^2)
#> 
#>   Res.Df  RSS Df Sum of Sq    F  Pr(>F)  
#> 1      3 30.0                            
#> 2      1  0.2  2      29.8 74.5 0.08165 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
alexpghayes
0

I like the solution with the emmeans package (I always have emmeans loaded anyway because it's so useful...).

> library(emmeans)
> fit.emt <- emtrends(fit, ~1, var="x") 
> fit.emt
 1       x.trend    SE df lower.CL upper.CL
 overall     3.6 0.346  2     2.11     5.09

Confidence level used: 0.95 
> test(fit.emt, null=5)
 1       x.trend    SE df null t.ratio p.value
 overall     3.6 0.346  2    5  -4.041  0.0561
emudrak