
To find out whether there is a difference in means between two groups, I use a t-test as below, with the mtcars dataset.

library(dplyr)  # needed for the %>% pipe

df <- mtcars %>% dplyr::select(hp, vs)
t.test(hp ~ vs, data = df)
    Welch Two Sample t-test

data:  hp by vs
t = 6.2908, df = 23.561, p-value = 1.82e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  66.06161 130.66854
sample estimates:
mean in group 0 mean in group 1 
      189.72222        91.35714 

My question is: what if I use logistic regression instead? The p-values are different. To achieve the same goal, does it make a difference whether I use logistic regression or t.test? Can someone clarify this for me?

summary(glm(vs~hp, data=df, family='binomial'))
Call:
glm(formula = vs ~ hp, family = "binomial", data = df)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.12148  -0.20302  -0.01598   0.51173   1.20083  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  8.37802    3.21593   2.605  0.00918 **
hp          -0.06856    0.02740  -2.502  0.01234 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 43.860  on 31  degrees of freedom
Residual deviance: 16.838  on 30  degrees of freedom
AIC: 20.838

Number of Fisher Scoring iterations: 7
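To see why the p-values differ, it may help to put a third model next to the two above: a linear regression of `hp` on `vs`. The t-statistic for its `vs` coefficient is (up to sign) the pooled-variance t-test statistic, while the logistic model tests the reversed relationship (`vs` as outcome) with a Wald z-test on a different scale — a sketch for comparison:

```r
# Linear model: hp as outcome, vs as predictor.
# The t-test on the vs coefficient is equivalent to a
# pooled-variance two-sample t-test of hp between vs groups.
fit_lm  <- lm(hp ~ vs, data = mtcars)

# Logistic model: vs as outcome, hp as predictor.
# This instead tests whether the log-odds of vs = 1 change with hp,
# so its p-value need not match the t-test's.
fit_glm <- glm(vs ~ hp, data = mtcars, family = "binomial")

summary(fit_lm)
summary(fit_glm)
```

All three approaches ask related but distinct questions about the `hp`–`vs` relationship, which is why they produce different p-values even on the same data.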
    Possible duplicate of https://stats.stackexchange.com/questions/159110/logistic-regression-or-t-test – LAP Nov 30 '17 at 14:18
  • Question: How do I flag a duplicate from another Stack? – LAP Nov 30 '17 at 14:19
  • P-values are different because they correspond to different statistical tests. The t-test compares the means of two groups, while the regression (logistic or linear) compares a coefficient with zero. You should select the one that better fits the nature of your study, keeping in mind the way you want to tell your story. Just to point out, though: generally what you're trying to find is "a statistically significant correlation between `hp` and `vs`", and you can do this in many ways..... – AntoniosK Nov 30 '17 at 14:54
  • I'm adding a linear regression model to what you've tried: `t.test(hp~factor(vs), data = mtcars); summary(lm(hp~factor(vs), data = mtcars)); summary(glm(factor(vs)~hp, data = mtcars, family='binomial'))`. Look at the model outputs, think how you want to present your analysis/findings and select the most appropriate one... – AntoniosK Nov 30 '17 at 14:56
  • To go a bit further, you can also use non-parametric models like decision trees to investigate a correlation. Try this: `library(party); m1 = ctree(hp~factor(vs), data = mtcars); plot(m1, type="simple"); m2 = ctree(factor(vs)~hp, data = mtcars); plot(m2, type="simple")`. Everything we've tried so far shows that when `hp` gets higher values we tend to have `vs` = 0 and when `hp` gets lower values we tend to have `vs` = 1. – AntoniosK Nov 30 '17 at 15:01
  • @LAP you can click the option "flag" under the question and the tags. Then follow the instructions :-) – AntoniosK Nov 30 '17 at 15:03
  • @AntoniosK I know that :) It does not let me choose a duplicate from another stack though (in this case stats.stackexchange.com instead of stackoverflow.com). – LAP Dec 01 '17 at 08:07

0 Answers