
I am comparing stats.ttest_ind() vs "manual" computation of the same test, and get different results.

import numpy as np
import pandas as pd
import scipy.stats as stats
import math

stats.ttest_ind() method:

#generate data
np.random.seed(123)
df = pd.DataFrame({ 
    'age':np.random.normal(40,5,200).round(),
    'sex':np.random.choice( ['male', 'female'], 200, p=[0.4, 0.6]),
     })

#define groups
men = df.age[df.sex == 'male']
women = df.age[df.sex == 'female']

#run t-test
test_stat, test_p = stats.ttest_ind(men, women)
print(test_stat, test_p)

Out:

-0.9265613940505325 0.355282312357339

Manual method:

#mean
men_mean, women_mean = men.mean(), women.mean()
#standard deviation
men_sd, women_sd = men.std(ddof=1), women.std(ddof=1)
#standard error
men_n, women_n = len(men), len(women)
men_se, women_se = men_sd/math.sqrt(men_n), women_sd/math.sqrt(women_n)
#standard error on the difference between men and women
se_diff = math.sqrt(men_se**2.0 + women_se**2.0)
#t-stat
t_stat = (men_mean - women_mean) / se_diff
#degrees of freedom
df = men_n + women_n - 2
#critical value
alpha = 0.05
cv = stats.t.ppf(1.0 - alpha, df)
# p-value
p = (1 - stats.t.cdf(abs(t_stat), df)) * 2
print(t_stat, cv, p)

Out:

-0.9244538916746341 0.3563753194455255

We can see there's a small difference. Why? Maybe because of how stats.ttest_ind() computes degrees of freedom? Any insight much appreciated.

johnjohn
  • To get the same t-statistic you need to state that the two groups need not have the same variance, i.e. `ttest_ind(men, women,equal_var=False)` – LudvigH Feb 07 '21 at 00:33
  • and in that case the df-calculation is trickier. see wikipedia https://en.wikipedia.org/wiki/Welch%27s_t-test – LudvigH Feb 07 '21 at 00:44

1 Answer


The following works. It is your code from above, with only two rows changed.

import numpy as np
import pandas as pd
import scipy.stats as stats
import math
#generate data
np.random.seed(123)
df = pd.DataFrame({ 
    'age':np.random.normal(40,5,200).round(),
    'sex':np.random.choice( ['male', 'female'], 200, p=[0.4, 0.6]),
     })

#define groups
men = df.age[df.sex == 'male']
women = df.age[df.sex == 'female']

#run t-test
############################### CHANGED THE ROW BELOW HERE
test_stat, test_p = stats.ttest_ind(men, women, equal_var=False)
print(test_stat, test_p)
#mean
men_mean, women_mean = men.mean(), women.mean()
#standard deviation
men_sd, women_sd = men.std(ddof=1), women.std(ddof=1)
#standard error
men_n, women_n = len(men), len(women)
men_se, women_se = men_sd/math.sqrt(men_n), women_sd/math.sqrt(women_n)
#standard error on the difference between men and women
se_diff = math.sqrt(men_se**2.0 + women_se**2.0)
#t-stat
t_stat = (men_mean - women_mean) / se_diff
#degrees of freedom
############################### CHANGED THE ROW BELOW HERE
df = (men_sd**2/men_n + women_sd**2/women_n)**2 / ( men_sd**4/men_n**2/(men_n-1)  + women_sd**4/women_n**2/(women_n-1)   )
#critical value
alpha = 0.05
cv = stats.t.ppf(1.0 - alpha, df)
# p-value
p = (1 - stats.t.cdf(abs(t_stat), df)) * 2
print(t_stat, cv, p)

and it outputs

-0.9244538916746341 0.356441636045986
-0.9244538916746341 1.6530443278019797 0.3564416360459859

The reason for the inconsistency in your results is this:

On the line `test_stat, test_p = stats.ttest_ind(men, women)` you accepted the default setting (`equal_var=True`), so the t-test is computed under the equal-variances assumption. The computation scipy.stats gives you is therefore a pooled, equal-variance t-test, as described in the documentation for scipy.stats.ttest_ind.
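For comparison, here is a minimal sketch (not part of the original post) of that pooled, equal-variance computation done by hand; it assumes the `men` and `women` Series defined in the code above and should reproduce the `stats.ttest_ind(men, women)` output from the question:

import math
import scipy.stats as stats

# sample sizes and sample variances (ddof=1) of the two groups
# (assumes the `men` and `women` Series from the code above)
men_n, women_n = len(men), len(women)
men_var, women_var = men.var(ddof=1), women.var(ddof=1)

# pooled variance: weighted average of the two sample variances
pooled_var = ((men_n - 1)*men_var + (women_n - 1)*women_var) / (men_n + women_n - 2)

# standard error of the difference in means under the equal-variance assumption
se_diff_pooled = math.sqrt(pooled_var * (1.0/men_n + 1.0/women_n))

t_pooled = (men.mean() - women.mean()) / se_diff_pooled
df_pooled = men_n + women_n - 2
p_pooled = 2 * (1 - stats.t.cdf(abs(t_pooled), df_pooled))

print(t_pooled, p_pooled)  # should match stats.ttest_ind(men, women)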

In your own code, you followed the Welch test in general: you estimated the mean and its standard error for the men and the women separately, and computed the t-statistic from those.
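In symbols (using \(\bar{x}\), \(s\) and \(n\) for the sample means, standard deviations and sizes; this notation is added here for clarity, it is not in the original post), that t-statistic is

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}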

You did deviate from the Welch test in one place: the degrees-of-freedom computation. The degrees of freedom should be approximated with the formula I entered in the code (the Welch–Satterthwaite approximation, see the Wikipedia page linked in the comments), but you used `men_n + women_n - 2`, which is the value that applies under the equal-variance assumption.
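Written out with the same notation as above, the changed degrees-of-freedom line implements

\nu \approx \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}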

If you want more details on how to compute these statistics, why they are defined as they are, or why your code did not behave as you expected, I suggest you check out https://stats.stackexchange.com/ and https://datascience.stackexchange.com/, which are more appropriate for statistics questions than https://stackoverflow.com/, which is more about the programming. Both of those communities are proficient in Python as well, so they should be able to help you out perfectly well.

LudvigH
  • Many thanks @LudvigH, answer accepted, however can you please say more on two things that are still not clear to me: (1) Since I sampled from the normal distribution, I assumed the variance was equal between samples (although they did not have the same size)? (2) I thought I computed my t-value as in a pooled test, is that not the case? – johnjohn Feb 07 '21 at 13:13
  • 1
    I expanded my wording a little bit, but not so much that I let the statistics take more room than the programming. I suggest you implement the equal-variance-assumptions code yourself as well, and see how different it is. If you find that difficult, I suggest you reach out on stats.stackexchange.com with a question. You can ping me there as well. – LudvigH Feb 07 '21 at 16:11
  • 1
    Many thanks LudvigH, actually I got it now (indeed the problem was about how I computed degrees of freedom)! And I will remember your suggestion to post more 'statistical' question on stats.stackexchange. Thanks again – johnjohn Feb 07 '21 at 16:32