
In R, it is possible to perform a two-sample one-tailed t-test simply by using

> A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846)
> B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880)
> t.test(A, B, alternative="greater")

    Welch Two Sample t-test

data:  A and B 
t = -0.4189, df = 6.409, p-value = 0.6555
alternative hypothesis: true difference in means is greater than 0 
95 percent confidence interval:
 -1.029916       Inf 
sample estimates:
mean of x mean of y 
0.9954942 1.1798523 

In the Python world, scipy provides a similar function, ttest_ind, but it can only do two-tailed t-tests. The closest information on the topic I found is this link, but it seems to be rather a discussion of the policy of implementing one-tailed vs. two-tailed tests in scipy.

Therefore, my question is: does anyone know of any examples or instructions on how to perform the one-tailed version of the test using numpy/scipy?

desertnaut
Timo
  • As of scipy version `1.6.0`, performing a one-sided t-test is now a parameter in [`scipy.stats.ttest_ind`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html#scipy-stats-ttest-ind): you can now use the `alternative` parameter. – MattR Nov 24 '21 at 19:12
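If your scipy is new enough, the comment above applies directly; a minimal sketch (assuming scipy >= 1.6) on the question's data:

```python
# scipy >= 1.6 lets ttest_ind do the one-sided test directly,
# mirroring R's t.test(A, B, alternative="greater").
from scipy import stats

A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]

# equal_var=False matches R's default (Welch) test
res = stats.ttest_ind(A, B, equal_var=False, alternative='greater')
print(res.statistic, res.pvalue)  # t ≈ -0.4189, p ≈ 0.6555, as in the R output
```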

6 Answers

93

From your mailing list link:

because the one-sided tests can be backed out from the two-sided tests. (With symmetric distributions one-sided p-value is just half of the two-sided pvalue)

It goes on to say that scipy always gives the test statistic as signed. This means that, given the p and t values from a two-tailed test, you would reject the null hypothesis of a greater-than test when p/2 < alpha and t > 0, and of a less-than test when p/2 < alpha and t < 0.
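That rule can be sketched as follows, on made-up normal samples:

```python
# Illustrative sketch of the rule above: take the signed t and the
# two-sided p from ttest_ind, then halve p and check the sign of t.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(3, 2, 400)   # sample with the smaller true mean
y = rng.normal(6, 2, 400)   # sample with the larger true mean

t, p_two_sided = stats.ttest_ind(x, y, equal_var=True)

alpha = 0.05
reject_greater = p_two_sided / 2 < alpha and t > 0  # H1: mean(x) > mean(y)
reject_less = p_two_sided / 2 < alpha and t < 0     # H1: mean(x) < mean(y)
print(t, reject_greater, reject_less)
```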

lvc
  • I am a bit confused by this formulation of `t`. H0: first is greater than second. `first = np.random.normal(3,2,400); second = np.random.normal(6,2,400); t, p = stats.ttest_ind(first, second, axis=0, equal_var=True)` gives t-stat = -23.0, p-value/2 = 1.33e-90. So I have a null hypothesis of a greater-than test but t < 0, meaning I cannot reject the null hypothesis? – Alina Aug 06 '17 at 11:21
  • @Tonja: you are getting a negative t-stat because the difference between the first mean and the second mean is negative. The difference of means that `scipy.stats.ttest_ind(a, b)` computes is `mean(a)-mean(b)`, so if the Alternative Hypothesis you are trying to prove is that `mean(second)>mean(first)`, then you can call `scipy.stats.ttest_ind(second, first)` and you don't have to worry about signs. In this case, reject the Null Hypothesis (i.e. `mean(second)<=mean(first)`) if `p-value/2 < alpha`, which is equivalent to `t > t_crit(df)`, ... – bpirvu Apr 14 '18 at 14:20
  • ... where `t_crit(df)` is the critical t-value for `df` degrees of freedom, which is basically `sample_size_1 + sample_size_2 - 2`, and can be read off a statistical table like this one http://users.stat.ufl.edu/~athienit/Tables/tables ... – bpirvu Apr 14 '18 at 14:26
  • ... for a one-tailed test (or this one http://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf for a two-tailed test). – bpirvu Apr 14 '18 at 14:33
32

After trying to add some insights as comments to the accepted answer, but not being able to write them down properly due to the general restrictions on comments, I decided to put my two cents in as a full answer.

First let's formulate our investigative question properly. The data we are investigating is

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

with the sample means

A.mean() = 0.99549419
B.mean() = 1.1798523

I assume that since the mean of B is obviously greater than the mean of A, you would like to check if this result is statistically significant.

So we have the Null Hypothesis

H0: mean(A) >= mean(B)

that we would like to reject in favor of the Alternative Hypothesis

H1: mean(B) > mean(A)

Now when you call scipy.stats.ttest_ind(x, y), this makes a Hypothesis Test on the value of x.mean() - y.mean(), which means that in order to get positive values throughout the calculation (which simplifies all considerations) we have to call

stats.ttest_ind(B, A)

instead of stats.ttest_ind(A, B). We get as an answer

  • t-value = 0.42210654140239207
  • p-value = 0.68406235191764142

and since, according to the documentation, this is the output for a two-tailed t-test, we must divide p by 2 for our one-tailed test. So, depending on the Significance Level alpha you have chosen, you need

p/2 < alpha

in order to reject the Null Hypothesis H0. For alpha=0.05 this is clearly not the case so you cannot reject H0.

An alternative way to decide whether to reject H0, without doing any algebra on t or p, is to compare the t-value with the critical t-value t_crit at the desired confidence level (e.g. 95%) for the number of degrees of freedom df that applies to your problem. Since we have

df = sample_size_1 + sample_size_2 - 2 = 8

we get from a statistical table like this one that

t_crit(df=8, confidence_level=95%) = 1.860

We clearly have

t < t_crit

so we obtain again the same result, namely that we cannot reject H0.
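The whole calculation can be sketched in code, computing t_crit from scipy.stats.t.ppf instead of reading it off a table:

```python
# Sketch of both decision rules above on the question's data,
# using the pooled-variance (equal_var=True) test as in this answer.
import numpy as np
from scipy import stats

A = np.array([0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846])
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])

t, p_two_sided = stats.ttest_ind(B, A, equal_var=True)

df = len(A) + len(B) - 2          # 8 degrees of freedom
t_crit = stats.t.ppf(0.95, df)    # one-tailed critical value, ~1.860

# Both rules agree: p/2 > 0.05 and t < t_crit, so H0 is not rejected.
print(t, p_two_sided / 2, t_crit)
```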

bpirvu
  • Let's say I want only the p-values for each one-tailed test: stats.ttest_ind(B,A) and stats.ttest_ind(A,B). How should I think? Take 1-p/2 if the t-stat is < 0? – CHRD Jul 15 '20 at 15:41
6
    import numpy as np
    from scipy.stats import ttest_ind

    def t_test(x, y, alternative='both-sided'):
        # Welch's two-sided t-test; halve (or complement) the p-value
        # depending on the requested alternative and the observed means.
        _, double_p = ttest_ind(x, y, equal_var=False)
        if alternative == 'both-sided':
            pval = double_p
        elif alternative == 'greater':
            if np.mean(x) > np.mean(y):
                pval = double_p / 2.
            else:
                pval = 1.0 - double_p / 2.
        elif alternative == 'less':
            if np.mean(x) < np.mean(y):
                pval = double_p / 2.
            else:
                pval = 1.0 - double_p / 2.
        return pval

    A = [0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846]
    B = [0.6383447, 0.5271385, 1.7721380, 1.7817880]

    print(t_test(A, B, alternative='greater'))
    # 0.6555098817758839
Dolittle Wang
  • 686
  • 7
  • 7
4

When the null hypothesis is H0: P1 >= P2 and the alternative hypothesis is Ha: P1 < P2, then in order to test it in Python, you write ttest_ind(P2, P1). (Notice that P2 comes first.)

first = np.random.normal(3,2,400)
second = np.random.normal(6,2,400)
stats.ttest_ind(first, second, axis=0, equal_var=True)

You will get a result like the one below:

Ttest_indResult(statistic=-20.442436213923845, pvalue=5.0999336686332285e-75)

In Python, when statistic < 0 your real p-value is actually real_pvalue = 1 - output_pvalue/2 = 1 - 5.0999336686332285e-75/2, which is approximately 1. As your p-value is larger than 0.05, you cannot reject the null hypothesis that 6 >= 3. When statistic > 0, the real t-score is actually equal to -statistic, and the real p-value is equal to pvalue/2.

lvc's answer should be: when (1 - p/2) < alpha and t < 0, you can reject the less-than hypothesis.
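A minimal sketch of this correction, reusing the simulated data from this answer:

```python
# When the observed difference points away from the alternative,
# the one-sided p-value is 1 - p_two_sided/2, not p_two_sided/2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
first = rng.normal(3, 2, 400)    # true mean 3
second = rng.normal(6, 2, 400)   # true mean 6

t, p_two_sided = stats.ttest_ind(first, second, equal_var=True)

# H1: mean(first) > mean(second). Here t < 0, so the one-sided
# p-value is 1 - p/2, which is close to 1: H0 cannot be rejected.
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
print(t, p_one_sided)
```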

evapanda
  • This quite surprised me when I was reading your post. In my opinion, for a 1-sided p-value, p should always be out_p/2. Can you give me some related docs about (1-p/2)? – Chau Pham Mar 22 '19 at 08:00
  • According to the [documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html), only if the parameter `equal_var` is set to **False**, can `stats.ttest_ind()` perform the one-tailed hypothesis test. Hence, your example shows a two-tailed hypothesis test, isn't it? – Yongfeng Feb 17 '20 at 14:38
2

Based on this function from R: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/t.test

from scipy.stats import ttest_ind

def ttest(a, b, axis=0, equal_var=True, nan_policy='propagate',
          alternative='two.sided'):
    # Run the two-sided test, then convert the p-value according to
    # the sign of t and the requested alternative, as in R's t.test.
    tval, pval = ttest_ind(a=a, b=b, axis=axis, equal_var=equal_var,
                           nan_policy=nan_policy)
    if alternative == 'greater':
        if tval < 0:
            pval = 1 - pval / 2
        else:
            pval = pval / 2
    elif alternative == 'less':
        if tval < 0:
            pval /= 2
        else:
            pval = 1 - pval / 2
    else:
        assert alternative == 'two.sided'
    return tval, pval
ady
-2

Did you look at this: How to calculate the statistics "t-test" with numpy

I think that is exactly what this question is asking about.

Basically:

import scipy.stats
x = [1,2,3,4]
scipy.stats.ttest_1samp(x, 0)

Ttest_1sampResult(statistic=3.872983346207417, pvalue=0.030466291662170977)

is the same result as this example in R. https://stats.stackexchange.com/questions/51242/statistical-difference-from-zero
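The same halving trick from the accepted answer applies to this one-sample test; a minimal sketch:

```python
# Converting ttest_1samp's two-sided p-value into a one-sided one,
# with the same sign rule as for ttest_ind.
from scipy import stats

x = [1, 2, 3, 4]
t, p_two_sided = stats.ttest_1samp(x, 0)

# H1: mean(x) > 0; t is positive here, so the one-sided p is p/2.
p_one_sided = p_two_sided / 2 if t > 0 else 1 - p_two_sided / 2
print(t, p_one_sided)  # t ≈ 3.873, p ≈ 0.0152
```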

Jorge