
I have two arrays: x holds the corrected values and y holds the original values (before the correction was applied). I know that if I want to do a two-tailed t-test to get the two-tailed p-value I need to do this:

t_statistic, p_value = scipy.stats.ttest_ind(x, y, nan_policy='omit')

However, this only tells me whether the two arrays are significantly different from each other. I want to show that the corrected values, x, are significantly less than y. To do this it seems like I need the one-tailed p-value, but I can't seem to find a function that does this. Any ideas?

HM14
  • https://stackoverflow.com/questions/43390222/one-sided-t-test-for-linear-regression/43390714#43390714 The same logic here. Just pass the returning t_statistic to the survival function or cdf based on the direction of the hypothesis. – ayhan Jul 11 '17 at 23:10
  • I tried passing the t_statistic to sf but it says that I need two arguments. This is really all of the information that I have. – HM14 Jul 12 '17 at 16:35

1 Answer


Consider these two arrays:

import scipy.stats as ss
import numpy as np
prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

An independent sample t-test returns:

t_stat, p_val = ss.ttest_ind(x, y, nan_policy='omit')
print('t stat: {:.4f}, p value: {:4f}'.format(t_stat, p_val))

# t stat: -1.1052, p value: 0.283617

This p-value is actually calculated from the cumulative distribution function:

ss.t.cdf(-abs(t_stat), len(x) + len(y) - 2) * 2
# 0.28361693716176473

Here, len(x) + len(y) - 2 is the number of degrees of freedom.

Notice the multiplication by 2. If the test is one-tailed, you don't multiply. That's all. So your p-value for a left-tailed test is

ss.t.cdf(t_stat, len(x) + len(y) - 2)
# 0.14180846858088236

If the test were right-tailed, you would use the survival function

ss.t.sf(t_stat, len(x) + len(y) - 2)
# 0.85819153141911764

which is the same as 1 - ss.t.cdf(...).
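Putting the same idea in a sign-robust form: doubling the survival function of |t| gives the two-tailed p-value whether the t statistic comes out negative or positive, which avoids having to branch on the sign. A small sketch reusing the arrays above:

```python
import numpy as np
import scipy.stats as ss

prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

t_stat, p_val = ss.ttest_ind(x, y)
df = len(x) + len(y) - 2  # pooled-variance degrees of freedom

# 2 * sf(|t|) works for either sign of t, unlike 2 * cdf(t)
p_two_sided = 2 * ss.t.sf(abs(t_stat), df)
print(p_two_sided)  # matches p_val returned by ttest_ind
```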

I assumed above that the arrays have the same length. With the default pooled-variance test (equal_var=True), len(x) + len(y) - 2 remains the correct degrees of freedom even for unequal lengths; only for Welch's test (equal_var=False) do the degrees of freedom need to be computed differently, via the Welch–Satterthwaite equation.
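As a side note, newer SciPy releases (1.6.0 and later) expose this directly through the `alternative` keyword of `ttest_ind`, so the one-tailed p-value can be requested without touching the t-distribution yourself. A sketch, assuming a recent SciPy:

```python
import numpy as np
import scipy.stats as ss

prng = np.random.RandomState(0)
x, y = prng.normal([1, 2], 1, size=(10, 2)).T

# alternative='less' tests H1: mean(x) < mean(y)  (left-tailed)
t_stat, p_left = ss.ttest_ind(x, y, alternative='less')

# identical to the manual computation from the answer above
manual = ss.t.cdf(t_stat, len(x) + len(y) - 2)
print(p_left, manual)
```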

ayhan
  • Quick question. Does the order I put x, y in ttest_ind matter? For example, I want to see if x is significantly larger than y. Do I order it x, y in ttest_ind and then do a right-tailed test as discussed with the sf? Then if the value is < 0.05, x is significantly larger than y... is this correct? – HM14 Jul 16 '17 at 02:12
  • @ayhan Is this correct? ' ss.t.cdf(t_stat, len(x) + len(y) - 2) * 2 ' , how come we multiply the resulting p-value by 2? Doesn't this allow for a p-value larger than 1 ? – hirschme Apr 30 '20 at 20:44
  • 1
    @hirschme For a two-tailed hypothesis test we calculate the area between -infinity and -absolute(t-value) and also the area between absolute(t-value) and inf. Since t-distribution is symmetrical you can calculate one of them and multiply by two. Of course each of these will return a value less than or equal to 0.5. However, I failed to mention in the post that if t is positive that calculation would be `(1 - ss.t.cdf(t_stat, len(x) + len(y) - 2)) * 2`. – ayhan Apr 30 '20 at 21:02
  • @ayhan that makes sense, thank you. I just posted the same as a general question. Feel free to answer it. Else I will just delete it as it is no longer relevant – hirschme Apr 30 '20 at 21:07