
I would like to use a permutation-based alternative to scipy.stats.ttest_1samp to test whether the mean of my observations is significantly greater than zero. I stumbled upon scipy.stats.permutation_test, but I am not sure whether it can be used in my case. I also found mne.stats.permutation_t_test, which seems to do what I want, but I would prefer to stick with scipy if I can.

Example:

import numpy as np
from scipy import stats

# create data
np.random.seed(42)
rvs = np.random.normal(loc=5, scale=5, size=100)

# compute one-sample t-test
t, p = stats.ttest_1samp(rvs, popmean=0, alternative='greater')
Johannes Wiesner

2 Answers


This test can be performed with permutation_test. With permutation_type='samples', it "permutes" the signs of the observations. Assuming data has been generated as above, the test can be performed as

from scipy import stats

def t_statistic(x, axis=-1):
    return stats.ttest_1samp(x, popmean=0, axis=axis).statistic

# alternative='greater' matches the one-sided test in the question
res = stats.permutation_test((rvs,), t_statistic,
                             permutation_type='samples',
                             alternative='greater')
print(res.pvalue)

If you only care about the p-value, you can get the same result with np.mean as the statistic instead of t_statistic.
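
For example, reusing rvs from above, this sketch should give essentially the same p-value (identical if the same permutations are drawn), because the sample mean and the t statistic are monotonically related under sign flips:

import numpy as np
from scipy import stats

# np.mean is already vectorized along an axis, so we can say so explicitly
res_mean = stats.permutation_test((rvs,), np.mean,
                                  permutation_type='samples',
                                  vectorized=True,
                                  alternative='greater')
print(res_mean.pvalue)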

Admittedly, this behavior for permutation_type='samples' with only one sample is a bit buried in the documentation.

> Accordingly, if data contains only one sample, then the null distribution is formed by independently changing the sign of each observation.

But a test producing the same p-value could also be performed as a two-sample test in which the second sample is the negative of the data. To avoid special cases, that's actually what permutation_test does under the hood.
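
To illustrate that equivalence, here is a minimal sketch (the mean_diff helper is mine, not part of scipy; p-values agree up to Monte Carlo noise, since the permutations are sampled randomly):

import numpy as np
from scipy import stats

def mean_diff(x, y, axis=-1):
    # difference of paired means: monotone in the sign-flipped mean of rvs
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

# with permutation_type='samples', observations are swapped between the
# paired samples, which is exactly a sign flip in the one-sample view
res2 = stats.permutation_test((rvs, -rvs), mean_diff,
                              permutation_type='samples',
                              alternative='greater')
print(res2.pvalue)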

In this case, the example code above is a lot faster than permutation_test right now. I'll try to improve that for SciPy 1.10, though.


Based on the current docs, it does not appear that the equivalent of a one-sample t-test is achievable with the permutation_test function. But it is possible to implement it with numpy, as shown below. This is based on the R implementation (found here) and this thread on Cross Validated, with added options for a one-sided test and for testing against a specific mean.

import numpy as np

def permutation_ttest_1samp(
    data, popmean, n_resamples, alternative='two-sided', random_state=None
):

    assert alternative in ('two-sided', 'less', 'greater'), (
        "Unrecognized alternative hypothesis"
    )

    n = len(data)

    # center the data so the null hypothesis becomes "mean == 0"
    data = np.asarray(data) - popmean
    dbar = np.mean(data)

    absx = np.abs(data)
    z = []

    rng = np.random.RandomState(random_state)

    # build the null distribution of the mean by randomly flipping the
    # sign of each observation
    for _ in range(n_resamples):
        mn = rng.choice((-1, 1), n, replace=True)
        xbardash = np.mean(mn * absx)
        z.append(xbardash)
    z = np.array(z)

    # p-value: proportion of resampled means at least as extreme as the
    # observed mean, in the direction of the alternative
    if alternative == 'greater':
        return np.sum(z >= dbar) / n_resamples
    elif alternative == 'less':
        return np.sum(z <= dbar) / n_resamples
    return np.sum(np.abs(z) >= np.abs(dbar)) / n_resamples
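
If speed matters, the resampling loop can be vectorized; a minimal sketch of a two-sided variant (the random numbers are drawn in a different order, so exact p-values will differ slightly from the loop version for a given seed):

def permutation_ttest_1samp_vectorized(data, popmean, n_resamples,
                                       random_state=None):
    # draw all sign flips in a single call instead of looping
    data = np.asarray(data) - popmean
    dbar = np.mean(data)
    rng = np.random.RandomState(random_state)
    signs = rng.choice((-1, 1), size=(n_resamples, len(data)))
    z = np.mean(signs * np.abs(data), axis=1)
    return np.sum(np.abs(z) >= np.abs(dbar)) / n_resamples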

Example 1 (two-sided test against a null hypothesis of mean 0):

rng = np.random.RandomState(42)
rvs = rng.normal(loc=0, scale=0.01, size=1000)

pval = permutation_ttest_1samp(rvs, 0, 100_000, alternative='two-sided', random_state=42)
print(pval)
# 0.53206

Comparing to the parametric t-test:

from scipy.stats import ttest_1samp

stat, pval = ttest_1samp(rvs, popmean=0, alternative='two-sided')
print(pval)
# 0.5325672436623021

Example 2 (one-sided test against a non-zero mean null hypothesis):

rng = np.random.RandomState(42)
rvs = rng.normal(loc=0, scale=3, size=1000)

pval = permutation_ttest_1samp(rvs, 0.1, 100_000, alternative='greater', random_state=42)
print(pval)
# 0.6731

Comparing to the parametric t-test:

from scipy.stats import ttest_1samp

stat, pval = ttest_1samp(rvs, popmean=0.1, alternative='greater')
print(pval)
# 0.6743729530216749
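
For completeness, the same one-sided test against popmean=0.1 can also be run with permutation_test from the other answer by shifting the data first; a sketch (t_statistic as defined there):

from scipy import stats

def t_statistic(x, axis=-1):
    return stats.ttest_1samp(x, popmean=0, axis=axis).statistic

# shift the data so the null mean becomes zero, then sign-flip
res = stats.permutation_test((rvs - 0.1,), t_statistic,
                             permutation_type='samples',
                             alternative='greater')
print(res.pvalue)
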
AlexK
  • Ha! Interesting that this has not been implemented in scipy. Do you think it would make sense to open an issue for that on Github? Thanks for the code! In case you have time: I would love to also obtain a t-value; maybe you could implement that in your function as well? – Johannes Wiesner Sep 02 '22 at 12:10
  • I have not found a reference for how to compute the test statistic with this test, so I'm inclined to leave the test statistic computation to others. I just corrected how the p-value for the 'greater' test is computed, to match what's done in a standard t-test. – AlexK Sep 03 '22 at 07:40
  • I opened an [issue](https://github.com/scipy/scipy/issues/16962) on Github – Johannes Wiesner Sep 05 '22 at 09:46