Why does SciPy return `nan` for a t-test with samples with 0 variance?

Question

I am using SciPy in Python, and the following calls return nan for some reason:

>>> from scipy import stats
>>> stats.ttest_ind([1, 1], [1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

>>> stats.ttest_ind([1, 1], [1, 1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

But whenever I use samples that have different summary statistics, I actually get a reasonable value:

>>> stats.ttest_ind([1, 1], [1, 1, 1, 2])
Ttest_indResult(statistic=-0.66666666666666663, pvalue=0.54146973927558495)

Is it reasonable to interpret a p-value of nan as 0 instead? Is there a reason from statistics that it doesn't make sense to run a 2-sample t-test on samples with the same summary statistics?

I think the problem is that t-tests include a division by the standard deviation. I would instead check whether the standard deviation is 0, because there may be other cases where it returns nan (not sure what they would be, though). – Oscar Smith, Jul 13 '16 at 15:49
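The check suggested in this comment could be sketched roughly as follows. The helper name `ttest_or_none` and the choice to return `None` are assumptions made for illustration, not part of SciPy; only `stats.ttest_ind` is the library's own call.

```python
import numpy as np
from scipy import stats


def ttest_or_none(a, b):
    """Hypothetical helper: skip the t-test when both samples are constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # ddof=1 selects the N-1 (sample) standard deviation.
    if a.std(ddof=1) == 0 and b.std(ddof=1) == 0:
        # Both samples have zero variance, so the standard error is zero.
        # Signal "undefined" explicitly instead of passing nan along.
        return None
    return stats.ttest_ind(a, b)


print(ttest_or_none([1, 1], [1, 1, 1]))     # guard triggers
print(ttest_or_none([1, 1], [1, 1, 1, 2]))  # ordinary t-test result
```

Returning `None` keeps the undefined case distinct from a genuine p-value; treating it as 0 or 1 would be a separate modelling decision.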

score 4 · Accepted Answer · edited Jul 23 '17 at 19:39


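When both samples have zero variance, the t statistic reduces to 0/0: the difference of the means is zero, and so is the standard error in the denominator. In floating-point arithmetic that division either raises an invalid-operation exception or returns the value that, by convention, represents NaN (= not a number). Be particularly careful of divide-by-N versus divide-by-N-minus-one standard deviation formulae.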
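To make the arithmetic concrete, here is a minimal sketch of the two-sample t statistic computed by hand. It assumes the pooled-variance (equal variance) form with the N-minus-one divisor; the function name `pooled_t_statistic` is made up for this illustration and is not a SciPy function.

```python
import numpy as np


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with a pooled variance (sketch)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # ddof=1 gives the divide-by-N-minus-one sample variance.
    v1 = a.var(ddof=1)
    v2 = b.var(ddof=1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    # Silence NumPy's warnings; 0.0 / 0.0 evaluates to nan under IEEE 754.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a.mean() - b.mean()) / std_err


print(pooled_t_statistic([1, 1], [1, 1, 1]))     # zero over zero
print(pooled_t_statistic([1, 1], [1, 1, 1, 2]))  # well-defined case
```

For the inputs from the question, the first call yields nan and the second about -0.667, in line with the outputs shown there.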

answered Jul 13 '16 at 16:59 by Bruce David Wilner · edited by rayryeng

Shouldn't that comment be "Be [...] careful of divide-by-N versus divide-by-N-**minus**-one [...]"? – Warren Weckesser Jul 13 '16 at 17:40
