
I am using SciPy in Python, and the following calls return nan for some reason:

>>> from scipy import stats
>>> stats.ttest_ind([1, 1], [1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

>>> stats.ttest_ind([1, 1], [1, 1, 1])
Ttest_indResult(statistic=nan, pvalue=nan)

But whenever I use samples that have different summary statistics, I actually get a reasonable value:

>>> stats.ttest_ind([1, 1], [1, 1, 1, 2])
Ttest_indResult(statistic=-0.66666666666666663, pvalue=0.54146973927558495)

Is it reasonable to interpret a p-value of nan as 0 instead? Is there a reason from statistics that it doesn't make sense to run a 2-sample t-test on samples with the same summary statistics?

Peter Mortensen
under_the_sea_salad
  • I think the problem is that t-tests include a division by the standard deviation. I would instead check whether the standard deviation is 0, because there may be other cases where it returns nan (not sure what they would be, though). – Oscar Smith Jul 13 '16 at 15:49
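
The check suggested in this comment could look roughly like the sketch below. The helper name `safe_ttest_ind` and the choice to return `None` are assumptions made for illustration; they are not part of SciPy.

```python
import numpy as np
from scipy import stats


def safe_ttest_ind(a, b):
    """Hypothetical helper: skip the test when neither sample varies."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # With zero variance in both samples, the denominator of the
    # t statistic is zero, so the result would not be meaningful.
    if np.var(a, ddof=1) == 0 and np.var(b, ddof=1) == 0:
        return None
    return stats.ttest_ind(a, b)


constant_result = safe_ttest_ind([1, 1], [1, 1])
varying_result = safe_ttest_ind([1, 1], [1, 1, 1, 2])
```

Returning `None` forces the caller to decide what a degenerate comparison should mean, rather than silently mapping nan to some p-value.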

1 Answer


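The t statistic divides the difference of the sample means by a standard error built from the sample standard deviations. When both samples are constant and equal, that is 0/0. Division by zero will either raise an exception or return the floating-point value that, by convention, represents NaN (= not a number); NumPy-based code such as SciPy does the latter. Be particularly careful of divide-by-N versus divide-by-N-minus-one standard deviation formulae.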
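
To see where the nan comes from, here is a minimal sketch that computes the pooled (equal-variance) two-sample t statistic by hand with NumPy. The function name `pooled_t_statistic` is an assumption for illustration, and this is the textbook formula rather than a copy of SciPy's internals.

```python
import numpy as np


def pooled_t_statistic(a, b):
    """Illustrative equal-variance two-sample t statistic."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    # Sample variances with the N-1 denominator (ddof=1).
    v1, v2 = a.var(ddof=1), b.var(ddof=1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    std_err = np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    # NumPy floats follow IEEE 754: 0/0 yields nan instead of raising.
    # errstate only silences the accompanying RuntimeWarning.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a.mean() - b.mean()) / std_err


t_same = pooled_t_statistic([1, 1], [1, 1])        # numerator 0, denominator 0
t_diff = pooled_t_statistic([1, 1], [1, 1, 1, 2])  # denominator is nonzero

# The two standard deviation conventions give different values.
std_n = np.std([1, 1, 1, 2], ddof=0)          # divide by N
std_n_minus_1 = np.std([1, 1, 1, 2], ddof=1)  # divide by N-1
```

In the first call both the difference of means and the standard error are zero, so the result is nan; in the second the statistic matches the value shown in the question.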

rayryeng