
I've isolated a problem in my script that is occurring due to this attempt at a standard deviation calculation using scipy's .tstd function,

 sp.stats.tstd(IR)

where my IR value is 0.0979. Is there a way to stop it from (I assume) rounding this to zero? I tried a suggestion from an earlier Stack Overflow post to make the number an np.float64, but that didn't work. Hoping someone has the answer.

Full Error printout:

    Traceback (most recent call last):
      File "Utt_test.py", line 995, in <module>
        X.write(Averaging())
      File "Utt_test.py", line 115, in Averaging
        IR_sdev=str(round(sp.stats.tstd(IR),4))
      File "/usr/lib64/python2.7/site-packages/scipy/stats/stats.py", line 848, in tstd
        return np.sqrt(tvar(a,limits,inclusive))
      File "/usr/lib64/python2.7/site-packages/scipy/stats/stats.py", line 755, in tvar
        return a.var()*(n/(n-1.))
    ZeroDivisionError: float division by zero
Matt
  • `tstd` requires an input of a numpy array. Hard to calculate a standard deviation of a single number. – Daniel Jul 26 '13 at 17:02
  • From the error message, the problem arises during the computation of ``a.var()*(n/(n-1.))``. That is, the problem is that ``n=1.``. Nothing to do with numpy float or IR... – hivert Jul 26 '13 at 17:02
  • I see...hmmmm, gonna have to rewrite some things then. The program usually runs with more than one number. I'm trying a special case. – Matt Jul 26 '13 at 17:03
  • That error message is not very helpful, but the real problem is that `tstd` and `tvar` (which is called by `tstd`) need at least two values. – Warren Weckesser Jul 26 '13 at 17:04

1 Answer


The method tstd computes the square root of the sample variance. The sample variance differs from the population variance by the factor n/(n-1), which is necessary to make the sample variance an unbiased estimator of the population variance. This breaks down for n=1, where the factor divides by n-1 = 0 (hence the ZeroDivisionError). That is understandable, because a single number gives us no idea of what the population variance might be.

If this adjustment is undesirable (perhaps your array is the total population, and not a sample from it), use numpy.std instead. For an array of size 1, it will return 0 as expected. If used with parameter ddof=1, numpy.std becomes equivalent to stats.tstd.
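As a rough illustration of the difference, here is a minimal sketch using only NumPy. The helper `sample_std` and its `fallback` parameter are hypothetical names made up for this example (they are not part of NumPy or SciPy), and the extra values in `several` are invented data. The helper returns a fallback instead of dividing by n-1 = 0 when fewer than two values are supplied.

```python
import numpy as np


def sample_std(values, fallback=float("nan")):
    """Sample standard deviation (ddof=1).

    Hypothetical helper: returns `fallback` when fewer than two values
    are given, since the n/(n-1) correction is undefined for n = 1.
    """
    arr = np.asarray(values, dtype=float).ravel()
    if arr.size < 2:
        return fallback
    return float(np.std(arr, ddof=1))


single = [0.0979]                    # the special case from the question
several = [0.0979, 0.1021, 0.0954]   # made-up values for illustration

# Population formula (ddof=0, the default) is well defined for one value.
pop_std_single = float(np.std(single))

# Sample formula, guarded against the single-value case.
sample_std_single = sample_std(single)
sample_std_several = sample_std(several)

print(pop_std_single, sample_std_single, sample_std_several)
```

Whether NaN, 0, or an exception is the right result for a single value depends on what the downstream code expects, so the fallback is left as a parameter here.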


Aside: SciPy's documentation states

tstd computes the unbiased sample standard deviation, i.e. it uses a correction factor n / (n - 1).

repeating the common misconception that this standard deviation estimator is unbiased (in fact, the correction factor eliminates bias for the variance, not for the standard deviation). NumPy's std documentation turns out to be correct on this point, where it discusses the ddof parameter:

If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
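A small check of that last point, as a sketch rather than a proof: it assumes a toy population (a fair coin with values 0 and 1, so the true variance is 0.25 and the true standard deviation is 0.5) and enumerates every equally likely sample of size 2, so no random numbers are involved. The variable names are made up for this example.

```python
import itertools

import numpy as np

# Toy population: fair coin with outcomes 0 and 1.
# True variance = 0.25, true standard deviation = 0.5.
population = [0.0, 1.0]

# All four equally likely i.i.d. samples of size 2.
samples = list(itertools.product(population, repeat=2))

# Average the ddof=1 estimates over all samples (the exact expectation).
mean_var = float(np.mean([np.var(s, ddof=1) for s in samples]))
mean_std = float(np.mean([np.std(s, ddof=1) for s in samples]))

print("E[sample variance] =", mean_var)  # matches the true variance
print("E[sample std]      =", mean_std)  # falls below the true std of 0.5
```

In this toy case the ddof=1 variance averages to exactly the population variance, while its square root averages to less than the population standard deviation. That is consistent with the NumPy note quoted above.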