
My problem:

I have an array of ufloats (i.e. a uarray) from Python's uncertainties package. Every value in the array has its own error, and I need a function that gives me the average of the array, taking into account both the error I get when calculating the mean of the nominal values and the influence of the values' individual errors.

I have an uarray:

2 +/- 1, 3 +/- 2, 4 +/- 3

and need a function that gives me the average value of the array.

Thanks

DomR

4 Answers


Assuming Gaussian statistics, each uncertainty describes a Gaussian parent distribution around its measurement. In that case, it is standard to weight the measurements (nominal values) by the inverse variance. Plugging these weights into the general weighted average gives

$$ \bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i} = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}. $$

One need only perform good ol' error propagation on this to get an uncertainty for the weighted average; with $N$ measurements, the form used in the code below is

$$ \sigma_{\bar{x}} = \sqrt{\frac{N}{\sum_i 1/\sigma_i^2}}. $$

I don't have a general n-element version written out on hand, but here's how one could get the weighted average and its uncertainty in a simple two-value case (a sketch for arbitrary-length arrays follows below):

    import numpy as np
    import uncertainties as un

    # Two measurements with different uncertainties
    a = un.ufloat(5, 2)
    b = un.ufloat(8, 4)
    # Inverse-variance weighted nominal value, with sqrt(N / sum(1/sigma^2)) as its uncertainty (N = 2 here)
    wavg = un.ufloat((a.n/a.s**2 + b.n/b.s**2)/(1/a.s**2 + 1/b.s**2),
                     np.sqrt(2/(1/a.s**2 + 1/b.s**2)))
    print(wavg)
    >>> 5.6+/-2.5298221281347035

As one would expect, the result tends more toward the value with the smaller uncertainty. This makes sense, since a smaller uncertainty in a measurement implies that its nominal value lies closer to the true value of the parent distribution than those with larger uncertainties.
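For an arbitrary-length uarray, the same weighting can be wrapped in a small helper. This is only a sketch: the name weighted_mean is mine, and it keeps the same N / sum(1/sigma^2) uncertainty as the two-value example above:

    import numpy as np
    import uncertainties as un
    from uncertainties import unumpy

    def weighted_mean(uarr):
        # Inverse-variance weighted mean, with the same sqrt(N / sum(1/sigma^2))
        # uncertainty as the two-value example above
        noms = unumpy.nominal_values(uarr)
        sigmas = unumpy.std_devs(uarr)
        weights = 1.0 / sigmas**2
        nominal = np.sum(weights * noms) / np.sum(weights)
        sigma = np.sqrt(len(uarr) / np.sum(weights))
        return un.ufloat(nominal, sigma)

    values = unumpy.uarray([5, 8], [2, 4])
    print(weighted_mean(values))
    # 5.6+/-2.5, matching the example above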

Captain Morgan

Unless I'm missing something, you could calculate the sum divided by the length of the array:

from uncertainties import unumpy, ufloat
import numpy as np
arr = np.array([ufloat(2, 1), ufloat(3, 2), ufloat(4, 3)])
print(sum(arr)/len(arr))
# 3.0+/-1.2

You can also define it like this:

arr1 = unumpy.uarray([2, 3, 4], [1, 2, 3])
print(sum(arr1)/len(arr1))
# 3.0+/-1.2

uncertainties takes care of the rest.
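If you'd rather not spell out the sum, calling .mean() on the object array should give the same result, since NumPy simply sums the elements and divides by their number (a quick sketch reusing the uarray from above):

from uncertainties import unumpy

arr1 = unumpy.uarray([2, 3, 4], [1, 2, 3])
print(arr1.mean())
# 3.0+/-1.2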

Eric Duminil
  • I doubt that's it; if I use this on my real data, I get an error value of +/- 0.4 while the standard error of the mean of the nominal values is around 8. – DomR Apr 26 '17 at 15:13
  • You might have a different error distribution. This [article](https://newton.cx/~peter/2013/04/propagating-uncertainties-the-lazy-and-absurd-way/) might interest you. – Eric Duminil Apr 26 '17 at 15:20
  • The problem with this is you're getting the nominal value and uncertainty of the simple sum divided by the length of entries. See my answer (coming up). – Captain Morgan Sep 15 '20 at 17:53

I used Captain Morgan's answer to serve up some sweet Python code for a project and discovered that it needed a little extra ingredient:

    import numpy as np
    from uncertainties import ufloat, unumpy as unp

    # 'values' is an iterable of ufloats; epsilon guards against zero uncertainties
    epsilon = unp.nominal_values(values).mean()/(1e12)
    wavg = ufloat(sum([v.n/(v.s**2+epsilon) for v in values])/sum([1/(v.s**2+epsilon) for v in values]),
                  np.sqrt(len(values)/sum([1/(v.s**2+epsilon) for v in values])))
    # Report the result as exact if its uncertainty sits at the epsilon noise floor
    if wavg.s <= np.sqrt(epsilon):
        wavg = ufloat(wavg.n, 0.0)

Without that little something (epsilon) we'd get div/0 errors from observations recorded with zero uncertainty.
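Wrapped into a function for convenience (the helper name robust_wavg and the sample values below are only illustrative), it can be used like this:

    import numpy as np
    from uncertainties import ufloat, unumpy as unp

    def robust_wavg(values):
        # Same weighting as the snippet above, with the epsilon guard
        # against zero-uncertainty observations
        epsilon = unp.nominal_values(values).mean()/(1e12)
        weights = [1/(v.s**2 + epsilon) for v in values]
        nominal = sum(w*v.n for w, v in zip(weights, values))/sum(weights)
        sigma = np.sqrt(len(values)/sum(weights))
        wavg = ufloat(nominal, sigma)
        if wavg.s <= np.sqrt(epsilon):
            wavg = ufloat(wavg.n, 0.0)
        return wavg

    # An observation recorded with zero uncertainty no longer breaks the average
    print(robust_wavg([ufloat(5, 2), ufloat(8, 4), ufloat(6, 0)]))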

Michael Tiemann

If you already have a .csv file that stores the values as 'mean+/-std' strings, you could try the code below; it works for me.

import pandas as pd
from uncertainties import ufloat_fromstr

df = pd.read_csv(r'Z:\compare\SL2P_PAR.csv')
# Parse each 'mean+/-std' string into its nominal value and standard deviation
df['mean'] = df['uncertainty'].apply(lambda u: ufloat_fromstr(u).n)
df['std'] = df['uncertainty'].apply(lambda u: ufloat_fromstr(u).s)
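To check the parsing without the CSV file, here is a quick illustration with the values from the question (the in-memory DataFrame is just a stand-in for the file):

import pandas as pd
from uncertainties import ufloat_fromstr

df = pd.DataFrame({'uncertainty': ['2+/-1', '3+/-2', '4+/-3']})
parsed = df['uncertainty'].apply(ufloat_fromstr)
print(parsed.tolist())          # [2.0+/-1.0, 3.0+/-2.0, 4.0+/-3.0]
print(sum(parsed)/len(parsed))  # 3.0+/-1.2, the plain mean from the answers above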