Error on weighted histogram in python

Question

I want to calculate the error on a bin height by taking the square root of the sum of the weights squared (sumw2) in that bin (poission error). Is there any way to get the sum of weights (sumw) and/or sumw2 when histogramming data with either matplotlib or numpy (or any other library for that matter)?

Let's say I have some data in a numpy array x and some weights w in another numpy array, to get the histogram I would either so

n, bins, patches = pyplot.hist(x,weights=w)

or

n, bins = numpy.histogram(x,weights=w)

In both cases I have no clue which entries of w belong to which bin right?

Edit: Currently I'm using YODA to do this. The disadvantage from my point of view is that YODA histograms can only be filled with one data point at a time.

score 3 · Answer 1 · answered Oct 08 '20 at 12:23

Consider an array x with weights w. The histogram of the data in x weighted by w with bins is given by:

n, bins = np.histogram(x, bins=bins, weights=w)

And the associated errors to n can be computed as:

n_err = np.sqrt(np.histogram(x, bins=bins, weights=w**2)[0])

Note that if the data is not weighted (i.e. (w == 1).all()) then the error reduces to the "standard" np.sqrt(n)

Tural Gurbanov · Accepted Answer · 2018-01-12T17:37:30.620

1

According to numpy documentation, weights

An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1). If density is True, the weights are normalized, so that the integral of the density over the range remains 1.

That means that each value in w should be associated with a value in x. If you'd like to weight bins and plot them, you can first find bins' values, multiply them by weights and finally plot them using bar.

val, pos = np.histogram(np.arange(1000))
w_val = val * w
plt.bar(pos[1:], w_val)

Update from the comment:

Ahh, sorry, it seems that I didn't understand the initial question. Actually, you can use pos to find cells related to each bin and calculate your weight function using these information.

for left, right in zip(pos, pos[1:): 
    ix = np.where((x >= left) & (x <= right))[0] 
    sumw2 = np.sum(w[ix] ** 2)

edited Jan 12 '18 at 17:37

answered Jan 12 '18 at 13:24

Tural Gurbanov

742
2
7
27

This does not work if the number of bins is not the number of entries in a, since `len(val)==nbins` and `len(w)==len(a)` or am I missing something? – Lxndr Jan 12 '18 at 13:32
@Lxndr could you please explain what is `a`? – Tural Gurbanov Jan 12 '18 at 13:39
Oh sorry `a` (from the documentation) is `x` from my example above if I understand the documentation correctly. – Lxndr Jan 12 '18 at 13:42
Ok, if `len(x) == len(w)` then in `numpy.histogram(x,weights=w)` `x[i]` is associated with `w[i]` – Tural Gurbanov Jan 12 '18 at 13:52
But then there is still no way to get sumw2 for each bin. By handing the weights to `numpy.histogram()` one would just change the height of the bins, i.e. the `val` array. The error of the bins stays inaccessible.. – Lxndr Jan 12 '18 at 14:24
Ahh, sorry, it seems that I didn't understand the initial question. Actually, you can yes `pos` to find cell related to each bean and calculate your weights. for left, right in zip(pos, pos[1:): ix = np.where((x >= left) & (x <= right))[0] sumw2 = np.sum(w[ix] ** 2) – Tural Gurbanov Jan 12 '18 at 14:48
This is it! Thanks a lot. Can we somehow make this the accepted answer? – Lxndr Jan 12 '18 at 15:49

Error on weighted histogram in python

2 Answers2