0

I have created a histogram using matplotlib of my experimental data, which consists of the value measured and the weight. Using the weights argument of plt.hist it is no problem weighting together the events, but when I look at options for errorbars none seem to take event weights into account. There are solutions to this problem where Poisson errors or the same error is used everywhere, like this one, but that does not solve my problem.

The error of one bin should mathematically be calculated as err(bin) = sqrt( sum {w_i^2} ) where w_i are the individual weights of the events that belong in that bin.

A simplified example of my histogram is given below.

import matplotlib.pyplot as plt

data=[1,8,5,4,1,10,8,3,6,7]
weights=[1.3,0.2,0.01,0.9,0.4,1.05,0.6,0.6,0.8,1.8]

plt.hist(data, bins = [0.0,2.5,5.0,7.5,10.0], weights=weights) 
plt.show()
  • You want to plot as error the weighted stddev of each bin? – Pietro Mar 29 '21 at 14:29
  • No, the weights are not stddev but importance, where a higher weight means more important to the final result. – QuarkMatter Mar 29 '21 at 15:19
  • Yes I mean use the weights to compute the stddev? So something like `sum [(bin_center-x)**2 * weight for x in data_in_bin] / total_bin_weights` ? (Formatting math in the comments is ugly but you get the gist I hope) – Pietro Mar 29 '21 at 15:22
  • Ok that edit simplifies things (the weighted stddev is quite tricky). – Pietro Mar 29 '21 at 15:28
  • I suppose you are correct and I could do that. I have added the math for how to weight together the importances without going through stddev to the question as well. – QuarkMatter Mar 29 '21 at 15:30

1 Answers1

0

You have to manually compute the errors for each bin and plot that separately.

import matplotlib.pyplot as plt  # type: ignore
import numpy as np  # type: ignore

data = np.array([1, 8, 5, 4, 1, 10, 8, 3, 6, 7])
weights = np.array([1.3, 0.2, 0.01, 0.9, 0.4, 1.05, 0.6, 0.6, 0.8, 1.8])

bin_edges = [0.0, 2.5, 5.0, 7.5, 10.0]

bin_y, _, bars = plt.hist(data, bins=bin_edges, weights=weights)
print(f"bin_y {bin_y}")
print(f"bin_edges {bin_edges}")

errors = []
bin_centers = []

for bin_index in range(len(bin_edges) - 1):

    # find which data points are inside this bin
    bin_left = bin_edges[bin_index]
    bin_right = bin_edges[bin_index + 1]
    in_bin = np.logical_and(bin_left < data, data <= bin_right)
    print(f"in_bin {in_bin}")

    # filter the weights to only those inside the bin
    weights_in_bin = weights[in_bin]
    print(f"weights_in_bin {weights_in_bin}")

    # compute the error however you want
    error = np.sqrt(np.sum(weights_in_bin ** 2))
    errors.append(error)
    print(f"error {error}")

    # save the center of the bins to plot the errorbar in the right place
    bin_center = (bin_right + bin_left) / 2
    bin_centers.append(bin_center)
    print(f"bin_center {bin_center}")

# plot the error bars
plt.errorbar(bin_centers, bin_y, yerr=errors, linestyle="none")

plt.show()

Which produces this

plot_with_errors_sqrt

By the time you added the edits I had done the plot with the stddev for each bin, just change errors to stddevs computed as

data_in_bin = data[in_bin]
variance = np.average((data_in_bin - bin_center) ** 2, weights=weights_in_bin)
stddev = np.sqrt(variance)
print(f"stddev {stddev}")
stddevs.append(stddev)

But you should check that the stddev computation makes sense for your use case. This results in :

plot_stddev_bin

Cheers!

Pietro
  • 1,090
  • 2
  • 9
  • 15